LLM Reasoning relevance: 6/10

A unified theory of feature learning in RNNs and DNNs

Jan P. Bauer, Kirsten Fischer, Moritz Helias, Agostina Palmigiano
arXiv: 2602.15593v1 Published: 2026-02-17 Updated: 2026-02-17

AI Summary

A unified theory of feature learning in RNNs and DNNs, revealing how weight sharing shapes network function.

Main Contributions

  • Establishes a unified mean-field theory for RNNs and DNNs
  • Reveals how weight sharing affects generalization on sequential tasks
  • Identifies a phase transition at which RNN and DNN behavior diverges

Methodology

Casts training as Bayesian inference via representational kernels, and analyzes the functional implications of weight sharing.
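The representational kernel describing a trained network can be illustrated with a minimal sketch: for a batch of inputs, it is the matrix of pairwise overlaps of hidden representations, normalized by the width. The weight scaling, width, and tanh nonlinearity below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(1)
P, d = 5, 64                     # number of input patterns, hidden width
W = rng.standard_normal((d, d)) / np.sqrt(d)  # random weights, 1/sqrt(d) scaling
X = rng.standard_normal((P, d))  # batch of input patterns

H = np.tanh(X @ W.T)             # hidden representations, shape (P, d)

# Representational kernel: K[a, b] = h_a . h_b / d, the pairwise overlap
# of hidden representations -- the order parameter of mean-field theories
K = H @ H.T / d

assert K.shape == (P, P)
assert np.allclose(K, K.T)       # a kernel matrix is symmetric
```

In the mean-field limit, such kernels self-average over the weight randomness, which is what lets training be described as inference over kernels rather than over individual weights.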

Original Abstract

Recurrent and deep neural networks (RNNs/DNNs) are cornerstone architectures in machine learning. Remarkably, RNNs differ from DNNs only by weight sharing, as can be shown through unrolling in time. How does this structural similarity fit with the distinct functional properties these networks exhibit? To address this question, we here develop a unified mean-field theory for RNNs and DNNs in terms of representational kernels, describing fully trained networks in the feature learning ($μ$P) regime. This theory casts training as Bayesian inference over sequences and patterns, directly revealing the functional implications induced by the RNNs' weight sharing. In DNN-typical tasks, we identify a phase transition when the learning signal overcomes the noise due to randomness in the weights: below this threshold, RNNs and DNNs behave identically; above it, only RNNs develop correlated representations across timesteps. For sequential tasks, the RNNs' weight sharing furthermore induces an inductive bias that aids generalization by interpolating unsupervised time steps. Overall, our theory offers a way to connect architectural structure to functional biases.
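The abstract's starting observation, that an RNN unrolled in time is a DNN whose layers share one weight matrix, can be checked numerically. This is a minimal sketch with assumed width, depth, and tanh nonlinearity, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 4, 8                      # timesteps (= unrolled depth), hidden width
W = rng.standard_normal((d, d)) / np.sqrt(d)  # single recurrent weight matrix
x = rng.standard_normal(d)       # input pattern

# RNN: apply the SAME weight matrix at every timestep (weight sharing)
h_rnn = x.copy()
for _ in range(T):
    h_rnn = np.tanh(W @ h_rnn)

# DNN: one weight matrix per layer; tying every layer to W makes the
# unrolled network reproduce the RNN exactly
layers = [W] * T                 # tied weights
h_dnn = x.copy()
for Wl in layers:
    h_dnn = np.tanh(Wl @ h_dnn)

assert np.allclose(h_rnn, h_dnn)  # unrolled-in-time equivalence
```

Replacing `[W] * T` with `T` independently drawn matrices gives an ordinary deep network, which is the comparison point for the functional differences the paper analyzes.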

Tags

RNN DNN Feature Learning Mean-Field Theory Weight Sharing

arXiv Categories

cs.LG cond-mat.dis-nn