NEX: Neuron Explore-Exploit Scoring for Label-Free Chain-of-Thought Selection and Model Ranking
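AI Summary
NEX proposes an unsupervised framework for chain-of-thought (CoT) selection and model ranking that identifies exploration and exploitation phases from neuron activation patterns.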
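Main Contributions
- Proposes the NEX framework for unsupervised CoT selection and model ranking
- Uses neuron activation patterns to identify exploration and exploitation phases
- Validates the effectiveness of NEX on reasoning benchmarks and in model merging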
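Methodology
NEX detects newly activated MLP neurons from a sparse activation cache, infers exploration and exploitation phases with a hidden Markov model (HMM), and scores the neurons introduced during exploration by whether they are reused in the following exploitation span.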
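The detection and phase-inference steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each token's sparse activation cache is assumed to be a Python set of active MLP neuron ids, the per-token novelty signal is the number of neurons firing for the first time in the trace, and the sticky two-state HMM is assumed to have Poisson emissions with hypothetical fixed rates (`lam_x`, `lam_e`) and a self-transition probability `p_stay`, decoded with Viterbi. All function names are illustrative.

```python
import math
from typing import List, Sequence, Set


def new_neuron_counts(active: Sequence[Set[int]]) -> List[int]:
    """Per-token count of MLP neurons that fire for the first time in the trace."""
    seen: Set[int] = set()
    counts: List[int] = []
    for neurons in active:
        fresh = neurons - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def poisson_logpmf(k: int, lam: float) -> float:
    """Log-probability of k new neurons under a Poisson(lam) rate (assumed emission model)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def sticky_hmm_viterbi(counts: Sequence[int], lam_x: float = 0.5, lam_e: float = 5.0,
                       p_stay: float = 0.9, p_start_e: float = 0.5) -> List[str]:
    """Viterbi decode of a two-state HMM; state 0 = X (exploit), 1 = E (explore).

    A large p_stay makes the chain "sticky", discouraging rapid phase flips.
    All parameter values here are hypothetical defaults.
    """
    if not counts:
        return []
    lams = (lam_x, lam_e)
    stay, switch = math.log(p_stay), math.log(1.0 - p_stay)
    log_trans = [[stay, switch], [switch, stay]]
    score = [math.log(1.0 - p_start_e) + poisson_logpmf(counts[0], lam_x),
             math.log(p_start_e) + poisson_logpmf(counts[0], lam_e)]
    backptrs: List[List[int]] = []
    for k in counts[1:]:
        prev = score
        step_back: List[int] = []
        score = []
        for s in (0, 1):
            # Best previous state to transition from into state s.
            best = 0 if prev[0] + log_trans[0][s] >= prev[1] + log_trans[1][s] else 1
            step_back.append(best)
            score.append(prev[best] + log_trans[best][s] + poisson_logpmf(k, lams[s]))
        backptrs.append(step_back)
    # Backtrack from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for step_back in reversed(backptrs):
        state = step_back[state]
        path.append(state)
    return ["E" if s else "X" for s in reversed(path)]


# Toy trace: bursts of new neurons at tokens 0 and 3, each followed by reuse.
active = [{1, 2, 3, 4, 5, 6}, {1, 2}, {2, 3}, {7, 8, 9, 10, 11, 12, 13}, {7, 8}, {8, 9}]
counts = new_neuron_counts(active)   # [6, 0, 0, 7, 0, 0]
phases = sticky_hmm_viterbi(counts)  # ['E', 'X', 'X', 'E', 'X', 'X']
```

The sticky self-transition is what turns isolated spikes into contiguous phases: switching costs `log(1 - p_stay)`, so the decoder only changes state when the emission evidence outweighs that penalty.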
Original Abstract
Large language models increasingly spend inference compute sampling multiple chain-of-thought traces or searching over merged checkpoints. This shifts the bottleneck from generation to selection, often without supervision on the target distribution. We show that entropy-based exploration proxies follow an inverted U with accuracy, suggesting that extra exploration can become redundant and induce overthinking. We propose NEX, a white-box, label-free, unsupervised scoring framework that views reasoning as alternating E-phases (exploration) and X-phases (exploitation). NEX detects E-phases as spikes in the number of newly activated MLP neurons per token, computed from sparse activation caches, then uses a sticky two-state HMM to infer E-X phases and credits E-introduced neurons according to whether they are reused in the following X span. These signals yield interpretable neuron weights and a single Good-Mass Fraction score that ranks candidate responses and merged variants without task answers. Across reasoning benchmarks and Qwen3 merge families, NEX computed on a small unlabeled activation set predicts downstream accuracy and identifies better variants; we further validate the E-X signal with human annotations and provide causal evidence via "Effective-vs-Redundant" neuron transfer.
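The reuse credit and the resulting score can be sketched as below, assuming per-token sets of active neuron ids and per-token E/X labels (for example, from an HMM decode). The abstract does not define the Good-Mass Fraction formula, so here it is assumed, hypothetically, to be the fraction of E-introduced neurons that are reactivated in the immediately following X span. `credit_e_neurons` and `good_mass_fraction` are illustrative names, not the paper's API.

```python
from typing import Dict, Sequence, Set


def credit_e_neurons(active: Sequence[Set[int]], phases: Sequence[str]) -> Dict[int, bool]:
    """Map each neuron first introduced in an E span to whether the immediately
    following X span reactivates it (True = effective, False = redundant)."""
    if len(active) != len(phases):
        raise ValueError("need one phase label per token")
    seen: Set[int] = set()
    credit: Dict[int, bool] = {}
    t, n = 0, len(active)
    while t < n:
        if phases[t] != "E":
            # Tokens before the first E span only update the activation history.
            seen |= active[t]
            t += 1
            continue
        # Neurons first introduced anywhere in this contiguous E span.
        introduced: Set[int] = set()
        while t < n and phases[t] == "E":
            introduced |= active[t] - seen
            seen |= active[t]
            t += 1
        # All neurons active in the following contiguous X span.
        reused: Set[int] = set()
        while t < n and phases[t] == "X":
            reused |= active[t]
            seen |= active[t]
            t += 1
        for neuron in introduced:
            credit[neuron] = neuron in reused
    return credit


def good_mass_fraction(credit: Dict[int, bool]) -> float:
    """Hypothetical score: share of E-introduced neurons that were reused."""
    if not credit:
        return 0.0
    return sum(credit.values()) / len(credit)


active = [{1, 2, 3, 4, 5, 6}, {1, 2}, {2, 3}, {7, 8, 9, 10, 11, 12, 13}, {7, 8}, {8, 9}]
phases = ["E", "X", "X", "E", "X", "X"]
credit = credit_e_neurons(active, phases)
score = good_mass_fraction(credit)  # 6 of 13 introduced neurons reused
```

Under this assumed definition, candidate responses or merged checkpoints would be ranked by sorting on `score` in descending order, with a higher value meaning less redundant exploration. Tallying the per-neuron `credit` values across many unlabeled traces is one plausible reading of the abstract's "interpretable neuron weights".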