InCoder-32B: Code Foundation Model for Industrial Scenarios
AI Summary
InCoder-32B is the first 32B-parameter code foundation model targeting industrial scenarios, and it performs strongly on industrial-domain benchmarks.
Main Contributions
- Proposes the InCoder-32B model, addressing the performance degradation of code LLMs in industrial scenarios
- Adopts an efficient architecture and a multi-stage training strategy
- Builds code benchmarks for industrial domains
Methodology
A multi-stage training strategy is used to improve the model's performance in industrial scenarios: general code pre-training, industrial code annealing, context extension, and execution-grounded verification.
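The execution-grounded verification stage can be pictured as filtering model samples by actually running them against tests and keeping only the passing ones as training signal. A minimal sketch of that idea is below; the function names and the subprocess-based harness are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of execution-grounded verification: each candidate
# completion is run together with its unit tests in a subprocess, and only
# samples whose tests pass (exit code 0) are kept as positive signal.
# Names here are illustrative, not taken from the paper.
import os
import subprocess
import sys
import tempfile

def passes_execution_check(candidate_code: str, test_code: str,
                           timeout: float = 5.0) -> bool:
    """Run candidate code plus its tests; success means exit code 0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            timeout=timeout,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        # Hanging candidates are treated as failures.
        return False
    finally:
        os.remove(path)

def filter_verified(samples, test_code):
    """Keep only the samples whose execution check succeeds."""
    return [s for s in samples if passes_execution_check(s, test_code)]

if __name__ == "__main__":
    samples = [
        "def add(a, b):\n    return a + b",   # correct
        "def add(a, b):\n    return a - b",   # buggy, fails the test
    ]
    tests = "assert add(2, 3) == 5"
    kept = filter_verified(samples, tests)
    print(len(kept))  # only the correct sample survives
```

In a real pipeline the verified samples would then feed a rejection-sampling or RL-style post-training loop, but the core idea is the same binary pass/fail execution check.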
Original Abstract
Recent code large language models have achieved remarkable progress on general programming tasks. Nevertheless, their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To address these challenges, we introduce InCoder-32B (Industrial-Coder-32B), the first 32B-parameter code foundation model unifying code intelligence across chip design, GPU kernel optimization, embedded systems, compiler optimization, and 3D modeling. By adopting an efficient architecture, we train InCoder-32B from scratch with general code pre-training, curated industrial code annealing, mid-training that progressively extends context from 8K to 128K tokens with synthetic industrial reasoning data, and post-training with execution-grounded verification. We conduct extensive evaluation on 14 mainstream general code benchmarks and 9 industrial benchmarks spanning 4 specialized domains. Results show InCoder-32B achieves highly competitive performance on general tasks while establishing strong open-source baselines across industrial domains.