mAceReason-Math: A Dataset of High-Quality Multilingual Math Problems Ready For RLVR
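AI Summary
Presents mAceReason-Math, a high-quality multilingual dataset of math problems for Reinforcement Learning with Verifiable Rewards (RLVR).
Main Contributions
- Builds a high-quality multilingual dataset of math problems
- Designs the dataset specifically for RLVR, with difficulty suited to current models
- Covers 14 languages, with more than 10,000 samples per language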
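To make "designed for RLVR" concrete, here is a minimal sketch of a verifiable reward for a math problem with a known reference answer. It assumes the model writes its final answer inside `\boxed{...}` and that exact string match after light normalization is good enough; both are illustrative assumptions, not the reward described by the paper, and the function names are hypothetical.

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1  # we are inside the opening brace of \boxed{
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
        i += 1
    return None  # braces never closed


def normalize(answer: str) -> str:
    """Assumed normalization: drop all whitespace and a trailing period."""
    return re.sub(r"\s+", "", answer).rstrip(".")


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the boxed answer matches the reference, else 0.0."""
    predicted = extract_boxed(completion)
    if predicted is None:
        return 0.0
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0
```

Because the check only inspects the final answer, the same function can score completions written in any of the dataset's languages, provided the answer itself is language-independent (a number or a formula).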
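Methodology
The multilingual dataset is built by translating the AceReason-Math dataset and then cleaning and improving the translations.
Original Abstract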
Reinforcement Learning with Verifiable Rewards (RLVR) has been successfully applied to significantly boost the capabilities of pretrained large language models, especially in the math and logic problem domains. However, current research and available training datasets remain English-centric. While multilingual training data and benchmarks have been created in the past, they were not created with RLVR and current model capability in mind, and their level of difficulty is often too low to provide appropriate training signals for current models. To address this gap, we provide mAceReason-Math, a dataset of high-quality translations of challenging math problems sourced from a corpus specifically curated for RLVR (AceReason-Math). We further take specific care to clean and improve our translations, resulting in a coverage of 14 languages with more than 10,000 samples per language. We release the dataset to facilitate multilingual RLVR research and benchmarking in the research community.