LLM Reasoning relevance: 7/10

Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language

Remigiusz Kinas, Paweł Kiszczak, Sergio P. Perez, Krzysztof Ociepa, Łukasz Flis, Krzysztof Wróbel, Adrian Gwoździej
arXiv: 2603.11881v1 Published: 2026-03-12 Updated: 2026-03-12

AI Summary

Bielik-Minitron-7B compresses the Bielik-11B model via structured pruning and knowledge distillation, largely preserving its Polish-language performance at a smaller size.

Main Contributions

  • Builds Bielik-Minitron-7B, a compressed model optimized for Polish
  • Applies structured pruning and knowledge distillation for model compression
  • Demonstrates the feasibility of compressing models for low-resource languages

Methodology

Two-stage compression: structured pruning reduces the parameter count, then knowledge distillation recovers quality; the model is subsequently aligned with SFT, DPO-P, and GRPO.
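The structured-pruning stage can be illustrated with a minimal sketch: rank an FFN layer's hidden units by an importance score and keep only the top fraction, shrinking both weight matrices consistently. The weight-norm score below is an assumption for illustration; the Minitron recipe uses activation-based importance computed via NVIDIA Model Optimizer, not this heuristic.

```python
import numpy as np

def prune_ffn_width(W_in, W_out, keep_ratio=0.67):
    """Structured width pruning of one FFN layer (illustrative sketch).

    W_in:  (hidden, d_model) up-projection weights
    W_out: (d_model, hidden) down-projection weights
    Hidden units are scored by the product of their weight norms
    (assumed importance metric), and only the top keep_ratio survive.
    """
    hidden = W_in.shape[0]
    scores = np.linalg.norm(W_in, axis=1) * np.linalg.norm(W_out, axis=0)
    k = max(1, int(round(hidden * keep_ratio)))
    keep = np.sort(np.argsort(scores)[-k:])  # indices of retained units
    return W_in[keep, :], W_out[:, keep]
```

Pruning rows of the up-projection and the matching columns of the down-projection keeps the layer's input/output dimensions intact, so the rest of the network is unaffected; this is what makes the pruning "structured" rather than sparse masking.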

Original Abstract

This report details the creation of Bielik-Minitron-7B, a compressed 7.35B parameter version of the Bielik-11B-v3.0 model, specifically optimized for European languages. By leveraging a two-stage compression methodology inspired by the NVIDIA Minitron approach, we combined structured hybrid pruning and knowledge distillation to reduce the model's parameter count by 33.4%, from 11.04B to 7.35B. We utilized the NVIDIA Model Optimizer for structural pruning and the NVIDIA NeMo Framework for logit-based distillation for quality recovery. Following distillation, the model underwent a rigorous alignment pipeline consisting of Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO-P), and Reinforcement Learning (GRPO). Our final model successfully recovered approximately 90% of the baseline model's performance while providing up to 50% inference speedup. This approach demonstrates an efficient pathway to create language models for less-represented languages, preserving the original model quality while reducing inference deployment costs.
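The logit-based distillation mentioned in the abstract minimizes the divergence between teacher and student token distributions. A minimal numpy sketch of a temperature-scaled forward-KL distillation loss follows; the temperature value and the T² scaling convention are standard assumptions (Hinton-style distillation), not details taken from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax, numerically stabilized."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Forward KL(teacher || student) over the vocabulary, averaged
    over positions; scaled by T**2 so gradients stay comparable
    across temperatures (assumed convention)."""
    p = softmax(teacher_logits, T)
    log_p = np.log(p + 1e-12)
    log_q = np.log(softmax(student_logits, T) + 1e-12)
    return (T ** 2) * np.mean(np.sum(p * (log_p - log_q), axis=-1))
```

When the student matches the teacher exactly the loss is zero; any mismatch gives a positive value, so minimizing it pulls the 7.35B student's output distribution toward the 11B teacher's.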

Tags

Model Compression Knowledge Distillation Structured Pruning Polish Low-Resource Languages

arXiv Categories

cs.CL cs.AI