LLM Reasoning relevance: 9/10

To Reason or Not to: Selective Chain-of-Thought in Medical Question Answering

Zaifu Zhan, Min Zeng, Shuang Zhou, Yiran Song, Xiaoyi Chen, Yu Hou, Yifan Wu, Yang Ruan, Rui Zhang
arXiv: 2602.20130v1 Published: 2026-02-23 Updated: 2026-02-23

AI Summary

Proposes Selective Chain-of-Thought (Selective CoT), a method that improves the efficiency of medical question answering while maintaining accuracy.

Key Contributions

  • Proposes the Selective CoT method, which dynamically decides whether to reason based on question complexity
  • Demonstrates experimentally that Selective CoT effectively reduces inference time and token consumption on medical QA tasks
  • Verifies that Selective CoT generalizes across different LLMs and medical QA datasets

Methodology

Selective CoT first predicts whether a question requires reasoning, and generates a reasoning chain only when needed. Experiments were conducted on Llama-3.1-8B and Qwen-2.5-7B, evaluating accuracy, token usage, and inference time.
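The two-stage decision above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names are hypothetical, and a toy keyword heuristic stands in for whatever learned predictor the paper uses to judge whether a question needs reasoning.

```python
def needs_reasoning(question: str) -> bool:
    """Stage 1: predict whether the question requires multi-step reasoning.
    A toy keyword heuristic stands in for the paper's actual predictor."""
    reasoning_cues = ("why", "mechanism", "most likely", "best explains")
    q = question.lower()
    return any(cue in q for cue in reasoning_cues)

def build_prompt(question: str) -> str:
    """Stage 2: attach a chain-of-thought instruction only when stage 1
    says reasoning is needed; otherwise ask for a direct answer,
    saving the tokens a rationale would cost."""
    if needs_reasoning(question):
        return f"{question}\nLet's think step by step, then give the final answer."
    return f"{question}\nAnswer directly with the option letter."

# Recall-type question: answered directly, no rationale generated.
direct = build_prompt("Which vitamin deficiency causes scurvy?")
# Diagnostic question: routed through chain-of-thought.
cot = build_prompt("Which diagnosis best explains fever, rash, and joint pain?")
```

The efficiency gains reported in the paper (8-47% fewer tokens) come precisely from the direct branch: recall-type questions skip rationale generation entirely.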

Original Abstract

Objective: To improve the efficiency of medical question answering (MedQA) with large language models (LLMs) by avoiding unnecessary reasoning while maintaining accuracy. Methods: We propose Selective Chain-of-Thought (Selective CoT), an inference-time strategy that first predicts whether a question requires reasoning and generates a rationale only when needed. Two open-source LLMs (Llama-3.1-8B and Qwen-2.5-7B) were evaluated on four biomedical QA benchmarks (HeadQA, MedQA-USMLE, MedMCQA, and PubMedQA). Metrics included accuracy, total generated tokens, and inference time. Results: Selective CoT reduced inference time by 13-45% and token usage by 8-47% with minimal accuracy loss (≤4%). In some model-task pairs, it achieved both higher accuracy and greater efficiency than standard CoT. Compared with fixed-length CoT, Selective CoT reached similar or superior accuracy at substantially lower computational cost. Discussion: Selective CoT dynamically balances reasoning depth and efficiency by invoking explicit reasoning only when beneficial, reducing redundancy on recall-type questions while preserving interpretability. Conclusion: Selective CoT provides a simple, model-agnostic, and cost-effective approach for medical QA, aligning reasoning effort with question complexity to enhance real-world deployability of LLM-based clinical systems.

Tags

Medical QA · Large Language Models · Chain-of-Thought · Efficiency Optimization

arXiv Categories

cs.CL cs.AI