To Reason or Not to: Selective Chain-of-Thought in Medical Question Answering
AI Summary
Proposes Selective Chain-of-Thought (Selective CoT), a method that improves the efficiency of medical question answering while preserving accuracy.
Main Contributions
- Proposes Selective CoT, which dynamically decides whether to reason based on question complexity
- Shows experimentally that Selective CoT effectively reduces inference time and token consumption on medical QA tasks
- Verifies that Selective CoT generalizes across different LLMs and medical QA datasets
Methodology
Selective CoT first predicts whether a question requires reasoning and generates a reasoning chain only when needed. Experiments were run on Llama-3.1-8B and Qwen-2.5-7B, evaluating accuracy, token usage, and inference time.
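The gating logic described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `needs_reasoning` heuristic and the two answer functions are hypothetical stubs standing in for the paper's unspecified gate and LLM calls.

```python
def needs_reasoning(question: str) -> bool:
    """Hypothetical gate: flag questions that look multi-step.

    The paper does not specify its gating mechanism; a simple keyword
    heuristic is used here purely for illustration.
    """
    multi_step_cues = ("why", "mechanism", "most likely", "best next step")
    return any(cue in question.lower() for cue in multi_step_cues)

def answer_direct(question: str) -> str:
    """Stub for a direct (no-CoT) LLM call."""
    return f"[direct answer to: {question}]"

def answer_with_cot(question: str) -> str:
    """Stub for a chain-of-thought LLM call that emits a rationale first."""
    return f"[reasoned answer to: {question}]"

def selective_cot(question: str) -> str:
    """Generate a reasoning chain only when the gate predicts it is needed."""
    if needs_reasoning(question):
        return answer_with_cot(question)
    return answer_direct(question)

# Recall-type question takes the cheap direct path; a management question
# that matches a multi-step cue triggers explicit reasoning.
print(selective_cot("Which vitamin deficiency causes scurvy?"))
print(selective_cot("What is the best next step in management for this patient?"))
```

In practice the gate could itself be a lightweight LLM prompt or a trained classifier; the key design point is that the expensive rationale generation is skipped for recall-type questions.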
Original Abstract
Objective: To improve the efficiency of medical question answering (MedQA) with large language models (LLMs) by avoiding unnecessary reasoning while maintaining accuracy. Methods: We propose Selective Chain-of-Thought (Selective CoT), an inference-time strategy that first predicts whether a question requires reasoning and generates a rationale only when needed. Two open-source LLMs (Llama-3.1-8B and Qwen-2.5-7B) were evaluated on four biomedical QA benchmarks: HeadQA, MedQA-USMLE, MedMCQA, and PubMedQA. Metrics included accuracy, total generated tokens, and inference time. Results: Selective CoT reduced inference time by 13-45% and token usage by 8-47% with minimal accuracy loss (≤4%). In some model-task pairs, it achieved both higher accuracy and greater efficiency than standard CoT. Compared with fixed-length CoT, Selective CoT reached similar or superior accuracy at substantially lower computational cost. Discussion: Selective CoT dynamically balances reasoning depth and efficiency by invoking explicit reasoning only when beneficial, reducing redundancy on recall-type questions while preserving interpretability. Conclusion: Selective CoT provides a simple, model-agnostic, and cost-effective approach for medical QA, aligning reasoning effort with question complexity to enhance real-world deployability of LLM-based clinical systems.