Towards Robust Speech Deepfake Detection via Human-Inspired Reasoning
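AI Summary
Proposes HIR-SDD, which combines Large Audio Language Models (LALMs) with human-like reasoning to improve the robustness and interpretability of speech deepfake detection.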
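Key Contributions
- Proposes the HIR-SDD framework
- Combines Large Audio Language Models (LALMs) with human-like reasoning
- Builds a human-annotated dataset for chain-of-thought reasoning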
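Methodology
A Large Audio Language Model is trained on the human-annotated dataset to perform chain-of-thought reasoning, yielding interpretable deepfake detection results.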
Original Abstract
Modern generative audio models can be used by an adversary unlawfully, specifically to impersonate other people and gain access to private information. To mitigate this issue, speech deepfake detection (SDD) methods have emerged. Unfortunately, current SDD methods generally generalize poorly to new audio domains and generators. Moreover, they lack interpretability, especially human-like reasoning that would naturally explain why a given audio sample is attributed to the bona fide or spoof class and provide human-perceptible cues. In this paper, we propose HIR-SDD, a novel SDD framework that combines the strengths of Large Audio Language Models (LALMs) with chain-of-thought reasoning derived from a newly proposed human-annotated dataset. Experimental evaluation demonstrates both the effectiveness of the proposed method and its ability to provide reasonable justifications for its predictions.