Impact of Multimodal and Conversational AI on Learning Outcomes and Experience
AI Summary
This study investigates how multimodal and conversational AI affect learning outcomes in visually-rich STEM domains.
Key Contributions
- Compared three approaches to learning biology: MuDoC, TexDoC, and DocSearch
- Found that the MuDoC group achieved the best learning outcomes, while the TexDoC group, despite rating its experience as more engaging than DocSearch, scored lowest on the post-test
- Used Cognitive Load Theory to explain how multimodality and conversationality influence learning
Methodology
A randomized controlled online study (N = 124) compared the effects of different conversational AI systems on learning biology, analyzing both learning outcomes and user experience.
Original Abstract
Multimodal Large Language Models (MLLMs) offer an opportunity to support multimedia learning through conversational systems grounded in educational content. However, while conversational AI is known to boost engagement, its impact on learning in visually-rich STEM domains remains under-explored. Moreover, there is limited understanding of how multimodality and conversationality jointly influence learning in generative AI systems. This work reports findings from a randomized controlled online study (N = 124) comparing three approaches to learning biology from textbook content: (1) a document-grounded conversational AI with interleaved text-and-image responses (MuDoC), (2) a document-grounded conversational AI with text-only responses (TexDoC), and (3) a textbook interface with semantic search and highlighting (DocSearch). Learners using MuDoC achieved the highest post-test scores and reported the most positive learning experience. Notably, while TexDoC was rated as significantly more engaging and easier to use than DocSearch, it led to the lowest post-test scores, revealing a disconnect between student perceptions and learning outcomes. Interpreted through the lens of Cognitive Load Theory, these findings suggest that conversationality reduces extraneous load, while the visual-verbal integration induced by multimodality increases germane load, leading to better learning outcomes. When conversationality is not complemented by multimodality, reduced cognitive effort may instead inflate perceived understanding without improving learning outcomes.