AI Agents relevance: 7/10

Do Large Language Models Adapt to Language Variation across Socioeconomic Status?

Elisa Bassignana, Mike Zhang, Dirk Hovy, Amanda Cercas Curry
arXiv: 2602.11939v1 Published: 2026-02-12 Updated: 2026-02-12

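AI Summary

LLMs adapt poorly to the linguistic styles of different socioeconomic status (SES) groups and risk amplifying linguistic hierarchies.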

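Key Contributions

  • Reveals the limitations of LLMs in adapting their language to socioeconomic status
  • Builds a new dataset from Reddit and YouTube, stratified by socioeconomic status
  • Evaluates the style adaptation of LLMs along 94 sociolinguistic metrics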

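Methodology

Collects a dataset from Reddit and YouTube stratified by socioeconomic status, prompts LLMs with incomplete text from it, and measures how the LLM-generated completions differ from the original text along 94 sociolinguistic metrics.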
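
The comparison step can be illustrated with a minimal sketch. It assumes three toy style features (mean word length, type-token ratio, mean sentence length) as stand-ins; these are not the paper's 94 metrics, and the function names `style_features` and `style_gap` are hypothetical. The example texts are invented for illustration, not drawn from the dataset.

```python
import re
from statistics import mean


def style_features(text: str) -> dict[str, float]:
    """Compute three toy style features (illustrative stand-ins only)."""
    # Split on sentence-final punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens, keeping apostrophes inside words.
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        raise ValueError("text must contain at least one word")
    return {
        "mean_word_length": mean(len(w) for w in words),
        "type_token_ratio": len(set(words)) / len(words),
        "mean_sentence_length": len(words) / len(sentences),
    }


def style_gap(original: str, generated: str) -> dict[str, float]:
    """Absolute per-feature difference between an original text and an LLM completion."""
    orig = style_features(original)
    gen = style_features(generated)
    return {name: abs(orig[name] - gen[name]) for name in orig}


if __name__ == "__main__":
    # Invented example texts: an informal original and a more formal completion.
    original = "nah i dont think so. it was fine tbh."
    generated = "I do not believe that is the case. It was perfectly acceptable."
    for name, gap in style_gap(original, generated).items():
        print(f"{name}: {gap:.2f}")
```

A larger gap on a feature suggests the completion drifted further from the original author's style on that dimension; aggregating such gaps per SES group is one plausible way to compare how well a model matches each group.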

Original Abstract

Humans adjust their linguistic style to the audience they are addressing. However, the extent to which LLMs adapt to different social contexts is largely unknown. As these models increasingly mediate human-to-human communication, their failure to adapt to diverse styles can perpetuate stereotypes and marginalize communities whose linguistic norms are less closely mirrored by the models, thereby reinforcing social stratification. We study the extent to which LLMs integrate into social media communication across different socioeconomic status (SES) communities. We collect a novel dataset from Reddit and YouTube, stratified by SES. We prompt four LLMs with incomplete text from that corpus and compare the LLM-generated completions to the originals along 94 sociolinguistic metrics, including syntactic, rhetorical, and lexical features. LLMs modulate their style with respect to SES to only a minor extent, often resulting in approximation or caricature, and tend to emulate the style of upper SES more effectively. Our findings (1) show how LLMs risk amplifying linguistic hierarchies and (2) call into question their validity for agent-based social simulation, survey experiments, and any research relying on language style as a social signal.

Tags

LLM, socioeconomic status, linguistic style, sociolinguistics

arXiv Categories

cs.CL