LLM Memory & RAG relevance: 8/10

What Do LLMs Associate with Your Name? A Human-Centered Black-Box Audit of Personal Data

Dimitri Staufer, Kirsten Morehouse
arXiv: 2602.17483v1 · Published: 2026-02-19 · Updated: 2026-02-19

AI Summary

The paper audits the associations LLMs form with personal data, finding that models can generate personal information with high accuracy and raising user concerns about data privacy.

Key Contributions

  • Introduces LMP2 (Language Model Privacy Probe), an audit tool for assessing the personal information LLMs associate with individuals
  • Evaluates the accuracy of personal information generated by multiple LLMs
  • Surveys user perceptions of LLM-generated personal-data associations and users' privacy needs

Methodology

Using user studies (N=458) and the LMP2 tool, the authors audit the degree and accuracy of personal-data associations across multiple LLMs, including GPT-4o.
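The paper reports per-category accuracy of model-generated personal data (e.g., GPT-4o reaching 60% or more on 11 features such as gender, hair color, and languages). A minimal sketch of that kind of scoring, comparing model guesses against participant-reported ground truth per category — the function, category names, and data below are illustrative assumptions, not the paper's actual pipeline:

```python
# Hypothetical accuracy scoring per personal-data (PD) category:
# compare model-generated guesses with participant-reported ground
# truth and keep categories at or above a 60% accuracy threshold.
from collections import defaultdict

def pd_category_accuracy(guesses, truths):
    """guesses/truths: one dict per participant, mapping PD category -> value."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for guess, truth in zip(guesses, truths):
        for category, value in guess.items():
            if category in truth:
                total[category] += 1
                if value == truth[category]:
                    correct[category] += 1
    return {c: correct[c] / total[c] for c in total}

# Illustrative data for three participants (not from the study)
guesses = [
    {"gender": "female", "hair_color": "brown", "languages": "en,de"},
    {"gender": "male",   "hair_color": "black", "languages": "en"},
    {"gender": "male",   "hair_color": "blond", "languages": "en,fr"},
]
truths = [
    {"gender": "female", "hair_color": "brown", "languages": "en,de"},
    {"gender": "male",   "hair_color": "brown", "languages": "en"},
    {"gender": "male",   "hair_color": "blond", "languages": "en,fr"},
]

acc = pd_category_accuracy(guesses, truths)
reliable = {c for c, a in acc.items() if a >= 0.6}
print(acc)       # per-category accuracy, e.g. gender -> 1.0
print(reliable)  # categories at or above the 60% threshold
```

In this toy run, gender and languages score 3/3 while hair color scores 2/3, so all three categories clear the 60% cutoff.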

Original Abstract

Large language models (LLMs), and conversational agents based on them, are exposed to personal data (PD) during pre-training and during user interactions. Prior work shows that PD can resurface, yet users lack insight into how strongly models associate specific information to their identity. We audit PD across eight LLMs (3 open-source; 5 API-based, including GPT-4o), introduce LMP2 (Language Model Privacy Probe), a human-centered, privacy-preserving audit tool refined through two formative studies (N=20), and run two studies with EU residents to capture (i) intuitions about LLM-generated PD (N1=155) and (ii) reactions to tool output (N2=303). We show empirically that models confidently generate multiple PD categories for well-known individuals. For everyday users, GPT-4o generates 11 features with 60% or more accuracy (e.g., gender, hair color, languages). Finally, 72% of participants sought control over model-generated associations with their name, raising questions about what counts as PD and whether data privacy rights should extend to LLMs.

Tags

LLM · Personal Data · Privacy · Auditing

arXiv Categories

cs.HC cs.AI cs.CL cs.CY