PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents
AI Summary
Uses LLM agents to synthesize realistic digital footprints, addressing data scarcity and improving model performance on real-world tasks.
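Main Contributions
- Proposes PersonaTrace, a method for generating realistic digital footprints
- The synthesized dataset is more diverse and realistic than existing baselines
- Models fine-tuned on the synthetic data outperform those trained on other synthetic datasets on real-world out-of-distribution tasks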
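Methodology
Starting from a structured user profile, LLM agents generate sequences of user events and then produce the corresponding digital artifacts (e.g., emails, messages, calendar entries, reminders).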
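
The profile-to-events-to-artifacts flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `UserProfile`, `Event`, and `Artifact` types, the prompt wording, and the `stub_llm` function are all hypothetical, and a real system would replace `stub_llm` with an actual model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical schema: the summary does not specify the profile fields.
@dataclass
class UserProfile:
    name: str
    occupation: str
    interests: List[str]

@dataclass
class Event:
    description: str

@dataclass
class Artifact:
    kind: str      # e.g. "email", "message", "calendar", "reminder"
    content: str

# Any function that maps a prompt string to generated text.
LLM = Callable[[str], str]

def generate_events(profile: UserProfile, llm: LLM, n: int) -> List[Event]:
    """Stage 1: ask the model for up to n plausible events for this profile."""
    prompt = (
        f"List {n} plausible events, one per line, for {profile.name}, "
        f"a {profile.occupation} interested in {', '.join(profile.interests)}."
    )
    lines = [ln.strip() for ln in llm(prompt).splitlines() if ln.strip()]
    return [Event(ln) for ln in lines[:n]]

def generate_artifacts(profile: UserProfile, event: Event, llm: LLM,
                       kinds: Sequence[str]) -> List[Artifact]:
    """Stage 2: produce one artifact of each requested kind for an event."""
    return [
        Artifact(kind, llm(f"Write one {kind} for {profile.name} about: "
                           f"{event.description}"))
        for kind in kinds
    ]

def synthesize(profile: UserProfile, llm: LLM, n_events: int = 2,
               kinds: Sequence[str] = ("email", "reminder")) -> List[Artifact]:
    """Run both stages and return the flattened list of artifacts."""
    footprint: List[Artifact] = []
    for event in generate_events(profile, llm, n_events):
        footprint.extend(generate_artifacts(profile, event, llm, kinds))
    return footprint

def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model call (assumption)."""
    if prompt.startswith("List"):
        return "Dentist appointment\nTeam offsite planning"
    return f"[generated text for: {prompt}]"

profile = UserProfile("Ada", "engineer", ["chess"])
footprint = synthesize(profile, stub_llm)
```

Passing the model in as a plain callable keeps the two stages testable without network access; swapping `stub_llm` for a real client only requires a function with the same string-in, string-out shape.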
Original Abstract
Digital footprints (records of individuals' interactions with digital systems) are essential for studying behavior, developing personalized applications, and training machine learning models. However, research in this area is often hindered by the scarcity of diverse and accessible data. To address this limitation, we propose a novel method for synthesizing realistic digital footprints using large language model (LLM) agents. Starting from a structured user profile, our approach generates diverse and plausible sequences of user events, ultimately producing corresponding digital artifacts such as emails, messages, calendar entries, reminders, etc. Intrinsic evaluation results demonstrate that the generated dataset is more diverse and realistic than existing baselines. Moreover, models fine-tuned on our synthetic data outperform those trained on other synthetic datasets when evaluated on real-world out-of-distribution tasks.