Learning the Value Systems of Societies with Preference-based Multi-objective Reinforcement Learning
AI Summary
Proposes a preference-based multi-objective reinforcement learning method for learning the value systems of social groups.
Main Contributions
- Proposes algorithms for learning value alignment models and the value systems of a society
- Combines clustering with preference-based multi-objective reinforcement learning
- Learns the value systems and behaviour policies of different user groups
Methodology
Uses clustering together with preference-driven multi-objective reinforcement learning to learn value alignment models and a set of value systems that represent groups of users.
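The paper pairs each cluster with a value system (a weight vector over values) and a policy aligned with it. As an illustrative sketch only, not the paper's actual algorithm, the snippet below shows two primitive operations such a pipeline could rest on: linear scalarization of a multi-objective reward by a value-system weight vector, and k-means clustering of users' weight vectors into representative value systems. All names (`scalarize`, `cluster_value_systems`) and the deterministic initialization are hypothetical choices made for this sketch.

```python
import numpy as np

def scalarize(multi_obj_reward, value_system):
    """Linearly combine a vector of per-value rewards using a
    non-negative, normalized value-system weight vector."""
    return float(np.dot(multi_obj_reward, value_system))

def cluster_value_systems(user_weights, k, iters=50):
    """Plain k-means over user value-system weight vectors.

    Each centroid plays the role of a cluster's representative value
    system and is renormalized to sum to 1 at the end. A simple
    deterministic init (evenly spaced rows) keeps the sketch reproducible.
    """
    n = len(user_weights)
    centroids = user_weights[np.linspace(0, n - 1, k).astype(int)].copy()
    labels = np.zeros(n, dtype=int)
    for _ in range(iters):
        # assign each user to the nearest representative value system
        dists = np.linalg.norm(user_weights[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its members
        for j in range(k):
            members = user_weights[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    centroids /= centroids.sum(axis=1, keepdims=True)
    return labels, centroids
```

For example, users whose weights concentrate on a "fairness" objective versus a "speed" objective would fall into separate clusters, each summarized by one representative weight vector that can then scalarize the multi-objective reward for that group's policy.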
Original Abstract
Value-aware AI should recognise human values and adapt to the value systems (value-based preferences) of different users. This requires operationalization of values, which can be prone to misspecification. The social nature of values demands their representation to adhere to multiple users while value systems are diverse, yet exhibit patterns among groups. In sequential decision making, efforts have been made towards personalization for different goals or values from demonstrations of diverse agents. However, these approaches demand manually designed features or lack value-based interpretability and/or adaptability to diverse user preferences. We propose algorithms for learning models of value alignment and value systems for a society of agents in Markov Decision Processes (MDPs), based on clustering and preference-based multi-objective reinforcement learning (PbMORL). We jointly learn socially-derived value alignment models (groundings) and a set of value systems that concisely represent different groups of users (clusters) in a society. Each cluster consists of a value system representing the value-based preferences of its members and an approximately Pareto-optimal policy that reflects behaviours aligned with this value system. We evaluate our method against a state-of-the-art PbMORL algorithm and baselines on two MDPs with human values.