Bandit Learning in Matching Markets with Interviews
AI Summary
Studies bandit learning in two-sided matching markets with interviews, proposing a strategic deferral action and new algorithms.
Main Contributions
- Proposes a bandit-learning framework for two-sided matching markets with interviews
- Allows firm-side uncertainty and introduces a strategic deferral action for firms
- Designs algorithms for both centralized and decentralized settings, achieving time-independent regret
Methodology
Models the interview process via bandit learning, extends the firms' action space to include deferral, and designs algorithms that minimize regret.
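To make the mechanism concrete, here is a minimal, hypothetical sketch (not the paper's actual algorithm) of the firm-side idea: a firm collects noisy interview "hints" about each agent's utility, and exercises strategic deferral, declining to hire in a round, until the confidence intervals separate the best candidate from the rest. All function and variable names are illustrative assumptions.

```python
import math
import random

def interview_then_hire(true_utils, noise=0.2, delta=0.05,
                        max_rounds=10_000, seed=0):
    """Hypothetical sketch: one firm, n candidate agents.

    Each interview of agent a yields true_utils[a] plus Gaussian noise.
    The firm defers hiring until one agent's lower confidence bound
    exceeds every other agent's upper confidence bound, then hires.
    Returns (index of hired agent, number of deferrals).
    """
    rng = random.Random(seed)
    n = len(true_utils)
    sums, counts = [0.0] * n, [0] * n

    def interview(a):
        # A low-cost, noisy hint about agent a's true utility.
        sums[a] += true_utils[a] + rng.gauss(0.0, noise)
        counts[a] += 1

    for a in range(n):          # one initial interview per agent
        interview(a)

    deferrals = 0
    for _ in range(max_rounds):
        means = [sums[a] / counts[a] for a in range(n)]
        radius = [noise * math.sqrt(2 * math.log(2 * n * max(counts) / delta)
                                    / counts[a]) for a in range(n)]
        best = max(range(n), key=lambda a: means[a])
        # Hire only once the leader is statistically separated from
        # every rival; otherwise strategically defer and interview
        # the remaining contenders again.
        if all(means[best] - radius[best] >= means[a] + radius[a]
               for a in range(n) if a != best):
            return best, deferrals
        deferrals += 1
        for a in range(n):
            if means[a] + radius[a] >= means[best] - radius[best]:
                interview(a)
    return best, deferrals
```

Deferral is what lets the firm recover from a would-be early hiring mistake: with only one interview per agent, the noisy leader may be the wrong agent, and committing immediately would lock in that error.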
Original Abstract
Two-sided matching markets rely on preferences from both sides, yet it is often impractical to evaluate preferences fully in advance. Participants, therefore, conduct a limited number of interviews, which provide early, noisy impressions and shape final decisions. We study bandit learning in matching markets with interviews, modeling interviews as "low-cost hints" that reveal partial preference information to both sides. Our framework departs from existing work by allowing firm-side uncertainty: firms, like agents, may be unsure of their own preferences and can make early hiring mistakes by hiring less preferred agents. To handle this, we extend the firm's action space to allow "strategic deferral" (choosing not to hire in a round), enabling recovery from suboptimal hires and supporting decentralized learning without coordination. We design novel algorithms for (i) a centralized setting with an omniscient interview allocator and (ii) decentralized settings with two types of firm-side feedback. Across all settings, our algorithms achieve time-independent regret, a substantial improvement over the $O(\log T)$ regret bounds known for learning stable matchings without interviews. Moreover, in mildly structured markets, decentralized performance matches the centralized counterpart up to polynomial factors in the number of agents and firms.