Flickering Multi-Armed Bandits
AI Summary
Proposes a new multi-armed bandit framework in which the set of available arms changes over time, and analyzes its exploration cost.
Main Contributions
- Proposes the Flickering Multi-Armed Bandits (FMAB) framework
- Analyzes the problem under two graph models: Erdős–Rényi and Edge-Markovian
- Proposes a two-phase algorithm and establishes sublinear regret bounds
- Shows the algorithm's exploration cost is near-optimal
Methodology
Proposes a two-phase algorithm consisting of a lazy-random-walk exploration phase and a navigation-and-commitment exploitation phase, supported by theoretical analysis and numerical experiments.
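The two-phase structure can be illustrated with a minimal sketch. This is not the paper's algorithm: the arm graph here is an i.i.d. Erdős–Rényi process (each round, every other arm is locally reachable with probability `p`), rewards are Bernoulli, and the phase lengths, the `two_phase_fmab` name, and all parameters are illustrative assumptions.

```python
import random

def two_phase_fmab(means, p=0.3, explore_rounds=3000, horizon=10000, seed=0):
    """Hedged sketch of a two-phase policy on an i.i.d. ER arm graph.

    Each round, every other arm is reachable from the current one with
    probability p (a fresh G(n, p) neighborhood); the agent may only
    move locally. Returns the total reward collected over the horizon.
    """
    rng = random.Random(seed)
    n = len(means)
    counts, sums = [0] * n, [0.0] * n
    pos, total = 0, 0.0
    for t in range(horizon):
        # pull the arm at the current node (Bernoulli reward)
        r = 1.0 if rng.random() < means[pos] else 0.0
        counts[pos] += 1
        sums[pos] += r
        total += r
        # fresh i.i.d. ER neighborhood of the current node
        nbrs = [v for v in range(n) if v != pos and rng.random() < p]
        if t < explore_rounds:
            # lazy random walk: stay w.p. 1/2, else move to a uniform neighbor
            if nbrs and rng.random() < 0.5:
                pos = rng.choice(nbrs)
        else:
            # navigate to the empirically best arm when an edge appears, then commit
            best = max(range(n), key=lambda a: sums[a] / counts[a] if counts[a] else 0.0)
            if pos != best and best in nbrs:
                pos = best
    return total
```

The lazy (probability-1/2 stay) walk is what makes exploration analyzable on a changing graph; after exploration, commitment here simply means never leaving the empirically best node once reached.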
Original Abstract
We introduce Flickering Multi-Armed Bandits (FMAB), a new MAB framework where the set of available arms (or actions) can change at each round, and the available set at any time may depend on the agent's previously selected arm. We model this constrained, evolving availability using random graph processes, where arms are nodes and the agent's movement is restricted to its local neighborhood. We analyze this problem under two random graph models: an i.i.d. Erdős–Rényi (ER) process and an Edge-Markovian process. We propose and analyze a two-phase algorithm that employs a lazy random walk for exploration to efficiently identify the optimal arm, followed by a navigation and commitment phase for exploitation. We establish high-probability and expected sublinear regret bounds for both graph settings. We show that the exploration cost of our algorithm is near-optimal by establishing a matching information-theoretic lower bound for this problem class, highlighting the fundamental cost of exploration under local-move constraints. We complement our theoretical guarantees with numerical simulations, including a scenario of a robotic ground vehicle scouting a disaster-affected region.
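The Edge-Markovian process mentioned in the abstract can be sketched as follows. In the standard formulation each potential edge is an independent two-state Markov chain; the birth/death parameter names and starting condition below are illustrative assumptions, not taken from the paper.

```python
import random

def edge_markovian(n, p_birth, p_death, horizon, seed=0):
    """Hedged sketch of an Edge-Markovian graph process on n nodes.

    Each potential edge evolves as an independent two-state Markov chain:
    an absent edge appears with probability p_birth, and a present edge
    disappears with probability p_death. Returns one edge set per round.
    """
    rng = random.Random(seed)
    pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    edges = set()  # illustrative choice: start from the empty graph
    snapshots = []
    for _ in range(horizon):
        new_edges = set()
        for e in pairs:
            if e in edges:
                if rng.random() >= p_death:   # edge survives this round
                    new_edges.add(e)
            elif rng.random() < p_birth:      # edge is born this round
                new_edges.add(e)
        edges = new_edges
        snapshots.append(frozenset(edges))
    return snapshots
```

Unlike the i.i.d. ER process, consecutive snapshots are correlated: the chain's stationary edge density is p_birth / (p_birth + p_death), and setting p_birth + p_death = 1 recovers the memoryless ER case.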