Agent Tuning & Optimization (relevance: 5/10)

Kernel Single-Index Bandits: Estimation, Inference, and Learning

Sakshi Arya, Satarupa Bhattacharjee, Bharath K. Sriperumbudur

arXiv: 2603.18938v1 Published: 2026-03-19 Updated: 2026-03-19


AI Summary

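Studies contextual bandits in which each arm's reward follows a single-index model, and proposes a kernelized algorithm that supports both learning and statistical inference.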

Main Contributions

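Proposes a kernelized ε-greedy algorithm
Establishes asymptotic normality of the single-index estimator under adaptive sampling
Obtains finite-time regret bounds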
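The ε-greedy allocation step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `epsilon_greedy_choose` is hypothetical, and it assumes each arm's predicted reward at the current context has already been computed from the fitted kernel models. What it shows is that ε-greedy gives every one of the K arms a propensity of at least ε/K, which keeps the inverse-propensity weights bounded by K/ε.

```python
import numpy as np


def epsilon_greedy_choose(predicted_rewards, epsilon, rng):
    """Pick an arm epsilon-greedily; return (arm, its propensity, all propensities).

    predicted_rewards: estimated mean reward of each of the K arms at the
    current context (hypothetically, g_hat_a(beta_hat_a^T x) for each arm a).
    """
    k = len(predicted_rewards)
    greedy = int(np.argmax(predicted_rewards))
    # Uniform exploration mass epsilon / K on every arm ...
    propensities = np.full(k, epsilon / k)
    # ... plus the remaining 1 - epsilon on the current greedy arm.
    propensities[greedy] += 1.0 - epsilon
    arm = int(rng.choice(k, p=propensities))
    return arm, propensities[arm], propensities


# Usage with made-up predicted rewards for K = 3 arms.
rng = np.random.default_rng(0)
arm, p_arm, probs = epsilon_greedy_choose(np.array([0.2, 0.9, 0.5]), 0.1, rng)
ipw_weight = 1.0 / p_arm  # at most K / epsilon = 30
```

The returned propensity is what an inverse-propensity-weighted estimator would log alongside each observed reward.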

Methodology

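Combines Stein-based estimation of the index parameters with inverse-propensity-weighted kernel ridge regression, and relies on concentration bounds for inverse-weighted Gram matrices together with martingale central limit theorems.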
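A minimal sketch of a first-order Stein estimate of one arm's index direction with inverse-propensity weighting. It assumes standard Gaussian covariates, so that Stein's identity gives E[g(βᵀx) x] = E[g′(βᵀx)] β, a multiple of β, and it assumes the logged propensities are known. The paper's exact estimator may differ; the function name, the tanh link, and all data in the usage are hypothetical. Because only the direction of β is identified, the sketch normalizes the estimate to unit length.

```python
import numpy as np


def stein_index_estimate(X, y, arms, propensities, arm):
    """IPW first-order Stein estimate of one arm's index direction.

    Assumes the covariates are standard Gaussian, so that Stein's identity
    gives E[g(beta^T x) x] = E[g'(beta^T x)] beta, a multiple of beta.
    Rows where another arm was pulled get weight 0; rows where `arm` was
    pulled get weight 1 / propensity, correcting for the policy's selection.
    """
    weights = (arms == arm).astype(float) / propensities
    v = (weights * y) @ X / len(y)  # (1/n) sum_i w_i y_i x_i
    return v / np.linalg.norm(v)    # beta is identified only up to scale and sign


# Usage on synthetic data (all values hypothetical).
rng = np.random.default_rng(1)
n, d = 20_000, 3
beta0 = np.array([0.6, 0.8, 0.0])                 # true index of arm 0
X = rng.standard_normal((n, d))
arms = rng.integers(0, 2, size=n)                 # uniform logging over 2 arms
props = np.full(n, 0.5)                           # propensity of the pulled arm
# Reward of arm 0 with link g = tanh; rows with arm 1 are masked out anyway.
y = np.tanh(X @ beta0) + 0.1 * rng.standard_normal(n)
beta_hat = stein_index_estimate(X, y, arms, props, arm=0)
```

Since E[tanh′(z)] > 0 here, `beta_hat` should point close to `beta0`; with non-Gaussian covariates this simple moment would need a score-function correction.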

Original Abstract

We study contextual bandits with finitely many actions in which the reward of each arm follows a single-index model with an arm-specific index parameter and an unknown nonparametric link function. We consider a regime in which arms correspond to stable decision options and covariates evolve adaptively under the bandit policy. This setting creates significant statistical challenges: the sampling distribution depends on the allocation rule, observations are dependent over time, and inverse-propensity weighting induces variance inflation. We propose a kernelized $\varepsilon$-greedy algorithm that combines Stein-based estimation of the index parameters with inverse-propensity-weighted kernel ridge regression for the reward functions. This approach enables flexible semiparametric learning while retaining interpretability. Our analysis develops new tools for inference with adaptively collected data. We establish asymptotic normality for the single-index estimator under adaptive sampling, yielding valid confidence regions, and derive a directional functional central limit theorem for the RKHS estimator, which provides asymptotically valid pointwise confidence intervals. The analysis relies on concentration bounds for inverse-weighted Gram matrices together with martingale central limit theorems. We further obtain finite-time regret guarantees, including $\tilde{O}(\sqrt{T})$ rates under common-link Lipschitz conditions, showing that semiparametric structure can be exploited without sacrificing statistical efficiency. These results provide a unified framework for simultaneous learning and inference in single-index contextual bandits.
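The inverse-propensity-weighted kernel ridge regression step for one arm's link function can be sketched as below, as an illustration under stated assumptions: a Gaussian (RBF) kernel on the fitted index z = β̂ᵀx, fixed λ and bandwidth, and known propensities. Function names and parameter values are hypothetical, not taken from the paper. Minimizing Σᵢ wᵢ(yᵢ − g(zᵢ))² + λ‖g‖²_H over the RKHS reduces, by the representer theorem, to the linear system (WK + λI)α = Wy with W = diag(w).

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between 1-D arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def ipw_kernel_ridge(z, y, weights, lam, bandwidth):
    """Minimize sum_i w_i (y_i - g(z_i))^2 + lam * ||g||_H^2 over the RKHS.

    By the representer theorem g = sum_j alpha_j k(z_j, .), and the first-order
    condition is satisfied by alpha = (W K + lam I)^{-1} W y with W = diag(w).
    Rows with weight 0 (arm not pulled) get alpha_i = 0 and drop out.
    """
    K = rbf_kernel(z, z, bandwidth)
    W = np.diag(weights)
    alpha = np.linalg.solve(W @ K + lam * np.eye(len(z)), W @ y)
    return lambda z_new: rbf_kernel(np.atleast_1d(z_new), z, bandwidth) @ alpha


# Usage on synthetic data (all values hypothetical).
rng = np.random.default_rng(2)
n = 400
z = rng.uniform(-2.0, 2.0, size=n)                  # fitted index values beta_hat^T x_i
pulled = rng.integers(0, 2, size=n) == 0            # rows where this arm was played
w = pulled.astype(float) / 0.5                      # IPW weights, propensity 0.5
y = np.tanh(z) + 0.1 * rng.standard_normal(n)       # link g = tanh plus noise
g_hat = ipw_kernel_ridge(z, y, w, lam=1.0, bandwidth=0.5)
```

Solving the system with `np.linalg.solve` avoids forming an explicit inverse; for any λ > 0 the matrix WK + λI is invertible because WK has the same nonnegative eigenvalues as W^{1/2}KW^{1/2}.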

arXiv Categories

stat.ML cs.LG math.ST
