Publication
WWW 2023
Conference paper

Robust Stochastic Multi-Armed Bandits with Historical Data

View publication

Abstract

We consider the problem of Stochastic Contextual Multi-Armed Bandits (CMABs) initialised with Historical data. Initialisation with historical data is an example of data-driven regularisation which should, in theory, accelerate the convergence of CMABs. However, in practice, we have little to no control over the underlying generation process of such data, which may exhibit some pathologies, possibly impeding the convergence and the stability of the algorithm. In this paper, we focus on two main challenges: bias selection and data corruption. We propose two new algorithms to solve these specific issues: LinUCB with historical data and offline balancing (OB-HLinUCB) and Robust LinUCB with corrupted historical data (R-HLinUCB). We derive their theoretical regret bounds and discuss their computational performance using real-world datasets.

Date

Publication

WWW 2023

Authors

Share