Extending Q-learning to general adaptive multi-agent systems

Gerald Tesauro

Publication

NeurIPS 2003

Conference paper

Abstract

Recent multi-agent extensions of Q-Learning require knowledge of other agents' payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed "Hyper-Q" Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents' strategies are estimated from observed actions via Bayesian inference. Hyper-Q may be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimal time-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.
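To make the idea of estimating another agent's mixed strategy from observed actions concrete, here is a minimal Python sketch for Rock-Paper-Scissors. The abstract does not specify the inference scheme, so the sketch assumes a symmetric Dirichlet prior over the opponent's strategy, which is one standard conjugate choice and not necessarily the one used in the paper. The names `posterior_mean`, `expected_payoff`, and `best_response` are hypothetical, and the myopic best response shown here is a deliberate simplification: it is not Hyper-Q itself, which learns values of mixed strategies.

```python
from collections import Counter

ACTIONS = ("rock", "paper", "scissors")

# BEATS[a] is the action that a defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def posterior_mean(observed, prior=1.0):
    """Posterior mean of the opponent's mixed strategy.

    Assumes a symmetric Dirichlet(prior) prior (an illustrative choice,
    not taken from the paper) and a list of observed opponent actions.
    """
    counts = Counter(observed)
    total = len(observed) + prior * len(ACTIONS)
    return {a: (counts[a] + prior) / total for a in ACTIONS}


def expected_payoff(our_action, opp_strategy):
    """Expected payoff (+1 win, 0 draw, -1 loss) against a mixed strategy."""
    win = opp_strategy[BEATS[our_action]]
    lose = sum(p for a, p in opp_strategy.items() if BEATS[a] == our_action)
    return win - lose


def best_response(opp_strategy):
    """Myopic best pure action against the estimated strategy."""
    return max(ACTIONS, key=lambda a: expected_payoff(a, opp_strategy))


history = ["rock", "rock", "paper", "rock"]
estimate = posterior_mean(history)
print(estimate)
print(best_response(estimate))
```

With the short history above, the estimate leans toward rock, so the myopic best response is paper. A Hyper-Q style learner would go further and use such an estimate as part of the state on which strategy values are conditioned.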

Date

08 Dec 2003

Publication

NeurIPS 2003

Authors

Gerald Tesauro

IBM-affiliated at time of publication
