Natural actor-critic with baseline adjustment for variance reduction

Tetsuro Morimura; Eiji Uchibe; Kenji Doya

doi:10.1007/s10015-008-0514-8

Publication

Artificial Life and Robotics

Abstract

In this study, we examine how the choice of baseline function affects the variance of natural policy gradient estimates, and we demonstrate a condition under which the optimal baseline function, in the sense of variance reduction, is equivalent to the state value function. Outside of this condition, however, the state value can differ considerably from the optimal baseline. For such cases, we propose an extended version of the NTD algorithm in which an auxiliary function is estimated to adjust the baseline, which is the state value estimate in the original NTD algorithm, toward the optimal baseline. The proposed algorithm is applied to simple MDPs and to a challenging pendulum swing-up problem. © International Symposium on Artificial Life and Robotics (ISAROB) 2008.
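
The role of a baseline can be illustrated with a minimal sketch. It is not the paper's algorithm and does not use the natural gradient: it assumes an ordinary (vanilla) policy gradient, a single state, a softmax policy, and deterministic rewards. The names `estimator_stats`, `state_value`, and `min_variance_baseline`, and the values of `theta` and `rewards`, are hypothetical choices for illustration. The sketch computes exactly, by enumeration, the mean and total variance of the estimator g(a) = (r(a) - b) ∇log π(a) for three baselines b.

```python
import math

def softmax(theta):
    # Numerically stable softmax over action preferences.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def grad_log_pi(pi, a):
    # Gradient of log pi(a) with respect to theta for a softmax policy.
    return [(1.0 if i == a else 0.0) - p for i, p in enumerate(pi)]

def estimator_stats(theta, rewards, baseline):
    """Exact mean and total variance of g(a) = (r(a) - b) * grad log pi(a)."""
    pi = softmax(theta)
    mean = [0.0] * len(theta)
    second_moment = 0.0
    for a, p in enumerate(pi):
        g = [(rewards[a] - baseline) * d for d in grad_log_pi(pi, a)]
        mean = [m + p * gi for m, gi in zip(mean, g)]
        second_moment += p * sum(gi * gi for gi in g)
    variance = second_moment - sum(m * m for m in mean)
    return mean, variance

def state_value(theta, rewards):
    # Expected reward under the policy (the single-state value).
    return sum(p * r for p, r in zip(softmax(theta), rewards))

def min_variance_baseline(theta, rewards):
    # Minimizer of the total variance over constant baselines b:
    # b* = E[w(a) r(a)] / E[w(a)], with w(a) = ||grad log pi(a)||^2.
    pi = softmax(theta)
    w = [sum(d * d for d in grad_log_pi(pi, a)) for a in range(len(pi))]
    num = sum(p * wi * r for p, wi, r in zip(pi, w, rewards))
    den = sum(p * wi for p, wi in zip(pi, w))
    return num / den

# Illustrative (assumed) policy parameters and rewards.
theta = [1.0, 0.0, -1.0]
rewards = [1.0, 2.0, 5.0]
v = state_value(theta, rewards)
b_star = min_variance_baseline(theta, rewards)
for name, b in [("zero", 0.0), ("state value", v), ("min-variance", b_star)]:
    mean, var = estimator_stats(theta, rewards, b)
    print(f"{name:>12}: baseline={b:.3f} variance={var:.4f}")
```

In this toy setting the expected gradient is the same for every baseline, while the variance depends on it, and the state value is not the variance-minimizing choice. This is the same kind of gap the abstract describes for the natural policy gradient, shown here only for the vanilla case.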

Date

14 Dec 2008

Authors

IBM-affiliated at time of publication

Topics

AI
