Wrapping Up Offline RL as part of AutoMLPipeline Workflow

Paulito Palmes

Publication

JuliaCon 2023

Talk

Abstract

Unlike Online RL, where agents must interact with a real environment, Offline RL works much like a typical machine learning workflow. Given a dataset, Offline RL extracts the state, action, reward, and terminal columns and uses them to optimize the policy Q. Wrapping Offline RL into the AutoMLPipeline workflow makes it trivial to search, through symbolic workflow manipulation, for the preprocessing elements and combinations that yield the best policy Q. Candidate workflows are compared by cross-validation: the dataset is split into training and testing sets several times, and the accumulated discounted reward (return) of a given policy Q is averaged over the splits. This talk will demonstrate how to set up the Offline RL pipeline to preprocess the dataset and learn the optimal policy Q, and how to incorporate a parallel search strategy to find the optimal workflow.
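
The return used as the scoring metric above can be illustrated with a minimal sketch. It is written in plain Python rather than against the AutoMLPipeline API, and the function names, the flat reward/terminal column layout, and the discount factor `gamma` are illustrative assumptions, not details taken from the talk. It only shows how accumulated discounted rewards are computed per episode and averaged; how the talk scores a learned policy on each held-out split is not specified here.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float],
                       terminals: Sequence[bool],
                       gamma: float = 0.99) -> List[float]:
    """Return one discounted return per episode in a flat logged dataset.

    An episode ends at each row whose terminal flag is True. Trailing rows
    without a terminal flag are treated as one final, truncated episode
    (an assumption made for this sketch).
    """
    returns: List[float] = []
    total, discount, open_episode = 0.0, 1.0, False
    for reward, done in zip(rewards, terminals):
        total += discount * reward   # add gamma^t * r_t
        discount *= gamma            # advance to gamma^(t+1)
        open_episode = True
        if done:
            returns.append(total)
            total, discount, open_episode = 0.0, 1.0, False
    if open_episode:
        returns.append(total)
    return returns


def mean_return(rewards: Sequence[float],
                terminals: Sequence[bool],
                gamma: float = 0.99) -> float:
    """Average the per-episode discounted returns."""
    episode_returns = discounted_returns(rewards, terminals, gamma)
    if not episode_returns:
        raise ValueError("dataset contains no rows")
    return sum(episode_returns) / len(episode_returns)


# Two logged episodes: three steps with reward 1, then two steps with reward 2.
rewards = [1.0, 1.0, 1.0, 2.0, 2.0]
terminals = [False, False, True, False, True]
print(discounted_returns(rewards, terminals, gamma=0.5))  # [1.75, 3.0]
print(mean_return(rewards, terminals, gamma=0.5))         # 2.375
```

In a cross-validated search, a score of this kind would be computed once per test split and then averaged across splits to rank candidate workflows.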

Date

25 Jul 2023

Authors

Paulito Palmes

IBM-affiliated at time of publication
