Publication
MRS Spring Meeting 2022
Talk

Neuro-Symbolic Reinforcement Learning for Polymer Discovery

Abstract

We present the first application of neuro-symbolic reinforcement learning (RL) in the materials discovery domain. Deep reinforcement learning requires an excessively large volume of training data, and the learned policies lack explainability; as a result, practical application of deep RL in materials discovery is problematic. We explore neuro-symbolic approaches to deep learning that combine the strengths of data-driven AI with the capabilities of human-like symbolic knowledge and reasoning [1]. The result is AI that learns with fewer resources (compute, data, etc.) and yields an explainable model. Neuro-symbolic approaches are also anticipated to enable co-creation of models and policies with subject matter experts (SMEs) by capturing new domain knowledge in symbolic form; this is particularly important for learning safety constraints that the AI should follow. We investigate Logical Neural Networks (LNNs), in which each neuron has an explicit meaning as part of a formula in a weighted real-valued logic. In addition, the model is differentiable, so training can induce new rules and makes the network resilient to contradicting facts.

In the presented study we use Logical Optimal Actions (LOA) [3, 4], a neuro-symbolic RL framework based on LNNs, to train RL agents to select experimental conditions for the synthesis of spin-on-glass (SOG) given target values of experimental outcomes. The SOG is based on tetraethyl orthosilicate as the precursor and co-precursors such as phenyltriethoxysilane. Experimental degrees of freedom include temperature, reaction time, precursor/co-precursor ratio, total co-/precursor concentration, water/precursor ratio, and catalyst/precursor ratio. We explicitly pursue training of generalizable agents that learn to navigate the abstract space of experiments relevant to SOG synthesis and find reaction conditions that yield materials with desired properties. We introduce a data-augmentation strategy that meets the data requirements of reinforcement learning while keeping the volume of experimental data affordable (under 300 experimental data points). Neuro-symbolic RL experiments show that LOA, in combination with logical action-aware features, noticeably improves the agent's performance in the search for experiments targeting a specific molecular weight and polydispersity index of the produced SOG. Furthermore, the agent learns to avoid experimental conditions that produce undesirable outcomes; for example, it avoids conditions leading to gelation of the reaction mixture. Finally, we validate and benchmark the proposed neuro-symbolic RL approach by running spin-on-glass syntheses in the lab following the AI agent's predictions.

Polymer discovery is, at its core, a sequential search for complex substances composed of macromolecules whose unique physical and chemical properties match the demands of various industrial applications. Practical application of deep reinforcement learning to polymer discovery is limited and problematic because it requires an excessively large volume of training data and the learned policies lack explainability. This motivates us to explore neuro-symbolic approaches to deep learning that combine the strengths of data-driven AI with the capabilities of human-like symbolic knowledge and reasoning.
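For intuition about the "explicit meaning" of LNN neurons, the sketch below implements a simplified weighted Łukasiewicz-style conjunction: inputs and output are truth values in [0, 1], so the neuron can be read directly as part of a logical formula, and the weights and threshold are the learnable quantities. This is an illustrative simplification under those assumptions, not the actual LNN or LOA implementation, and the rule named in the example is hypothetical.

```python
import numpy as np

def weighted_and(truth_values, weights, beta=1.0):
    """Simplified weighted Lukasiewicz-style conjunction, in the spirit of LNNs.

    Each input truth value lies in [0, 1]; the output is again in [0, 1],
    so the neuron reads as the truth of a logical formula over its inputs.
    """
    truth_values = np.asarray(truth_values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # A smaller weight reduces that input's ability to pull the conjunction toward 0.
    activation = beta - np.sum(weights * (1.0 - truth_values))
    return float(np.clip(activation, 0.0, 1.0))

# Hypothetical rule such as "safe_temperature AND low_water_ratio" evaluated
# on partially true inputs with learnable per-input weights.
print(weighted_and([0.9, 0.7], weights=[1.0, 0.8]))
```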
In this talk, I'll discuss our effort to use Logical Optimal Actions (LOA), a neuro-symbolic RL framework based on LNNs, to train RL agents to select experimental conditions for the synthesis of spin-on-glass (SOG) given target values of experimental outcomes. We introduce a data-augmentation strategy that meets the data requirements of reinforcement learning while keeping the volume of experimental data affordable (under 300 experimental data points). Experiments show that LOA, in combination with logical action-aware features, noticeably improves the agent's performance in the search for experiments targeting a specific polydispersity index of the produced SOG. Furthermore, the agent learns to avoid experimental conditions that produce undesirable outcomes; for example, it avoids conditions leading to gelation of the reaction mixture.
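To make the condition-search problem concrete, the following minimal sketch frames it as a small RL environment: the agent nudges normalized reaction conditions, a surrogate model predicts the polydispersity index, and conditions that would gel the mixture end the episode with a penalty. The class name, the surrogate formula, and the gelation rule are placeholders assumed for illustration only; in the study itself the agent is trained with LOA and the signals come from (augmented) experimental SOG data.

```python
import numpy as np

# Minimal sketch of the condition-search problem, not the LOA/LNN agent itself.
# The surrogate property model and the gelation rule are placeholders.

CONDITIONS = ["temperature", "time", "co_precursor_ratio",
              "concentration", "water_ratio", "catalyst_ratio"]

class SOGSynthesisEnv:
    """Agent nudges normalized reaction conditions to hit a target PDI."""

    def __init__(self, target_pdi=1.5, max_steps=20):
        self.target_pdi = target_pdi
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        self.state = np.full(len(CONDITIONS), 0.5)  # start from mid-range conditions
        self.steps = 0
        return self.state.copy()

    def _predict_pdi(self, x):
        # Placeholder surrogate; any regressor fit to lab data could slot in here.
        return 1.0 + 1.5 * x[0] + 0.8 * x[4] - 0.5 * x[2]

    def _gelled(self, x):
        # Toy safety rule: high water ratio combined with high catalyst loading.
        return x[4] > 0.8 and x[5] > 0.7

    def step(self, action):
        # action = (index of the condition to adjust, delta such as -0.1 or +0.1)
        idx, delta = action
        self.state[idx] = np.clip(self.state[idx] + delta, 0.0, 1.0)
        self.steps += 1
        if self._gelled(self.state):
            # Undesirable outcome: a large penalty teaches the agent to avoid gelation.
            return self.state.copy(), -10.0, True, {"gelation": True}
        error = abs(self._predict_pdi(self.state) - self.target_pdi)
        done = error < 0.05 or self.steps >= self.max_steps
        return self.state.copy(), -error, done, {"gelation": False}
```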