The goal of this collaboration is to explore the limits and possibilities of sequential decision making in complex, high-dimensional environments. Compared with more classical settings such as supervised learning, relatively little is known about the minimal assumptions, representational conditions, and algorithmic principles needed to enable sample-efficient learning in complex control systems with rich sets of actions and observations. Given recent empirical breakthroughs in robotics and game playing ([SHM+16], [MKS+15]), we believe this is a timely moment to develop our understanding of the theoretical foundations of reinforcement learning (RL). In doing so, we aim to identify new algorithmic techniques and theoretical insights that may help transform RL into a well-founded, mature technology that can be routinely used in practice by non-experts.
We are thankful to Microsoft for providing us with Azure credits for this project.