Agnostic Reinforcement Learning

Our goal is to understand when online control of an unknown dynamical system is possible under minimal assumptions on the underlying dynamics. In particular, we consider the problem of adaptively controlling a linear quadratic regulator whose state transitions lie in a rich, nonlinear function space, such as an infinite-dimensional RKHS.


Completion Report


  • Juan C Perdomo, University of California, Berkeley
  • Max Simchowitz, University of California, Berkeley
  • Alekh Agarwal, Microsoft Research
  • Peter Bartlett, University of California, Berkeley


Establishing control of an unknown dynamical system is one of the most basic problems in reinforcement learning and optimal control theory. Classical approaches to this problem have focused on simple systems, such as finite-dimensional linear systems or Markov Decision Processes with discrete states and actions. In this work, we seek to develop principled methods for adaptively controlling a dynamical system whose state transitions are described by a rich, nonparametric function space, such as an infinite-dimensional reproducing kernel Hilbert space (RKHS).

In contrast to previous settings, such as finite-dimensional systems, working in infinite-dimensional spaces inevitably introduces an approximation error when estimating the underlying dynamics. The main challenge of our work is developing new algorithms and analysis techniques to handle these approximation errors. We believe that this is an important step towards principled control algorithms for complex, real-world applications, where model misspecification and approximation errors are unavoidable.
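To make the notion of approximation error concrete, the following is a minimal toy sketch and not the project's algorithm. It assumes a hypothetical scalar transition map, sin(3x), and a hypothetical finite-dimensional model class of monomial features; both are chosen only for illustration. The fit is noise-free, so the residual that remains is due purely to the model class being too small, not to sampling noise.

```python
import numpy as np

def true_dynamics(x):
    # Hypothetical nonlinear transition map, chosen only for illustration.
    return np.sin(3.0 * x)

def features(x, dim):
    # First `dim` monomials 1, x, ..., x^(dim-1): a finite-dimensional model class.
    return np.vander(x, N=dim, increasing=True)

def approximation_error(dim, n_grid=400):
    # Noise-free least-squares fit on a dense grid, so the remaining
    # residual reflects approximation error rather than estimation noise.
    x = np.linspace(-1.0, 1.0, n_grid)
    y = true_dynamics(x)
    phi = features(x, dim)
    coef, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return float(np.sqrt(np.mean((phi @ coef - y) ** 2)))

errors = {dim: approximation_error(dim) for dim in (2, 4, 6, 8)}
for dim, err in errors.items():
    print(f"dim={dim}: RMS approximation error = {err:.2e}")
```

In this toy setting the error shrinks as the feature dimension grows but never reaches zero for any finite dimension, which is the kind of residual misspecification a controller for such a system has to tolerate.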

More broadly, our work builds on ideas from the early days of optimal control theory, developed in the 1970s. Back then, theorists proved the existence of solutions for some of these infinite-dimensional control problems. However, practical algorithms for achieving these solutions have lagged behind. Our work seeks to provide a modern treatment of these issues. We connect some of the initial solution concepts from the control theory literature with new ideas in statistical learning theory to develop rigorous and practical algorithms for control of infinite-dimensional systems.