Neural network (NN) controllers trained through sampling of state-action trajectories, such as in reinforcement learning (RL), are becoming increasingly popular in the robotics community. For robotic tasks where safety is required, these methods can be problematic in two ways: first, they do not generally provide safety guarantees during training and, second, the optimization objective is inadequate for safety-critical tasks. In this project we are interested in building new approaches to safe robotic learning by leveraging well-understood tools from reachability theory and combining them with learning-based approaches.

### Researchers

- Vicenc Rubies-Royo, UC Berkeley
- Jaime Fisac, Princeton University
- Roberto Calandra, FAIR
- Claire Tomlin, UC Berkeley

### Overview

**Quick Intro to Reachability Theory:** Reachability theory is a branch of (optimal) control theory that is often used to assess the safety and liveness of dynamical systems. That is, given a dynamical system and a set of goal and/or failure states, one would like to obtain (a) the set of initial states from which the system is guaranteed to remain safe and reach the goal, and (b) the associated controller. In RL, tasks are specified via a reward/cost function **R**, along with the familiar functional given by the sum of discounted costs. In reachability theory, however, tasks are specified with two *implicit surface functions* **l** and **g**, which encode the goal states and failure states respectively. The functional in this case is defined as follows:

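$$J(\mathbf{x}, \mathbf{u}(\cdot)) = \min_{\tau \geq 0} \max\Big\{ l\big(\xi_{\mathbf{x}}^{\mathbf{u}}(\tau)\big),\ \max_{s \in [0, \tau]} g\big(\xi_{\mathbf{x}}^{\mathbf{u}}(s)\big) \Big\},$$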

where the Greek letter ξ denotes the trajectory of the system starting at state **x** under the control signal **u**. Unlike the sum of costs/rewards in RL, the minimum over time in the cost functional ensures that violations of safety constraints *anywhere* along the trajectory are penalized. This makes safety problems more modular, in the sense that goals and constraints are defined separately rather than being lumped together into a single cost. This formulation also endows the associated value function with a safety interpretation: states with positive values are deemed unsafe, whereas states with negative values are deemed safe [1].
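To make the sign convention concrete, here is a minimal sketch that evaluates a reach-avoid style outcome on a sampled, discrete-time trajectory. It assumes the functional takes the standard reach-avoid form described in [1]: the minimum over time of the larger of **l** at that time and the worst value of **g** seen so far. The circular goal and obstacle, the function names, and the two trajectories are made up for illustration and are not taken from the project.

```python
import math
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]


def goal_margin(x: State) -> float:
    """Hypothetical l: negative inside a goal disc of radius 0.5 centered at (4, 0)."""
    return math.hypot(x[0] - 4.0, x[1]) - 0.5


def failure_margin(x: State) -> float:
    """Hypothetical g: positive inside an obstacle disc of radius 0.5 centered at (2, 0)."""
    return 0.5 - math.hypot(x[0] - 2.0, x[1])


def reach_avoid_outcome(
    trajectory: Sequence[State],
    l: Callable[[State], float],
    g: Callable[[State], float],
) -> float:
    """Discrete-time version of min_k max( l(x_k), max_{j <= k} g(x_j) )."""
    worst_g = float("-inf")  # running maximum of g along the trajectory
    best = float("inf")      # running minimum of the inner max
    for x in trajectory:
        worst_g = max(worst_g, g(x))
        best = min(best, max(l(x), worst_g))
    return best


# Drives straight through the obstacle before reaching the goal.
through_obstacle = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
# Detours around the obstacle, then reaches the goal.
around_obstacle = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0), (4.0, 0.0)]

unsafe_value = reach_avoid_outcome(through_obstacle, goal_margin, failure_margin)
safe_value = reach_avoid_outcome(around_obstacle, goal_margin, failure_margin)
print(unsafe_value > 0, safe_value < 0)
```

Both trajectories end at the goal, but only the detour gets a negative outcome. The running maximum of `failure_margin` keeps the earlier obstacle violation from being cancelled out by later progress, which a summed cost would allow.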

**Safe Robotic Learning:** Reachability theory provides a convenient and interpretable framework for encoding and obtaining safe behaviors in robotics. Recent results have successfully transferred concepts from RL to solve high-dimensional reachability problems [3]. Unfortunately, these learning-based frameworks fall short in two ways. First, current approaches do not consider high-dimensional observations such as images. Second, in order to learn about safety, the agent will often violate constraints at training time, which means that the training process itself is not safe. To address these issues, we are investigating end-to-end approaches to learn the safe policy and value function, while also employing tools from neural network verification [2] that allow us to predict future sets of behaviors and reduce the exploration space at training time in order to prevent dangerous situations for the agent.
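As a rough illustration of what "predicting sets of behaviors" can mean for a neural network policy, the sketch below uses interval bound propagation, a simple and generic verification technique. It is not the shadow-price method of [2], and it is not claimed to be the approach used in this project. Given a box of possible states, it computes guaranteed (though possibly loose) bounds on the actions a small ReLU network can output. The network weights, the state box, and the action limits are all hypothetical.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def affine_bounds(lo: Sequence[float], hi: Sequence[float], layer: Layer):
    """Propagate an axis-aligned box through y = W x + b."""
    W, b = layer
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        low, high = bias, bias
        for w, a, c in zip(row, lo, hi):
            # A positive weight maps the lower input to the lower output;
            # a negative weight swaps the roles of the two endpoints.
            if w >= 0:
                low, high = low + w * a, high + w * c
            else:
                low, high = low + w * c, high + w * a
        out_lo.append(low)
        out_hi.append(high)
    return out_lo, out_hi


def network_output_bounds(lo, hi, layers: Sequence[Layer]):
    """Bounds on the network output over the input box [lo, hi]."""
    for i, layer in enumerate(layers):
        lo, hi = affine_bounds(lo, hi, layer)
        if i < len(layers) - 1:  # ReLU on hidden layers only
            lo = [max(0.0, v) for v in lo]
            hi = [max(0.0, v) for v in hi]
    return lo, hi


def forward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Concrete forward pass, used here only to sanity-check the bounds."""
    for i, (W, b) in enumerate(layers):
        x = [bias + sum(w * v for w, v in zip(row, x)) for row, bias in zip(W, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return list(x)


# Hypothetical 2-input, 2-hidden-unit, 1-output policy network.
policy = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -0.25]),
    ([[1.0, -2.0]], [0.1]),
]
state_lo, state_hi = [0.0, 0.0], [1.0, 1.0]

act_lo, act_hi = network_output_bounds(state_lo, state_hi, policy)


def certified_within(limit: float) -> bool:
    """True only if every action over the state box provably lies in [-limit, limit]."""
    return -limit <= act_lo[0] and act_hi[0] <= limit


print(certified_within(1.5), certified_within(1.0))
```

A check like `certified_within` is conservative. A `False` result means the bound could not rule out a violation, not that a violation necessarily occurs. Tighter verification methods shrink that gap at higher computational cost.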

### References

- [1] S. Bansal et al., "Hamilton-Jacobi reachability: A brief overview and recent advances," 2017 IEEE 56th Annual Conference on Decision and Control (CDC), IEEE, 2017.
- [2] V. Rubies-Royo, R. Calandra, D. M. Stipanovic and C. Tomlin, "Fast neural network verification via shadow prices," arXiv preprint arXiv:1902.07247, 2019.
- [3] J. F. Fisac, N. F. Lugovoy, V. Rubies-Royo, S. Ghosh and C. J. Tomlin, "Bridging Hamilton-Jacobi Safety Analysis and Reinforcement Learning," 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 2019, pp. 8550-8556, doi: 10.1109/ICRA.2019.8794107.