Multi-task Learning with Safe and Differentiable Policies

The ability to generalize to new tasks and environments is crucial for deploying autonomous agents such as robots and self-driving vehicles at scale in the real world. This is extremely challenging and often requires the agent to perform state-specific reasoning, as in model-based planning and control. Optimization-based meta-learning methods such as MAML [1] have been shown to handle multi-task adaptation, but their inner-loop optimization makes them hard to train end to end. Differentiable, end-to-end learning methods for planning [2] and control [3] integrate state-specific reasoning components into machine learning pipelines and lay the foundation for scaling up to the open generalization questions in the community. Leveraging this machinery, we propose to push the boundaries of efficient meta-learning, with a particular focus on robustness and safety guarantees [4], which are critical requirements for autonomous systems such as self-driving vehicles.
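To make the inner-loop difficulty concrete, the following is a minimal sketch of MAML-style training on a toy 1-D regression problem, written in plain Python with hand-derived gradients. It is an illustration under stated assumptions, not the method proposed here: the task set, the step sizes alpha and beta, and all function names are hypothetical. The point it shows is that the meta-gradient must pass through the adaptation step via the chain rule.

```python
# Toy MAML on 1-D linear regression y = a * x, pure Python.
# Each task is a slope `a`; the model is a single weight `w`.
# All names, data, and step sizes are illustrative assumptions.

def mean_sq(xs):
    return sum(x * x for x in xs) / len(xs)

def loss(w, a, xs):
    """Loss 0.5 * mean((w*x - a*x)^2) on inputs xs."""
    return 0.5 * mean_sq(xs) * (w - a) ** 2

def grad(w, a, xs):
    """d loss / d w, derived by hand for this quadratic loss."""
    return mean_sq(xs) * (w - a)

def inner_step(w, a, xs_support, alpha):
    """One gradient step of task adaptation (the inner loop)."""
    return w - alpha * grad(w, a, xs_support)

def meta_grad(w, a, xs_support, xs_query, alpha):
    """Gradient of the post-adaptation query loss w.r.t. the initial w.

    The chain rule passes through the inner step:
    d w_adapted / d w = 1 - alpha * mean(x_support^2).
    """
    w_adapted = inner_step(w, a, xs_support, alpha)
    d_adapted_d_w = 1.0 - alpha * mean_sq(xs_support)
    return grad(w_adapted, a, xs_query) * d_adapted_d_w

def maml_train(tasks, w0=0.0, alpha=0.1, beta=0.05, steps=200):
    """Outer loop: average meta-gradients over tasks, then update w."""
    w = w0
    for _ in range(steps):
        g = sum(meta_grad(w, a, xs_s, xs_q, alpha) for a, xs_s, xs_q in tasks)
        w -= beta * g / len(tasks)
    return w

# Hypothetical tasks: (slope, support inputs, query inputs).
TASKS = [
    (1.0, [0.5, 1.0, 1.5], [0.8, 1.2]),
    (2.0, [0.5, 1.0, 1.5], [0.8, 1.2]),
    (3.0, [0.5, 1.0, 1.5], [0.8, 1.2]),
]
w_meta = maml_train(TASKS)
```

In this scalar case the factor d_adapted_d_w is a single number; for a neural network it becomes a Hessian-vector product per inner step, which is one reason end-to-end training through the inner loop is costly.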



Novelty and Innovation

We will incorporate differentiable optimization approaches such as [5,6] into the meta-learning paradigm and design a hierarchical planning and control framework in which the high-level modules learn appropriate representations of the task space, while the low-level modules guarantee robustness and safety. We will design the low-level modules to be differentiable and to ensure safety and robustness via tools such as reachability analysis (e.g., sum-of-squares (SOS) programming [7]). Our novelty is to bring structure into hierarchical end-to-end learning in a way that adds value over plain model-based control, giving the best of both worlds.
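As a rough illustration of what a differentiable low-level safety module could look like, the sketch below projects an action proposed by a high-level module onto a half-space and backpropagates a gradient through that projection. This is a deliberately simplified stand-in: the half-space constraint, the function names, and the example values are assumptions made for illustration, and the sketch does not implement reachability analysis or SOS programming.

```python
# Minimal differentiable "safety filter": project a proposed action u onto
# the half-space {v : a . v <= b}. This is the closed-form solution of
#   min_v 0.5 * ||v - u||^2   s.t.   a . v <= b.
# The half-space is a hypothetical stand-in for a certified safe set.

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def safe_project(u, a, b):
    """Return the closest point to u that satisfies a . v <= b."""
    violation = dot(a, u) - b
    if violation <= 0.0:
        return list(u)  # already safe, so the filter is the identity
    scale = violation / dot(a, a)
    return [ui - scale * ai for ui, ai in zip(u, a)]

def safe_project_vjp(u, a, b, grad_out):
    """Backpropagate grad_out (d loss / d v) to d loss / d u.

    Inactive constraint: the Jacobian is the identity.
    Active constraint:   the Jacobian is I - a a^T / (a . a), which removes
    the gradient component along the constraint normal.
    """
    if dot(a, u) - b <= 0.0:
        return list(grad_out)
    scale = dot(a, grad_out) / dot(a, a)
    return [gi - scale * ai for gi, ai in zip(grad_out, a)]

# Hypothetical example: 2-D action with constraint u_x + u_y <= 1.
a, b = [1.0, 1.0], 1.0
u_proposed = [1.0, 1.0]  # unsafe proposal from a high-level module
u_safe = safe_project(u_proposed, a, b)
grad_u = safe_project_vjp(u_proposed, a, b, grad_out=[1.0, 0.0])
```

Because the backward pass is available in closed form, a learning signal from a downstream loss can reach the high-level module while the executed action always satisfies the constraint. Richer safe sets would generally require a differentiable solver in the spirit of [5,6] rather than a closed-form projection.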

Technical Objectives

  • We will design a hierarchical differentiable meta-learning framework for planning and control problems and investigate possible representations for environments, dynamics, and skills or motion primitives [8,9].

  • In the context of autonomous driving, we will test generalization across different environments and dynamics, such as navigation in different maps and interaction with varying numbers of surrounding agents with different driving styles.

  • In the context of robotic manipulation [10], we will test generalization across different tasks, such as grasping in clutter and interacting with articulated objects.

References


  • [1] Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400.

  • [2] Bhardwaj, M., Boots, B., & Mukadam, M. (2020). Differentiable Gaussian process motion planning. ICRA 2020.

  • [3] Amos, B., Jimenez, I., Sacks, J., Boots, B., & Kolter, J. Z. (2018). Differentiable MPC for end-to-end planning and control. In Advances in Neural Information Processing Systems (pp. 8289-8300).

  • [4] Tang, C., Xu, Z., & Tomizuka, M. (2019). Disturbance-observer-based tracking controller for neural network driving policy transfer. IEEE Transactions on Intelligent Transportation Systems.

  • [5] Grefenstette, E., Amos, B., Yarats, D., Htut, P. M., Molchanov, A., Meier, F., Kiela, D., Cho, K., & Chintala, S. (2019). Generalized inner loop meta-learning. arXiv preprint arXiv:1910.01727.

  • [6] Amos, B. (2019). Differentiable optimization-based modeling for machine learning (Doctoral dissertation, Carnegie Mellon University).

  • [7] Singh, S., Majumdar, A., Slotine, J. J., & Pavone, M. (2017). Robust online motion planning via contraction theory and convex optimization. In 2017 IEEE International Conference on Robotics and Automation (ICRA).

  • [8] Lin, Q., et al. (2018). MOHA: A multi-mode hybrid automaton model for learning car-following behaviors. IEEE Transactions on Intelligent Transportation Systems, 20(2), 790-796.

  • [9] Wang, W., Zhang, W., & Zhao, D. (2018). Understanding V2V driving scenarios through traffic primitives. arXiv preprint arXiv:1807.10422.

  • [10] Eysenbach, B., et al. (2018). Diversity is all you need: Learning skills without a reward function. arXiv preprint arXiv:1802.06070.