Safe and Sound: Learning Locomotion Skills Across Robot Morphology

Learning locomotion policies on real robots can be risky. Policy learning in the real world requires free exploration, which is dangerous because it can cause catastrophic damage, especially to large, heavy, and complicated robots. In this work, we aim to learn locomotion policies for such risky robots while minimizing their interaction with the environment. Although learning on large, heavy, and complicated robots tends to be risky, learning on smaller, lighter, and simpler robots carries much lower cost and risk. We propose an approach that transfers learned policies between different robots, so that control strategies can be shared, applied, and adapted across morphologies. We then learn policies on smaller robots with simpler morphology and transfer them to the risky robots.

Update

Updated 08/24/2021

Overview

Research Approach: We propose to learn the locomotion policy in a morphology-invariant space so that sound policies learned safely on small robots can be easily transferred to larger robots. Such a latent space can be discovered automatically via feature disentanglement and curriculum learning, where a morphology-invariant policy and a morphology-specific latent variable model are separated and can then be used to adapt to larger robots through optimization.

To study the core issues of transfer learning across robot morphology quickly, and to arrive at safe and sound policies, we will first work on simulated small and large robots and then transfer from simulated robots to real robots. For both transfers, we encode environment-specific and morphology-specific information in latent variable models. We disentangle the transfer across morphology and the transfer across the sim-to-real gap into two latent variable models, so that the two models can be recomposed when evaluating on unseen real large robots.
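The recomposition idea can be illustrated with a minimal sketch. It assumes a single policy that takes a shared state embedding plus two separate latent vectors; the names `policy`, `z_morph`, and `z_env`, the dimensions, and the linear-plus-tanh form are all hypothetical and stand in for whatever models the project actually trains.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, MORPH_DIM, ENV_DIM, ACTION_DIM = 8, 3, 2, 4

# Stand-in for a trained policy: one linear layer over the concatenated input.
W = rng.normal(size=(ACTION_DIM, STATE_DIM + MORPH_DIM + ENV_DIM))

def policy(state_embedding, z_morph, z_env):
    """Map a shared state embedding and two latents to a bounded action."""
    x = np.concatenate([state_embedding, z_morph, z_env])
    return np.tanh(W @ x)

# Latents as they might come out of the two separate transfers (random here).
z_morph = {"small": rng.normal(size=MORPH_DIM), "large": rng.normal(size=MORPH_DIM)}
z_env = {"sim": rng.normal(size=ENV_DIM), "real": rng.normal(size=ENV_DIM)}

# Recompose: the "large" morphology latent (fit in simulation) is paired with
# the "real" environment latent (fit on small real robots).
state = rng.normal(size=STATE_DIM)
action_large_real = policy(state, z_morph["large"], z_env["real"])
```

Keeping the two latents as separate inputs is what allows a pairing such as (large, real) to be evaluated even though no training data was collected for that combination.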

Assumptions: we can simulate any type of robot without limitation; we have a limited number of small real robots but can collect a significant amount of data on them; we can collect only a few data points on the real large robots to enable the transfer.

Setting and Tasks: There are two levels of transfer here: transfer between different morphologies and transfer between simulation and the real world. We consider the following types of morphology change: the scale and structure of the robot stay the same and only other geometric aspects change (e.g., the leg length); the structure of the robot changes while the scale and other geometric aspects stay the same; the structure and geometry stay the same while the scale of the robot changes. Between simulation and the real world, we consider the following types of environmental difference: terrain differences, such as sand versus hard ground, unstructured terrain, and bumpy terrain; surface-condition differences, such as roads with different friction coefficients; and road-structure differences, such as changes in road shape in navigation tasks. The tasks include walking on unstructured terrain at different speeds in a given direction.
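The three types of morphology change can be written down as operations on a small parameter record. This is only a sketch: the `Morphology` fields, the helper names, and the numeric values are hypothetical, and a real robot model would carry many more parameters.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Morphology:
    num_legs: int         # structure
    leg_length_m: float   # geometry
    body_length_m: float  # geometry
    scale: float = 1.0    # uniform scale factor relative to the base robot

def change_geometry(m, leg_length_m):
    """Type 1: same structure and scale, different leg length."""
    return replace(m, leg_length_m=leg_length_m)

def change_structure(m, num_legs):
    """Type 2: different structure, same scale and other geometry."""
    return replace(m, num_legs=num_legs)

def change_scale(m, factor):
    """Type 3: same structure and proportions, all lengths scaled uniformly."""
    return replace(
        m,
        leg_length_m=m.leg_length_m * factor,
        body_length_m=m.body_length_m * factor,
        scale=m.scale * factor,
    )

base = Morphology(num_legs=4, leg_length_m=0.25, body_length_m=0.5)  # made-up values
longer_legs = change_geometry(base, leg_length_m=0.3)
six_legged = change_structure(base, num_legs=6)
double_size = change_scale(base, 2.0)
```

Treating each change as a separate function makes it easy to vary one factor at a time when generating simulated robots.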

Robots and Environments: The Laikago robot and the A1 robot will be our real robot platforms. In simulation, each robot has its own model, and we can change its morphology, including the leg length and other physical parameters. The environment can be changed by changing the ground terrain. As the state representation, we will combine depth camera input with the joint angles, the scale of the robot, and the orientation of the robot body.
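As a rough sketch of how such a state vector might be assembled, the function below flattens and concatenates the four inputs. The joint count, the depth image resolution, the roll-pitch-yaw orientation format, and the name `build_observation` are assumptions made for illustration; in practice the depth image would more likely pass through a learned feature extractor than be flattened directly.

```python
import numpy as np

NUM_JOINTS = 12       # assumption: 3 actuated joints on each of 4 legs
DEPTH_SHAPE = (8, 8)  # hypothetical, heavily downsampled depth image

def build_observation(depth_image, joint_angles, robot_scale, body_rpy):
    """Flatten and concatenate the sensor readings into one state vector."""
    parts = [
        np.asarray(depth_image, dtype=np.float32).ravel(),
        np.asarray(joint_angles, dtype=np.float32),
        np.asarray([robot_scale], dtype=np.float32),
        np.asarray(body_rpy, dtype=np.float32),
    ]
    return np.concatenate(parts)

obs = build_observation(
    depth_image=np.ones(DEPTH_SHAPE),
    joint_angles=np.zeros(NUM_JOINTS),
    robot_scale=1.0,
    body_rpy=(0.0, 0.0, 0.0),
)
print(obs.shape)  # (80,) = 64 depth values + 12 joints + 1 scale + 3 orientation
```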

Solution:

Step 1, achieve the transfer between different morphologies. Given the simulated robots, learn a global robot policy network N that operates in a reduced state space S_{reduced}. For each environment, use the collected data to learn an encoder and decoder that reduce the raw state into this shared state space. Use contrastive learning to encourage similar states to be mapped to similar latent vectors.
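The text does not specify which contrastive objective is used. One common choice is the InfoNCE loss, sketched here under the assumption that row i of `anchors` and row i of `positives` are latent vectors of corresponding states (for example, from two robots), while all other rows in the batch serve as negatives. The function name and the temperature value are illustrative.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE over a batch: matching rows are positives, the rest negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                     # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))

rng = np.random.default_rng(0)
latents = rng.normal(size=(16, 8))

# Corresponding states mapped close together give a low loss ...
aligned = info_nce_loss(latents, latents + 0.01 * rng.normal(size=latents.shape))
# ... while mismatched pairings give a high loss.
shuffled = info_nce_loss(latents, np.roll(latents, 1, axis=0))
```

Minimizing this loss with respect to the encoder parameters pulls corresponding states together in the latent space and pushes non-corresponding states apart.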

Step 2, achieve the transfer from small robots in simulation to small robots in the real world. This requires collecting enough data on small robots in the real world and using that data to fit a latent variable that explains the environmental differences. We also learn the encoder and decoder that map the data into the latent state space.
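To make "fitting a latent variable that explains the environmental differences" concrete, here is a deliberately tiny sketch. It assumes a one-dimensional velocity model in which a scalar latent `z_env` acts like a drag coefficient, and it fits that scalar by least squares. The model, the constants, and the synthetic "real" data are hypothetical stand-ins, not the project's latent variable model.

```python
import random

DT = 0.02  # hypothetical control period in seconds

def step_model(velocity, action, z_env):
    """Toy dynamics: the latent z_env acts as a velocity-proportional drag."""
    return velocity + (action - z_env * velocity) * DT

def fit_env_latent(transitions):
    """Least-squares estimate of z_env from (velocity, action, next_velocity)."""
    num = 0.0
    den = 0.0
    for velocity, action, next_velocity in transitions:
        # Under the model, this quantity equals z_env * velocity.
        drag = action - (next_velocity - velocity) / DT
        num += drag * velocity
        den += velocity * velocity
    return num / den

# Synthetic stand-in for data collected on a small real robot.
rng = random.Random(0)
TRUE_Z = 0.8
real_data = []
for _ in range(200):
    v = rng.uniform(0.5, 2.0)
    a = rng.uniform(-1.0, 1.0)
    v_next = step_model(v, a, TRUE_Z) + rng.gauss(0.0, 1e-4)  # sensor noise
    real_data.append((v, a, v_next))

z_hat = fit_env_latent(real_data)
```

With a learned, higher-dimensional latent the closed-form fit would be replaced by gradient-based or sampling-based optimization, but the principle of choosing the latent that best explains the real transitions is the same.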

Step 3, achieve the transfer onto real large robots. We use curriculum learning here: rather than transferring directly, we first validate the transfer on medium-sized robots and then transfer to the real large robots.
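A staged transfer of this kind might be organized as below. The sketch assumes the curriculum is indexed by a single scale factor, spaces the stages geometrically, and stops at the first stage that fails validation; `scale_curriculum`, `run_curriculum`, and the toy `evaluate` function are hypothetical.

```python
def scale_curriculum(start_scale, target_scale, num_stages):
    """Geometrically spaced scales from small to large, both ends included."""
    if num_stages < 2:
        raise ValueError("need at least a start stage and a target stage")
    ratio = (target_scale / start_scale) ** (1.0 / (num_stages - 1))
    return [start_scale * ratio ** i for i in range(num_stages)]

def run_curriculum(scales, evaluate, success_threshold):
    """Advance stage by stage; stop before any stage that fails validation."""
    passed = []
    for scale in scales:
        if evaluate(scale) < success_threshold:
            break
        passed.append(scale)
    return passed

# Small, medium, and large robots at roughly 1x, 2x, and 4x the base size.
scales = scale_curriculum(1.0, 4.0, num_stages=3)

# Toy validation score: pretend the current policy only works up to 2.5x.
passed = run_curriculum(
    scales,
    evaluate=lambda s: 1.0 if s <= 2.5 else 0.0,
    success_threshold=0.5,
)
```

Stopping at the first failed stage keeps the riskiest robot out of the loop until the transfer has been validated on the intermediate sizes.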

Models: an encoder and decoder for morphology-specific state perception (each environment will have its own encoder and decoder, or an adapted one); a latent variable model plus a policy network in the latent state space for adapting to different environments.
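As a minimal illustration of per-robot encoders and decoders that share a latent size, the sketch below uses PCA as a linear autoencoder. This is a stand-in technique chosen for brevity, not the project's model; the class name, the raw state dimensions (20 and 14), and the synthetic data are assumptions.

```python
import numpy as np

class PCAAutoencoder:
    """Linear encoder/decoder fitted with PCA on one robot's raw states."""

    def __init__(self, latent_dim):
        self.latent_dim = latent_dim

    def fit(self, raw_states):
        self.mean = raw_states.mean(axis=0)
        _, _, vt = np.linalg.svd(raw_states - self.mean, full_matrices=False)
        self.components = vt[: self.latent_dim]  # top principal directions
        return self

    def encode(self, raw_states):
        return (raw_states - self.mean) @ self.components.T

    def decode(self, latents):
        return latents @ self.components + self.mean

rng = np.random.default_rng(0)
LATENT_DIM = 4

# Two robots whose raw states have different dimensions (made-up sizes), each
# generated from a 4-dimensional underlying state for this toy example.
underlying = rng.normal(size=(100, LATENT_DIM))
raw_small = underlying @ rng.normal(size=(LATENT_DIM, 20))
raw_large = underlying @ rng.normal(size=(LATENT_DIM, 14))

ae_small = PCAAutoencoder(LATENT_DIM).fit(raw_small)
ae_large = PCAAutoencoder(LATENT_DIM).fit(raw_large)

z_small = ae_small.encode(raw_small)  # shape (100, 4)
z_large = ae_large.encode(raw_large)  # shape (100, 4)
```

Both robots now produce latents of the same size, so one policy network can consume either. Note that fitting each autoencoder independently does not align the two latent spaces with each other; that alignment is what the contrastive objective is meant to provide.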

Potential training approaches:

1. Curriculum learning over gradual morphology changes may enable faster adaptation across the morphology space. We may even have an agent learn to change the morphology intelligently to enable faster learning.

2. Unsupervised contrastive learning on the state space: since the states of different robots may have different dimensions and distributions, naively applying one policy will not work. For robot locomotion, robotics priors may help find the correspondence between states; combined with unsupervised learning (either discriminative or contrastive), they may give us a well-shaped latent space that disentangles morphology from the ground-truth state.

Expected Outcome:

Using the robot-invariant model trained in simulation and the real-environment-specific latent model obtained from deploying on small robots, we can obtain a successful controller on a larger, riskier legged robot within a few (e.g., <25) trials.
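One way to respect such a trial budget is to keep the policy fixed and search only over the low-dimensional latent. The sketch below uses simple random-perturbation hill climbing with a hard cap of 24 trials. The optimizer, the step size, and the toy `fake_trial` return function are assumptions; the text does not say which optimization method is used.

```python
import random

def adapt_latent(run_trial, z_init, max_trials=24, step=0.3, seed=0):
    """Hill-climb over the latent, never exceeding max_trials robot trials."""
    rng = random.Random(seed)
    best_z = list(z_init)
    best_return = run_trial(best_z)
    trials_used = 1
    while trials_used < max_trials:
        candidate = [z + rng.gauss(0.0, step) for z in best_z]
        candidate_return = run_trial(candidate)
        trials_used += 1
        if candidate_return > best_return:  # keep only improvements
            best_z, best_return = candidate, candidate_return
    return best_z, best_return, trials_used

# Toy stand-in for a trial on the large robot: the return peaks at an unknown
# latent value that the search has to find.
TARGET = [0.5, -0.2]
trial_log = []

def fake_trial(z):
    trial_log.append(list(z))
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, TARGET))

z_start = [0.0, 0.0]
best_z, best_return, trials_used = adapt_latent(fake_trial, z_start)
```

Because only improvements are accepted, the returned latent is never worse than the starting point, and the number of trials on the risky robot is bounded by construction.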
