Hierarchical Model-Based Reinforcement Learning with Temporal Abstractions

For long-horizon tasks in the real world, humans accumulate information over time and make inferences using stored memory. An abstract transition model of the world at various time resolutions allows reasoning both globally and locally. Model-based reinforcement learning (MBRL) is efficient for learning policies for robotic control tasks. However, current MBRL approaches only predict future trajectories over a limited time span. A major reason is that the model is learned at a single time resolution, typically the environment transition frequency. While such a model is useful for local planning, it is overwhelmed by detail and struggles to make inferences over a long horizon.

We explore hierarchical models that can make long-horizon plans. One important task is to find sub-goals that can serve as breakpoints, with abstract models learned at various temporal resolutions; higher-layer models can then build on top of lower-layer models through these sub-goals. Traditional approaches that manually specify sub-goals limit applicability, so automatic data-driven approaches are much more desirable. Recent progress in attention models such as transformers makes it possible to learn attention weights over individual states within a trajectory. We explore the combination of attention models and hierarchical model learning.
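As a minimal sketch of this idea, one simple way to turn learned per-state attention weights into sub-goal breakpoints is to pick the most-attended states subject to a minimum temporal spacing. All names and the input format here are illustrative assumptions, not part of the proposal:

```python
import numpy as np

def select_subgoals(attn_weights, num_subgoals=3, min_gap=5):
    """Pick the most-attended states in a trajectory as candidate sub-goals.

    attn_weights: hypothetical per-state attention scores, e.g. averaged
    over the heads of a transformer trained on trajectory data.
    min_gap: minimum temporal spacing between chosen sub-goals, so the
    breakpoints are spread across the horizon rather than clustered.
    """
    order = np.argsort(attn_weights)[::-1]  # state indices, highest attention first
    chosen = []
    for t in order:
        if all(abs(t - s) >= min_gap for s in chosen):
            chosen.append(int(t))
        if len(chosen) == num_subgoals:
            break
    return sorted(chosen)

# Toy example: a 30-step trajectory with attention peaks at t = 6, 15, 24.
weights = np.zeros(30)
weights[[6, 15, 24]] = [0.9, 0.8, 0.7]
print(select_subgoals(weights))  # -> [6, 15, 24]
```

A learned criterion would replace this greedy heuristic, but the interface (trajectory in, ordered breakpoints out) is what the higher-hierarchy model builds on.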


UCB Student:  Xinlei Pan (xinleipan@berkeley.edu)

UCB Faculty: Ronald Fearing, Stella Yu (ronf, stellayu@berkeley.edu)

FAIR Researcher: Roberto Calandra (rcalandra@fb.com)


Research Purposes: 1) Evaluate whether hierarchical MBRL is indeed more sample-efficient than regular MBRL. In particular, we evaluate hierarchical MBRL on robot locomotion and navigation tasks on uneven terrain, since these tasks can naturally benefit from hierarchical models. 2) Explore data-driven approaches to automatically find sub-goals. Traditionally the task structure or sub-goals are specified by domain experts, which cannot easily generalize.

Research Approach: For trajectory planning, it is possible to build a more abstract model at different temporal resolutions. Our model combines goal-driven learning, as exemplified in [5], with MBRL. In particular, we learn a higher-hierarchy model that is more abstract, with the goal of finding important sub-goals. Then, based on the sub-goals identified by the higher-hierarchy model, we use goal-driven model-based RL with lower-level abstractions to achieve each sub-goal. Planning proceeds from a coarse temporal resolution down to a detailed one-step plan, which reduces the compounding error typically seen in the one-step prediction setting. The low-level hierarchy models can be learned as motion primitives that can be reused. We will explore attention models, as used in transformers, to find possible sub-goals that dissect the task. In this project, we will work on legged-robot navigation tasks over uneven and complicated terrains. The study will be done in simulation; suitable environments include the Gibson indoor navigation environment [6] and environments built with PyBullet. The high-level model can be the global trajectory-planning model while the low-level model can be the leg and body motion model, and the two levels can be made independent of each other.
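The coarse-to-fine scheme can be illustrated with a toy 1-D example: an abstract high-level model proposes a sequence of sub-goals, and a goal-driven low-level controller reaches each one with one-step actions. Every function here is a placeholder for a learned component, not an implementation of the proposed system:

```python
import numpy as np

def high_level_plan(state, goal, num_subgoals=3):
    """Abstract model: propose evenly spaced sub-goals between state and goal.
    In the real system these would come from the learned higher-hierarchy model."""
    return list(np.linspace(state, goal, num_subgoals + 2)[1:-1]) + [goal]

def low_level_step(state, subgoal, max_step=1.0):
    """Goal-driven one-step controller: move toward the sub-goal, standing in
    for a learned low-level dynamics model plus short-horizon planner."""
    return state + np.clip(subgoal - state, -max_step, max_step)

state, goal = 0.0, 10.0
for sg in high_level_plan(state, goal):
    # Run low-level control until the current sub-goal is reached,
    # so errors never compound beyond one coarse segment.
    while abs(state - sg) > 1e-6:
        state = low_level_step(state, sg)
print(round(state, 3))  # -> 10.0
```

The point of the structure is that the low-level planner only ever predicts over one coarse segment at a time, which is how the compounding error of long one-step rollouts is avoided.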

Related Works: MBRL has recently demonstrated significantly better data efficiency than model-free algorithms [1]. However, the model is still learned at the environment's temporal frequency, making long-term planning difficult. [2] explored memory-based models that are able to make long-term plans, but it does not attempt to build hierarchical dynamics models at different abstraction levels. [3] achieved long-horizon planning through latent-variable models with various regularizations and backward recurrent neural networks. [4] proposed trajectory n-step prediction to replace one-step prediction, the default prediction horizon for MBRL. [5] proposed a Causal InfoGAN model that learns an abstracted model of the environment, but its application is limited to goal-driven environments.


[1] Chua et al. "Deep reinforcement learning in a handful of trials using probabilistic dynamics models." NeurIPS 2018.

[2] Fang et al. "Scene memory transformer for embodied agents in long-horizon tasks." CVPR 2019.

[3] Ke et al. "Learning dynamics model in reinforcement learning by incorporating the long term future." ICLR 2019.

[4] Lambert et al. "Learning accurate long-term dynamics for model-based reinforcement learning." arXiv preprint arXiv:2012.09156, 2020.

[5] Kurutach et al. "Learning plannable representations with Causal InfoGAN." NeurIPS 2018.

[6] Xia et al. "Interactive Gibson benchmark: A benchmark for interactive navigation in cluttered environments." RA-L 2020.