Unsupervised Environment Design for Multi-task Reinforcement Learning

We are interested in designing a method to improve learning efficiency and generalization in a single-agent multi-task reinforcement learning (RL) setting by leveraging unsupervised environment design techniques.

Training an agent to solve multiple tasks, each with many different instances, is a long-standing challenge in AI, and in reinforcement learning in particular. Progress on this problem could have a large impact on real-world applications such as household robots. For example, a general-purpose household robot would have to not only clean an apartment but also cook dinner or water the plants, and do all of this in any apartment layout, with whatever tools and objects happen to be around. However, current algorithms struggle to generalize (or quickly adapt) to new tasks (i.e., tasks with a different reward function), and even to new instances of the same task (i.e., instances with different initial states, emission functions, or dynamics, but the same reward function).

Recently, there has been significant progress in training more general and robust agents that can generalize to unseen variations of a task. One approach that has proven particularly effective for zero-shot generalization to new task instances is unsupervised environment design (UED). UED methods assume control over the environment generation process, so that the algorithm can select which environments to train the agent on at any point during learning. This control can be used to create a personalized curriculum (i.e., one tailored to the current agent's strengths and weaknesses) that efficiently trains an agent to solve a given task in its many variations, and even to generalize zero-shot to new instances of that task. However, these techniques have not yet been applied to the multi-task setting, where we want the agent to learn to solve multiple tasks, each with a large number of instances. The multi-task setting poses challenges beyond those of the single-task one, where we can assume the principle of unchanged optimality (i.e., that a single policy is optimal across all instances of the task). These challenges include interference across tasks (if we sample uniformly from the set of tasks) and catastrophic forgetting (if we cycle through the set of tasks).
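
To make the idea of a curriculum driven by environment selection more concrete, the following is a minimal sketch of a score-prioritized level sampler, loosely modelled on Prioritized Level Replay (rank-based prioritization of previously seen levels by a learning-potential score, mixed with a staleness term). It is not the method this project proposes. As an assumption for the multi-task case, each level is keyed by a (task, seed) pair so that all tasks share one buffer. The class `LevelBuffer`, the helper `new_level`, the task names, the parameter values, and the random placeholder score (standing in for a regret estimate such as positive value loss) are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class LevelBuffer:
    """Hypothetical replay buffer over (task_id, seed) levels, scored by learning potential."""

    temperature: float = 0.3     # beta: lower values concentrate replay on top-ranked levels
    staleness_coef: float = 0.1  # rho: weight of the staleness term in the mixture
    scores: dict = field(default_factory=dict)     # level -> latest score
    last_seen: dict = field(default_factory=dict)  # level -> step at which it was last scored
    step: int = 0

    def update(self, level, score):
        """Record the score obtained after playing `level` (e.g. a regret estimate)."""
        self.scores[level] = score
        self.last_seen[level] = self.step

    def _rank_weights(self, values):
        # Rank-based prioritization: h(rank) = 1 / rank, sharpened by 1 / temperature.
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        weights = [0.0] * len(values)
        for rank, i in enumerate(order, start=1):
            weights[i] = (1.0 / rank) ** (1.0 / self.temperature)
        total = sum(weights)
        return [w / total for w in weights]

    def replay_distribution(self):
        """Mix score-based and staleness-based probabilities over all seen levels."""
        levels = list(self.scores)
        score_p = self._rank_weights([self.scores[lv] for lv in levels])
        staleness = [self.step - self.last_seen[lv] for lv in levels]
        total = sum(staleness)
        if total > 0:
            stale_p = [s / total for s in staleness]
        else:
            stale_p = [1.0 / len(levels)] * len(levels)
        probs = [
            (1.0 - self.staleness_coef) * p + self.staleness_coef * q
            for p, q in zip(score_p, stale_p)
        ]
        return dict(zip(levels, probs))

    def sample(self, rng, new_level_fn, replay_prob=0.5):
        """Either generate a fresh level or replay a previously seen one."""
        self.step += 1
        if not self.scores or rng.random() >= replay_prob:
            return new_level_fn(rng)
        dist = self.replay_distribution()
        return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


# Usage with placeholder tasks and scores (all hypothetical).
TASKS = ["clean", "cook", "water_plants"]


def new_level(rng):
    # A fresh level is a random task paired with a random procedural-generation seed.
    return (rng.choice(TASKS), rng.randrange(10_000))


rng = random.Random(0)
buf = LevelBuffer()
for episode in range(200):
    level = buf.sample(rng, new_level)
    score = rng.random()  # stand-in for e.g. positive value loss on this level's rollout
    buf.update(level, score)
```

Keying levels by (task, seed) lets all tasks compete for replay in a single buffer, so tasks whose scores are systematically larger could crowd out the others. Normalizing scores per task, or keeping one buffer per task, are alternatives one might compare when studying interference across tasks.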

In this project, we would like to gain a deeper understanding of UED / Dynamic Task Generation (DTG) methods for single-agent, procedurally generated multi-task scenarios, understand the limitations of current UED / DTG approaches in this under-studied setting, and use the resulting insights to develop better UED / DTG techniques for this problem. We are interested in improving sample efficiency and final performance, as well as generalization to new instances of a task, and even generalization or adaptation to new tasks.