Jerry Zhi-Yang He, http://herobotics.me/
Anca D. Dragan, http://people.eecs.berkeley.edu/~anca/
Our goal is to develop algorithms that can power the next generation of home robot applications, where robotic agents need to understand their user's objectives through interaction and adapt to their preferences over time. Consider a human-robot team collaborating on everyday tasks like unloading groceries, preparing dinner, or cleaning the house. Such an assistive robot should coordinate with its partner to complete the task efficiently, without getting in their way. For example, while tidying the house, the robot could start cleaning the living room if its partner starts cleaning the kitchen. If the robot notices its partner loading the dishwasher, it should prioritize bringing dirty dishes from the living room to the kitchen instead of rearranging cushions. This requires the robot to reason not only about its own embodiment (to avoid getting in the way of the human) but also about its partner's actions and intentions, so that it can assist them efficiently.
We can formulate this as a few-shot multi-agent problem in which the goal is to maximize a joint objective that is only partially observable to the robot. Initially, the robot is uncertain about the human's objective. Over time, by observing human actions and actively taking its own, the robot learns the human's goals and preferences and gradually takes over more of the task.
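One way to make this formulation concrete is to maintain a belief over a set of candidate human objectives and update it from observed human actions. The sketch below is purely illustrative and not from the proposal: the goal set, the toy Q-values, and the Boltzmann rationality parameter `beta` are hypothetical placeholders standing in for a learned human model.

```python
import numpy as np

# Candidate human goals the robot is uncertain over (illustrative).
GOALS = ["load_dishwasher", "tidy_living_room"]
ACTIONS = ["grab_dish", "fluff_cushion"]

def human_action_likelihood(action, goal, beta=2.0):
    """P(action | goal) under a noisily-rational (Boltzmann) human model.
    The Q-values here are toy numbers: the human prefers actions that
    serve their goal."""
    q = {("grab_dish", "load_dishwasher"): 1.0,
         ("grab_dish", "tidy_living_room"): 0.1,
         ("fluff_cushion", "load_dishwasher"): 0.1,
         ("fluff_cushion", "tidy_living_room"): 1.0}
    logits = np.array([beta * q[(a, goal)] for a in ACTIONS])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[ACTIONS.index(action)]

def update_belief(belief, action):
    """Bayes rule: P(goal | action) is proportional to P(action | goal) P(goal)."""
    posterior = np.array([human_action_likelihood(action, g) * b
                          for g, b in zip(GOALS, belief)])
    return posterior / posterior.sum()

belief = np.array([0.5, 0.5])               # uniform prior over goals
belief = update_belief(belief, "grab_dish")  # human grabbed a dirty dish
```

After observing the human grab a dish, the belief shifts toward "load_dishwasher", which is exactly the signal the robot needs to decide whether fetching more dishes or fluffing cushions is the better assistive action.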
We propose leveraging the in-context learning capabilities of transformer models. There has recently been tremendous progress in applying transformer-based models to decision-making tasks, with a focus on text prediction or single-agent problems. We extend this framework to the multi-agent collaborative setting to study whether transformers can learn about user intent through interaction. We pre-train a multi-agent decision transformer on diverse collaboration data, in which the robot assists users with a wide range of objectives. We then deploy the pre-trained model to assist new users, leveraging its in-context learning capabilities.
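The key mechanism is that adaptation happens in-context rather than via gradient updates: the model conditions on the interaction history with the current user. The sketch below is an assumption-laden illustration (not the proposal's actual architecture or tokenization) of how such a context sequence might be assembled from past episodes before querying a frozen pre-trained model.

```python
import numpy as np

def flatten_episode(states, human_actions, robot_actions):
    """Interleave per-timestep tokens as (s_t, a_human_t, a_robot_t)."""
    tokens = []
    for s, ah, ar in zip(states, human_actions, robot_actions):
        tokens.extend([s, ah, ar])
    return tokens

def build_context(episodes, max_len=64):
    """Concatenate one user's past episodes into a single context,
    keeping only the most recent max_len tokens (the transformer's
    context window)."""
    tokens = []
    for ep in episodes:
        tokens.extend(flatten_episode(*ep))
    return np.array(tokens[-max_len:])

# Two toy episodes with the same user; integers stand in for tokenized
# states and actions.
ep1 = ([0, 1], [10, 11], [20, 21])
ep2 = ([2, 3], [12, 13], [22, 23])
context = build_context([ep1, ep2], max_len=8)
# At test time the pre-trained model would be queried as
# robot_action = model(context) with frozen weights: more interaction
# with the same user yields a richer context, and thus better assistance,
# with no parameter updates.
```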
Our goal is to demonstrate that transformer-based decision models, when supervised on diverse collaboration data, can adapt to new users in a few-shot manner at test time. We will demonstrate the framework on:
1. The Sokoban environment, where the two agents must coordinate to rearrange blocks according to an unknown user intent.
2. The Habitat simulation environment, on social rearrangement tasks.
3. Real users interacting with a simulated robot environment.
Vidhi Jain, Yixin Lin, Eric Undersander, Yonatan Bisk, and Akshara Rai. Transformers are Adaptable Task Planners. 2022.
Andrew Szot, Unnat Jain, Dhruv Batra, Zsolt Kira, Ruta Desai, and Akshara Rai. Adaptive Coordination in Social Embodied Rearrangement. 2023.
Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi, et al. Rearrangement: A Challenge for Embodied AI. 2020.
Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, et al. Habitat 2.0. 2021.
Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. Decision Transformer: Reinforcement Learning via Sequence Modeling. 2021.
Jerry Zhi-Yang He, Zackory Erickson, Daniel S. Brown, Aditi Raghunathan, and Anca Dragan. Learning Representations that Enable Generalization in Assistive Tasks. In CoRL, 2023.
Micah Carroll, Rohin Shah, Mark K. Ho, Tom Griffiths, Sanjit Seshia, Pieter Abbeel, and Anca Dragan. On the Utility of Learning about Humans for Human-AI Coordination. 2019.
Fei Xia, Amir R. Zamir, Zhiyang He, Alexander Sax, Jitendra Malik, and Silvio Savarese. Gibson Env: Real-World Perception for Embodied Agents. CVPR, 2018.