Active

Unsupervised Environment Design for Multi-task Reinforcement Learning

We are interested in designing a method to improve learning efficiency and generalization in a single-agent multi-task reinforcement learning (RL) setting by leveraging unsupervised environment design techniques.
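One common unsupervised environment design strategy is to score candidate environments by the learner's regret and train on the highest-regret levels (in the spirit of PAIRED). The sketch below is a toy illustration of that scoring loop under made-up return functions; it is not the project's actual method.

```python
# Toy sketch of regret-based unsupervised environment design.
# The "protagonist" is the learning agent; the "antagonist" is a
# stronger reference agent used to estimate achievable return.
# Both return functions below are hypothetical stand-ins.

def protagonist_return(difficulty):
    # Hypothetical learner: performance degrades as difficulty grows.
    return max(0.0, 1.0 - difficulty)

def antagonist_return(difficulty):
    # Hypothetical reference agent: degrades more slowly.
    return max(0.0, 1.0 - 0.5 * difficulty)

def regret(difficulty):
    # Regret = what was achievable minus what the learner achieved.
    return antagonist_return(difficulty) - protagonist_return(difficulty)

# The environment designer proposes candidate levels and selects the
# one where the learner has the most room to improve:
candidates = [0.1, 0.5, 0.9, 1.5]
best_level = max(candidates, key=regret)
```

Training on such maximal-regret levels pushes the curriculum toward environments that are solvable (the antagonist succeeds) but not yet solved by the learner.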

Researchers Yuqing Du, UC Berkeley,...

Coherent and Consistent Long Story Generation

This is a continuation of our previous Year 3 collaboration, Learning-Driven Exploration For Search

Participants

Berkeley Advisor: Dan Klein, klein@berkeley.edu...

Emergent Collaboration for Heterogeneous Multi-Robot Rearrangement

Collaboration among different species is common in nature. For instance, ostriches have sharp eyesight but poor hearing and a weak sense of smell, while zebras have exceptional hearing and a great sense of smell but bad eyesight. They form a symbiotic relationship to protect themselves from predators on the African savanna.


Drawing inspiration from symbiotic collaboration in nature, heterogeneous multi-robot teams can also work...

Learning Successor Affordances as Temporal Abstractions

Successor features (SF) provide a convenient representation from which value functions under new reward functions can be obtained by simply recombining the features linearly. However, successor features, by construction, require the underlying policy of the value function to be fixed. This can be undesirable when the goal is to find the optimal value function for each different reward function, as the successor features of different policies can differ.
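The linear-recombination property of successor features can be sketched in a few lines. The feature dimension, feature values, and task weight vectors below are illustrative assumptions, not from the project:

```python
import numpy as np

# Successor features psi^pi(s, a): expected discounted sum of state
# features phi under a fixed policy pi. Shape: (num_actions, feature_dim).
psi = np.array([[1.0, 0.5],
                [0.2, 2.0]])

# Each task is specified by a reward-weight vector w, with r = phi . w.
w_task_a = np.array([1.0, 0.0])
w_task_b = np.array([0.0, 1.0])

# Q-values for pi under each reward follow by linear recombination,
# Q^pi(s, a) = psi^pi(s, a) . w, with no further learning -- but note
# that pi itself stays fixed, which is the limitation discussed above.
q_a = psi @ w_task_a
q_b = psi @ w_task_b
```

Because `psi` is tied to one policy, the recombined values are only optimal for tasks where that policy happens to be optimal; the successor-affordance direction below targets exactly this gap.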

In this project, we explore successor affordances (SA) that can provide a basis for...

Learning Dexterous In-Hand Manipulation with Vision and Touch

Overview

Consider the task of stacking LEGO bricks or assembling IKEA furniture in the figure. Given a goal image configuration, humans can rapidly figure out a plan to accurately manipulate the LEGO bricks or furniture parts to achieve the goal. This is mainly because: 1) humans are already equipped with a good mental dynamics model through daily interaction with objects,...

Alpa: A Distributed System for Training and Serving Large Models

Alpa is a system for training and serving large-scale neural networks.

Scaling neural networks to hundreds of billions of parameters has enabled dramatic breakthroughs such as GPT-3, but training and serving these large-scale neural networks require complicated distributed system techniques. Alpa aims to automate large-scale distributed training and serving with just a few lines of code.

Code: https://github.com/alpa-projects/alpa

Fate of Snow

Northstar: “Develop iterative, meaningful benchmarks for AI researchers that enable substantial progress on problems related to climate change as well as impactful AI methodology.”

Summary: Learning from Observational, Multimodal, Multiscale, Spatiotemporal (OMMS) data sources is critical for researchers and practitioners working on problems related to climate change. AI methods for handling these types of data – and the many associated problems – remain largely undeveloped, and...

Statistically Efficient Offline RL with General Function Approximation

Abstract

Offline reinforcement learning (RL) aims at learning effective policies from only a previously collected dataset of interactions, without access to further interactions with the environment. To handle datasets with partial coverage, conservatism has recently been shown to be necessary, both in practice and in theory, for offline RL. Existing offline RL algorithms, however, either do not offer theoretical guarantees or are impractical due to strong assumptions (such as tabular or linear parameterization) or computational intractability. We propose...
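One simple way to picture conservatism in offline RL is a pessimism penalty that shrinks Q-estimates for actions the dataset covers poorly. The sketch below is a toy lower-confidence-bound illustration; the counts, penalty form, and coefficient are illustrative assumptions, not the proposed algorithm:

```python
import math

# Fitted Q-values from the offline dataset (hypothetical numbers).
q_hat = {"left": 1.0, "right": 1.2}
# How often the dataset covers each action: "right" is barely covered.
counts = {"left": 100, "right": 2}
beta = 1.0  # pessimism coefficient (an assumed hyperparameter)

def pessimistic_q(action):
    # Lower-confidence-bound style penalty that shrinks with coverage.
    return q_hat[action] - beta / math.sqrt(counts[action])

# Acting greedily w.r.t. pessimistic values picks the well-covered
# action, even though its naive estimate is slightly lower.
best_action = max(q_hat, key=pessimistic_q)
```

This kind of penalty guards against overestimating out-of-distribution actions, which is the failure mode that makes conservatism necessary under partial coverage.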

Combating Hallucination in Conditional Sequence Generation

Overview

In recent years, large-scale pre-trained language models (e.g. BERT, BART, GPT-3) have been widely adopted in various text generation applications such as machine translation, document summarization, and question answering. However, as previous work [1] has analyzed, powerful language models tend to dominate the prediction in conditional generation, and the model is likely to hallucinate based only on the target history. For example, in summarization tasks, a conditional generation model may ignore the source text and generate summaries whose content does not exist in...

Automated Collision Prediction in Autonomous Systems with Monocular Camera

This project aims to improve real-world, wide-field-of-view depth estimation using monocular sensors. In doing so, we will experiment with various geometries of indoor and outdoor scenes using large deep learning models. A focus will be placed on data representation in order to investigate and identify the most efficient pipelines.

Researchers Jerome Quenum, University of California - Berkeley Brent Yi, University of California - Berkeley Avideh Zakhor, University of California - Berkeley Austin Stone, Google Rico Jonschkowski,...