Automated State and Action Space Design for Multi-Objective Reinforcement Learning


While Reinforcement Learning (RL) has shown impressive performance in games (e.g., Go and chess [14,13], DoTA2 [2], Starcraft II [18,17], etc.), it remains challenging to apply RL in real-world scenarios due to its enormous sample complexity and limited ability to generalize to unknown environments. To train an RL agent with traditional online methods, millions of interactions with the environment are often needed, which is definitely...

Self-supervised Open-World Segmentation


Standard benchmarks in image segmentation assume a "closed-world" setting, in which a pre-determined set of non-overlapping object categories is exhaustively segmented and labeled in all training and evaluation images. This significantly increases the difficulty of data collection, requiring either complex quality control and post-processing schemes if using crowd-sourced labeling or...

Multiscale Modeling for Control


A long-standing goal of AI research is to build or learn representations that lead to generalization in real-world sequential decision settings. Object-level representations that enable abstract reasoning in visual environments are a good candidate for this goal. Indeed, RL agents equipped with this inductive bias are able to generalize to tasks that involve a different...

Towards a Unified Understanding of Privacy and Generalization for Better Algorithm Design


Machine learning and deep learning have emerged as important technologies, enabling a wide range of applications including computer vision, natural language processing, healthcare, and recommendation. However, in order to responsibly deploy these machine learning algorithms in society, it is critical to design them to conform to ethical values such as privacy, safety, and fairness. For instance, researchers have found that information about training data can be extracted from a released machine learning model, which raises important privacy concerns, and adversarial attacks or...

GuBERT: Grounded units for Self-Supervised Pre-training of Speech

Self-Supervised Learning (SSL) techniques have proved quite effective for representation learning across multiple modalities such as text, images, and, more recently, speech. In the speech domain, SSL pretraining has yielded state-of-the-art results in several downstream applications such as speech recognition (wav2vec, wav2vec 2.0), spoken language modeling (GSLM), and speech resynthesis (HuBERT). However, this approach requires massive amounts of speech data (thousands of hours) and substantial computational resources to train such large models. Also, while...

Unsupervised Environment Design for Multi-task Reinforcement Learning

We are interested in designing a method to improve learning efficiency and generalization in a single-agent multi-task reinforcement learning (RL) setting by leveraging unsupervised environment design techniques.

Researchers Yuqing Du, UC Berkeley,...

Coherent and Consistent Long Story Generation

This is a continuation of our previous Year 3 collaboration, Learning-Driven Exploration For Search


Berkeley Advisor: Dan Klein,

Emergent Collaboration for Heterogeneous Multi-Robot Rearrangement

Collaboration among different species is common in nature. For instance, ostriches have sharp eyesight but poor hearing and weak sense of smell, while zebras have exceptional hearing and great sense of smell but bad eyesight. They form a symbiotic relationship to protect themselves from predators on the African savanna.

Drawing inspiration from symbiotic collaboration in nature, heterogeneous multi-robot teams can also work...

Learning Successor Affordances as Temporal Abstractions

Successor features (SF) provide a convenient representation for value functions: value functions under new reward functions can be obtained by simply recombining the features via a linear combination. However, successor features, by construction, require the underlying policy of the value function to be fixed. This can be undesirable when the goal is to find the optimal value function for each different reward function, since the successor features for different policies can differ.
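The linear-recombination property described above can be sketched in a few lines of numpy. This is a minimal illustration, not the project's method: the tabular shapes, the random placeholder features `psi`, and the task weight vectors `w_task1`/`w_task2` are all hypothetical, standing in for successor features learned under some fixed policy.

```python
import numpy as np

# Successor features psi for a fixed policy pi:
#   psi[s, a] = E[ sum_t gamma^t * phi(s_t, a_t) | s_0 = s, a_0 = a, pi ]
# Here: 3 states, 2 actions, 4-dimensional reward features phi.
rng = np.random.default_rng(0)
psi = rng.random((3, 2, 4))  # placeholder successor features

# Each new task is specified by reward weights w such that r = phi . w.
w_task1 = np.array([1.0, 0.0, 0.0, 0.0])
w_task2 = np.array([0.0, 0.5, 0.5, 0.0])

# The value function for any such task is recovered by a linear
# combination of the features -- no further interaction is needed.
q_task1 = psi @ w_task1   # shape (3, 2): Q^pi under task 1's reward
q_task2 = psi @ w_task2   # shape (3, 2): Q^pi under task 2's reward

# Caveat from the text: psi is tied to the policy pi it was computed for,
# so these Q-values *evaluate* pi under the new rewards; they are not the
# optimal Q-values for those tasks.
```

Note that the recombination is exactly linear, so the value of a mixed reward `w_task1 + w_task2` equals `q_task1 + q_task2`; the limitation is only that all of these values are with respect to the one fixed policy.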

In this project, we explore successor affordances (SA) that can provide a basis for...

Learning Dexterous In-Hand Manipulation with Vision and Touch


Consider the task of stacking LEGO bricks or assembling IKEA furniture in Figure. Given a goal image configuration, humans can rapidly figure out a plan to accurately manipulate the LEGO bricks or furniture parts to achieve the goal. This is mainly because: 1) humans are already equipped with a good mental dynamics model through daily interaction with objects,...