Completed

Echo State Transformers

We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated. Inspired by old and well-established ideas in machine learning, we explore a variety of non-linear “reservoir” layers interspersed with regular transformer layers, and show improvements in wall-clock compute time until convergence, as well as overall performance, on various machine translation tasks.
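
The idea of a layer that is randomly initialized and never updated can be illustrated with a toy sketch. This is not the project's transformer architecture: it is a minimal pure-Python example in which a fixed random tanh layer feeds a trainable linear readout. All names (`make_reservoir`, `reservoir_forward`, `train_readout`), the layer size, and the toy regression data are assumptions made purely for illustration.

```python
import math
import random


def make_reservoir(dim, rng, scale=0.5):
    """Draw a fixed random weight matrix once; it is never trained afterwards."""
    std = scale / math.sqrt(dim)
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(dim)]


def reservoir_forward(weights, x):
    """Frozen non-linear layer: h = tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def train_readout(weights, data, dim, lr=0.1, epochs=200):
    """Fit only a linear readout on top of the frozen layer with plain SGD."""
    readout = [0.0] * dim
    for _ in range(epochs):
        for x, y in data:
            h = reservoir_forward(weights, x)  # the reservoir is only read here
            err = sum(r * hi for r, hi in zip(readout, h)) - y
            # Gradient step on the readout alone; `weights` is left untouched.
            readout = [r - lr * err * hi for r, hi in zip(readout, h)]
    return readout


rng = random.Random(0)
DIM = 8  # hypothetical layer width
reservoir = make_reservoir(DIM, rng)
frozen_copy = [row[:] for row in reservoir]

# Hypothetical toy task: predict the sum of the input vector.
inputs = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(32)]
data = [(x, sum(x)) for x in inputs]

readout = train_readout(reservoir, data, DIM)
print("reservoir unchanged after training:", reservoir == frozen_copy)
```

Only `readout` changes during training. The reservoir weights keep the values they were drawn with, which is the property the abstract describes for its frozen layers.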

Distributed Probabilistic Inference on Ray

To enable the use of probabilistic models at scale at Amazon, we propose to exploit synergies between the Clay probabilistic language and Ray, the distributed computation effort currently being carried out at the UC Berkeley RISELab.

Learning Robust Robot Policies to Manipulate and Fill Deformable Bags

Rearranging and manipulating deformable objects such as cables, fabrics, and bags is a long-standing challenge in robotic manipulation. The complex dynamics and high-dimensional configuration spaces of deformables, compared to rigid objects, make manipulation difficult not only for multi-step planning, but even for goal specification. Goals cannot be as easily specified as rigid object poses, and...

Enabling Broader Learning Through Simplified and Personalized Summaries

Text simplification consists of rewriting a text with the objective of making it accessible to a wider audience of readers, for instance by reducing the length and complexity of sentences and the use of rare words.

Unlike summarization, where progress has been rapid with the emergence of large datasets in many textual...

Neural Program Synthesis from Diverse and Distant Context

Creating effective visualizations is an important part of data analytics. While many libraries exist for creating visualizations, writing such code remains difficult given the myriad parameters that users need to provide. In this project, we propose the new task of synthesizing visualization programs from a combination of...

Graph Data Augmentation for Computer Systems

Graphs are the most common state representation for structured input problems, including molecule property prediction, code representation learning, and computer systems. Learning algorithms embed graph structures using graph neural networks (GNNs). However, many domains lack large training datasets due to the expense of acquiring samples. For example, Mirhoseini et al. trained chip placement policies from a dataset of only 20 examples due to the complexity of designing new chips. In data-scarce settings, augmentation is widely used to improve generalization. Simple transformations like...

Disentangling Input Signals for Robust Computer Vision

High-dimensional real world imagery presents an embarrassment of riches to powerful, overparameterized neural networks; it is possible to train image classification models to surprising levels of accuracy on high-frequency or low-frequency features alone. The dominant paradigm of training and evaluating deep neural networks on independent and identically distributed (IID) data splits has obscured a significant weakness of current models, namely a lack of robustness to distribution shifts. One class of explanations posits that powerful models can learn spurious correlations, including...

Visual Locomotion: Synergistic Approach via Perception and Proprioception

Figure Description: We demonstrate the performance of our new algorithm RMA in several challenging environments. The robot successfully walks on sand, mud, hiking trails, tall grass, and a dirt pile without a single failure in any of our trials. The robot was successful in 70% of the trials when walking down stairs along a hiking trail, and succeeded in 80% of the trials when walking across...

Long-Horizon Decision-Making with Energy-Based Models

We introduce the γ-model, a predictive model of environment dynamics with an infinite probabilistic horizon. Replacing standard single-step models with γ-models leads to generalizations of the procedures that form the foundation of model-based control, including the model rollout and model-based value estimation.

The γ-model,...
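
One way to build intuition for a model with an "infinite probabilistic horizon" is a Monte Carlo sketch. It assumes that the horizon is geometrically distributed with parameter 1 - γ, so the target is the discounted distribution over future states. That reading is an assumption, since the truncated text above does not spell it out. The sketch rolls out an ordinary single-step model for a random number of steps; a γ-model would be trained to produce such samples directly instead. The function names and the toy dynamics are hypothetical.

```python
import random


def sample_geometric_horizon(gamma, rng):
    """Draw k >= 1 with P(k) = (1 - gamma) * gamma ** (k - 1)."""
    k = 1
    while rng.random() < gamma:  # continue one more step with probability gamma
        k += 1
    return k


def sample_discounted_future_state(state, step_fn, gamma, rng):
    """Roll a single-step model forward for a geometrically distributed horizon."""
    horizon = sample_geometric_horizon(gamma, rng)
    for _ in range(horizon):
        state = step_fn(state, rng)
    return state


def toy_step(state, rng):
    """Hypothetical deterministic dynamics: the state advances by one each step."""
    return state + 1


rng = random.Random(0)
GAMMA = 0.5
samples = [
    sample_discounted_future_state(0, toy_step, GAMMA, rng) for _ in range(20000)
]
mean_state = sum(samples) / len(samples)
print("empirical mean future state:", mean_state)
```

With these toy dynamics the sampled state equals the sampled horizon, so the empirical mean should be close to 1 / (1 - γ), the expected value of the geometric horizon. Larger values of γ shift probability mass toward states further in the future.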

Adversarial Attacks Against Deep Reinforcement Learning Policies for Strategy Games

Program synthesis from input-output (IO) examples has been a long-standing challenge. While recent works demonstrated limited success on domain-specific languages (DSL), it remains highly challenging to apply them to real-world programming languages, such as C. Due to complicated syntax and token variation, there are three major challenges: (1) unlike many DSLs, programs in languages like C need to compile first and are not executed via interpreters; (2) the program search space grows exponentially when the syntax and semantics of the programming language become more complex; and (3)...