
Towards Learning and Auditing Private Foundation Models

Abstract

Foundation models (e.g., DALL-E, GPT-3, CLIP, MAE) – pre-trained on vast amounts of diverse data through self-supervised learning – have emerged as an important building block for artificial intelligence (AI) systems [BHA+2021]. These models can be readily adapted to various downstream applications (e.g., language, vision, robotics) via fine-tuning, prompting, linear probing, etc. Despite the extensive deployment of foundation models, there is a significant lack of understanding regarding the privacy risks associated with training...
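One of the adaptation methods listed above, linear probing, can be sketched in a few lines: freeze the pretrained encoder and train only a linear head on its features. The NumPy toy below uses a fixed random projection as a hypothetical stand-in for a real foundation model encoder; all shapes and the toy labeling rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "foundation model": a fixed random projection standing in for a
# real pretrained encoder (a hypothetical stand-in, not an actual model).
W_frozen = rng.normal(size=(16, 8)) * 0.1

def encode(x):
    return np.tanh(x @ W_frozen)  # frozen features; never updated

# Toy binary classification data: label depends on the first coordinate.
X = rng.normal(size=(200, 16))
y = (X[:, 0] > 0).astype(float)

# Linear probe: only this head is trained, via logistic-regression GD.
w = np.zeros(8)
b = 0.0
lr = 1.0
feats = encode(X)  # computed once, since the encoder is frozen
for _ in range(200):
    logits = feats @ w + b
    p = 1.0 / (1.0 + np.exp(-logits))
    grad = p - y
    w -= lr * feats.T @ grad / len(y)
    b -= lr * grad.mean()

acc = ((feats @ w + b > 0) == (y == 1)).mean()
```

Because the encoder is never updated, the probe measures how much task-relevant information the pretrained features already contain.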

Adaptive Long-Distance Navigation for Autonomous Drones

Abstract

This project leverages a Deep Reinforcement Learning (DRL) approach to enable a large drone to navigate toward goal positions in unknown outdoor settings while avoiding obstacles. Using state information and depth imagery, our method integrates pre-computed optimal trajectories (obtained during a privileged learning phase) as a supervisory signal with the exploratory benefits of an RL agent.

...
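A minimal sketch of how a privileged supervisory signal can be combined with an RL objective: an imitation term pulls the policy toward the pre-computed optimal trajectory, while a policy-gradient term preserves exploration. All arrays and the weighting `beta` are illustrative stand-ins, not the project's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: a batch of actions from the current policy,
# matching expert actions from the privileged (pre-computed) trajectory,
# and per-step log-probabilities and advantages from the RL side.
policy_actions = rng.normal(size=(32, 4))  # e.g. velocity commands
expert_actions = policy_actions + 0.1 * rng.normal(size=(32, 4))
log_probs = rng.normal(size=32)            # log pi(a|s) of taken actions
advantages = rng.normal(size=32)           # critic-estimated advantages

# Supervised (imitation) term: match the privileged optimal trajectory.
imitation_loss = np.mean(np.sum((policy_actions - expert_actions) ** 2, axis=1))

# Exploratory RL term: vanilla policy-gradient surrogate.
rl_loss = -np.mean(log_probs * advantages)

# Combined objective; beta trades supervision against exploration
# (a hypothetical weighting, not taken from the project text).
beta = 0.5
total_loss = rl_loss + beta * imitation_loss
```

In practice the imitation weight is often annealed so the agent leans on the privileged signal early and on its own exploration later.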

Grounded and Structured Self-Supervised Pre-training of Speech for Spoken Language Model

Self-supervised learning (SSL) techniques have been successful in learning rich representations for high-dimensional natural data. In the speech domain, SSL pretraining has produced state-of-the-art results on several downstream tasks, including automatic speech recognition, spoken language modeling, and speech resynthesis. SSL approaches train models against self-derived targets, which allows the use of large-scale data without labels. However, current speech SSL methods often suffer from the arbitrariness...
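The "self-derived targets" idea can be illustrated in the style of HuBERT-like methods (our choice of illustration, not necessarily this project's approach): quantize frame features against a codebook to obtain pseudo-labels, then score a masked-prediction loss against them. All arrays below are random stand-ins for real model outputs.

```python
import numpy as np

rng = np.random.default_rng(5)

# One utterance's frame features and a discrete codebook of pseudo-units
# (both random placeholders for learned quantities).
frames = rng.normal(size=(50, 16))
codebook = rng.normal(size=(32, 16))

# Self-derived target: nearest codebook entry per frame -- no human labels.
dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
targets = dists.argmin(axis=1)             # (50,) pseudo-label ids

# Mask some frames; the SSL loss is cross-entropy at masked positions.
mask = rng.random(50) < 0.4
logits = rng.normal(size=(50, 32))         # would come from the model
logp = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
loss = -logp[mask, targets[mask]].mean()
```

The codebook itself can be bootstrapped from clustered features, so the whole loop runs on unlabeled audio.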

Experimenting with Zero-Knowledge Proofs of Training

Abstract

How can a model owner prove they trained their model according to the correct specification? More importantly, how can they do so while preserving the privacy of the underlying dataset and the final model? We study this problem and formulate the notion of zero-knowledge proof of training (zkPoT), which formalizes rigorous security guarantees that should...

Modeling Latent Variables for Self-Supervised Learning

Abstract: Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images. Despite its recent success, learning good representations through MIM remains challenging because it requires predicting the right semantic content in accurate locations. For example, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determine its exact location (a). To address this, we follow LeCun et al. (b), who suggested using a latent variable to capture such uncertainty.

...
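A toy illustration of why a latent variable helps here: a decoder conditioned on a sampled latent z can commit to one plausible completion of the masked region, so the best sampled latent reconstructs the target at least as well as the average one; a deterministic predictor is stuck with the average. Names, shapes, and the linear decoder are illustrative assumptions, not the project's architecture.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy masked-prediction step: masked patch content, visible-context
# features, and a tiny linear decoder (all illustrative placeholders).
target = rng.normal(size=(8,))            # masked patch content
context = rng.normal(size=(12,))          # visible-patch features
W_ctx = rng.normal(size=(12, 8)) * 0.1
W_z = rng.normal(size=(4, 8)) * 0.1

def decode(z):
    # Completion depends on both the context and the sampled latent.
    return context @ W_ctx + z @ W_z

# Sample several latents and score each completion.
zs = rng.normal(size=(16, 4))
errors = [np.mean((decode(z) - target) ** 2) for z in zs]

# The best latent explains the masked region better than the average one:
# exactly the slack a deterministic predictor cannot exploit.
best, avg = min(errors), float(np.mean(errors))
```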

Designing Pro-social Networks through Multi-Agent Learning Theory

Artificial intelligence (AI) and algorithmic decision-making play a large and increasing role in how we conduct economic activity and organize our societies. We use such algorithms to allocate public funds, discover and share information, hire employees, approve mortgages, and perform a variety of other important tasks. However, although these algorithms and the people who use them rarely act in isolation, most work in AI does little to examine the precise mechanics of how people and algorithms interact, adapt, and mutually influence each other through economic, physical,...

Data Curation for Web-Scale Datasets

Abstract

Data curation is a promising direction for improving the efficiency and performance of large-scale models. Current curation efforts are ad hoc and disconnected. We propose to develop new, principled approaches to data curation inspired by Sorscher et al...
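A minimal sketch of score-based pruning in the spirit of Sorscher et al. (simplified; their actual pruning metrics differ): rank examples by a difficulty score, here distance to the dataset centroid as a crude stand-in, and keep a fraction from either end depending on the data regime.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy dataset of feature vectors (random stand-ins for real embeddings).
X = rng.normal(size=(1000, 32))

# Crude difficulty score: distance to the centroid (far = "hard").
centroid = X.mean(axis=0)
difficulty = np.linalg.norm(X - centroid, axis=1)

keep_frac = 0.3
order = np.argsort(difficulty)
n_keep = int(keep_frac * len(X))

# Sorscher et al.'s analysis suggests keeping the hardest examples when
# data is abundant, and the easiest when data is scarce.
hard_subset = X[order[-n_keep:]]
easy_subset = X[order[:n_keep]]
```

The point of a principled approach is to choose the score and the kept fraction from theory rather than ad hoc heuristics.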

Dynamic Compression Techniques for Efficient Transformers

Abstract

Transformers are a class of deep neural networks that have achieved state-of-the-art results across a wide range of domains, including natural language processing, computer vision, and computational biology. The widespread success of these models has been attributed to the attention mechanism, which identifies complex dependencies between elements of each input sequence. While the attention mechanism is incredibly...
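The attention mechanism referred to above, in its standard scaled dot-product form: each query scores every key, so compute and memory grow quadratically with sequence length, which is precisely the cost that compression techniques target.

```python
import numpy as np

rng = np.random.default_rng(4)

def scaled_dot_product_attention(Q, K, V):
    """Standard attention: each query attends over all keys."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # (n_q, n_k) similarities
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy sequence of 5 tokens with 8-dimensional queries, keys, and values.
Q = rng.normal(size=(5, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out, weights = scaled_dot_product_attention(Q, K, V)
```

The `scores` matrix is n_q × n_k, so doubling the sequence length quadruples the work; compressing keys and values shrinks exactly this matrix.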

Learning Large Touch-Vision-Language Models Using Self-Supervised Robot Learning

Abstract

Humans depend on the integration of multiple sensory inputs, including vision, language, audio, and touch, to successfully carry out daily tasks. Giving robots an analogous ability to perceive and process information from different sensory modalities enables a richer understanding of the physical...