Dynamic Compression Techniques for Efficient Transformers

Abstract

Transformers are a class of deep neural networks that have achieved state-of-the-art results across a wide range of domains, including natural language processing, computer vision, and computational biology. The widespread success of these models has been attributed to the attention mechanism, which identifies complex dependencies between elements of each input sequence. While the attention mechanism is highly effective at processing sequential data, its cost scales quadratically with the length of the input sequence, which makes long inputs expensive in both computation and memory. In recent years, several techniques have been proposed to create more efficient Transformer models that can handle long input sequences without incurring high computational costs.
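To make the quadratic scaling concrete, the following is a minimal sketch of naive single-head scaled dot-product attention in plain Python. It is an illustration only, not the project's method: the names `attention`, `toy_sequence`, and `score_count` are hypothetical, and the toy inputs are arbitrary. The counter simply records how many query-key scores are computed for a sequence of length n.

```python
import math


def attention(queries, keys, values):
    """Naive single-head scaled dot-product attention on lists of vectors.

    Returns the output vectors and the number of query-key scores computed.
    """
    d = len(queries[0])
    outputs = []
    score_count = 0  # illustrative counter, not part of attention itself
    for q in queries:
        # One score per key, so n scores for each of the n queries.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        score_count += len(scores)
        # Numerically stable softmax over the scores for this query.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted sum of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs, score_count


def toy_sequence(n, d=4):
    """Arbitrary deterministic toy input: n vectors of dimension d."""
    return [[math.sin(i + j) for j in range(d)] for i in range(n)]


for n in (8, 16, 32):
    x = toy_sequence(n)
    _, count = attention(x, x, x)
    print(n, count)  # prints "8 64", "16 256", "32 1024"
```

Doubling the sequence length quadruples the number of scores, which is the cost that efficient Transformer variants try to reduce.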

In our research, we aim to make Transformer-based models more computationally efficient and viable for applications involving long sequences of data. In particular, we are motivated by code completion and code translation, applications that not only require longer sequences than natural language but also place a constraint on inference time.

Researchers

Karna Mendonca, UC Berkeley
Matteo Guarrera, UC Berkeley
Mostafa Elhoushi, Meta AI
Chunxing Yin, Meta AI
Syed Shakib Sarwar, Meta AI
Kannan Ramchandran, UC Berkeley
Alberto Sangiovanni-Vincentelli, UC Berkeley

Acknowledgements

This project is in part based upon work sponsored by Meta AI.

Updates

[to be added] Closing Report.

