Compressing High Capacity Models with Implicit Neural Networks and Frank-Wolfe

Reducing the parameter footprint and inference latency of machine learning models is driven by diverse applications such as mobile vision and on-device intelligence [Choudary 20], and it becomes ever more important as models grow larger. In the proposed work, we will develop an alternative to the current train-then-compress paradigm: we will train sparse high-capacity models from scratch, simultaneously achieving low training cost and high sparsity. We will also explore the robustness of such models.

To do so, we are building an optimization library on top of PyTorch, which will contain both stochastic and full-batch optimization methods, in both the constrained and unconstrained settings.

Link: https://github.com/openopt/chop

Researchers

  • Geoffrey Négiar, UC Berkeley
  • Fabian Pedregosa, Google Research
  • Laurent El Ghaoui, UC Berkeley

Overview

To achieve sparse model training, we will combine two threads of research: Implicit Neural Networks [Négiar 17, Askari 18, Gu 18, El Ghaoui 19] and Frank-Wolfe (FW) algorithms for constrained optimization [Pedregosa 18, Négiar 20]. Implicit models are a novel class of machine learning models that bridges deep learning and control theory, using implicit optimization [Amos 17, Agrawal 19] to encompass both. Implicit Neural Networks are a superset of deep learning models in which the computation graph between “neurons” is allowed to have loops. To be well-posed (and to avoid exploding gradients), they are naturally formulated with constraints reminiscent of stability conditions in control theory. They reach state-of-the-art accuracy on deep learning benchmark problems and have become a topic of interest in the optimization and machine learning communities [Bai 19]. The FW algorithm controls model complexity by initializing with all-zero weights and adding sparse or low-rank updates [Jaggi 13] at each iteration. Our novel stochastic FW variant [Négiar 20] is fast and converges without hyperparameter tuning. The FW algorithm is well suited to the constrained nature of Implicit Models. We will combine the strengths of the Implicit model family with our expertise in the Frank-Wolfe method to obtain accurate, lightweight models that are ready to deploy at the end of training.
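To illustrate how FW produces sparse iterates, here is a minimal sketch of a single Frank-Wolfe step over an L1-norm ball in PyTorch. The function names are ours for illustration only and are not part of the chop API.

```python
import torch

def l1_ball_lmo(grad: torch.Tensor, alpha: float) -> torch.Tensor:
    """Linear minimization oracle over the L1 ball of radius alpha:
    argmin_{||s||_1 <= alpha} <grad, s> is the 1-sparse vertex
    -alpha * sign(grad_i) * e_i at the coordinate i of largest |grad_i|."""
    s = torch.zeros_like(grad)
    flat_grad = grad.flatten()
    i = flat_grad.abs().argmax()
    s.view(-1)[i] = -alpha * torch.sign(flat_grad[i])
    return s

def fw_step(w: torch.Tensor, grad: torch.Tensor, alpha: float, step_size: float) -> torch.Tensor:
    """One Frank-Wolfe update: move the iterate toward the sparse vertex
    returned by the LMO. Starting from w = 0, the iterate after t steps
    has at most t nonzero coordinates, which is what keeps the model sparse."""
    s = l1_ball_lmo(grad, alpha)
    return (1.0 - step_size) * w + step_size * s
```

Because each LMO call returns a single vertex of the constraint set, the number of nonzero (or, for nuclear-norm balls, low-rank) components grows at most linearly with the number of iterations.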

To do so, we are building an optimization library on top of PyTorch designed for both stochastic and full-batch algorithms, in both the constrained and unconstrained settings. This will allow us to examine the performance of training models under constraints, using sparsity-promoting methods. The full-batch algorithms will also let us explore model robustness: the most effective adversarial attacks are often posed as the solutions of optimization problems.
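To make the last point concrete, here is a hedged sketch of an adversarial attack posed as a constrained optimization problem: projected gradient ascent on the loss over an L-infinity ball of radius eps. The names `model` and `loss_fn` are placeholders; this is not code from the library described above.

```python
import torch

def pgd_attack(model, loss_fn, x, y, eps=0.03, step_size=0.01, n_iter=20):
    """Find a perturbation delta (approximately) maximizing the loss
    subject to ||delta||_inf <= eps."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(n_iter):
        loss = loss_fn(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            # Gradient-ascent step on the loss, then projection onto the ball.
            delta += step_size * delta.grad.sign()
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()
```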

In conclusion, we propose a scipy.optimize-like library implementing a host of state-of-the-art optimization methods that can be used at various stages of the deep learning pipeline: training models, evaluating their robustness, and training them adversarially.
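For context, this is the kind of single-call, full-batch interface that scipy.optimize itself offers (the snippet below uses SciPy, not the proposed library); the goal is a similar feel for PyTorch tensors and constrained problems.

```python
import numpy as np
from scipy.optimize import minimize

# Least-squares objective f(w) = 0.5 * ||A w - b||^2 and its gradient.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 10)), rng.standard_normal(50)
fun = lambda w: 0.5 * np.sum((A @ w - b) ** 2)
jac = lambda w: A.T @ (A @ w - b)

# One call returns the solution plus convergence diagnostics.
result = minimize(fun, x0=np.zeros(10), jac=jac, method="L-BFGS-B")
print(result.x)
```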

Links