Reducing the parameter footprint and inference latency of machine learning models is a need driven by diverse applications such as mobile vision and on-device intelligence [Choudhary 20], and it becomes increasingly important as models grow larger. In this work, we propose an alternative to the current train-then-compress paradigm: training sparse, high-capacity models from scratch, simultaneously achieving low training cost and high sparsity.

### Researchers

- Geoffrey Négiar, UC Berkeley
- Michael Mahoney, UC Berkeley
- Laurent El Ghaoui, UC Berkeley
- Brandon Amos, FAIR
- Aaron Defazio, FAIR

### Overview

To achieve sparse model training, we combine two threads of research: Implicit Neural Networks [Négiar 17, Askari 18, Gu 18, El Ghaoui 19], and Frank-Wolfe (FW) algorithms for constrained optimization [Pedregosa 18, Négiar 20]. Implicit models are a novel class of machine learning models that bridges deep learning and control theory, using implicit optimization [Amos 17, Agrawal 19] to encompass both. Implicit Neural Networks are a superset of deep learning models in which the computation graph between “neurons” is allowed to have loops. To be well-posed (and to avoid exploding gradients), they are naturally formulated with constraints reminiscent of stability conditions in control theory. They reach state-of-the-art accuracy on deep learning benchmark problems, and they have become a topic of interest in the optimization and machine learning communities [Bai 19]. The FW algorithm controls model complexity by initializing with all-zero weights and adding a sparse or low-rank update [Jaggi 13] at each iteration. It is therefore well suited to the constrained nature of implicit models.
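To make the looped-computation-graph idea concrete, here is a minimal NumPy sketch of the equilibrium ("forward") pass of a toy implicit model. All names (`implicit_forward`, the shapes, the scaling of `A`) are illustrative, not our implementation; the `||A||_inf < 1` contraction condition is one sufficient well-posedness condition of the kind discussed in [El Ghaoui 19] for ReLU-type activations.

```python
import numpy as np

def implicit_forward(A, B, u, tol=1e-10, max_iter=1000):
    """Equilibrium pass of a toy implicit model (illustrative sketch).

    The state x is defined implicitly by the fixed-point equation
    x = relu(A @ x + B @ u), i.e. the computation graph has loops.
    Since ReLU is componentwise 1-Lipschitz, the constraint
    ||A||_inf < 1 makes the iteration a contraction, so a unique
    equilibrium exists and plain fixed-point iteration finds it;
    this is the stability-style well-posedness condition at work.
    """
    assert np.abs(A).sum(axis=1).max() < 1.0, "need ||A||_inf < 1"
    x = np.zeros(A.shape[0])
    for _ in range(max_iter):
        x_next = np.maximum(A @ x + B @ u, 0.0)  # ReLU activation
        if np.abs(x_next - x).max() < tol:
            break
        x = x_next
    return x_next

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A *= 0.9 / np.abs(A).sum(axis=1).max()  # rescale so ||A||_inf = 0.9 < 1
B = rng.standard_normal((5, 3))
u = rng.standard_normal(3)
x = implicit_forward(A, B, u)  # x satisfies x = relu(A x + B u)
```

A feedforward network is the special case where `A` is strictly block lower-triangular (no loops); the constraint on `A` is what replaces that structural restriction.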

We combine the strengths of our Implicit model family with our expertise in the Frank-Wolfe method to obtain accurate, lightweight models that are ready to deploy as soon as training ends.
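The sparsity mechanism can be illustrated on a toy problem. The sketch below (hypothetical names; a plain deterministic FW, not the stochastic variant of [Négiar 20]) runs Frank-Wolfe over an l1-ball: starting from all-zero weights, each step moves toward a single signed vertex of the ball, so iterate k has at most k nonzero coordinates and sparsity holds throughout training rather than being imposed afterwards.

```python
import numpy as np

def frank_wolfe_l1(grad_f, radius, dim, n_steps):
    """Frank-Wolfe over an l1-ball of the given radius.

    Each iterate is a convex combination of ball vertices, so it
    stays feasible, and at most one new nonzero coordinate is
    introduced per step.
    """
    x = np.zeros(dim)
    for k in range(n_steps):
        g = grad_f(x)
        # Linear minimization oracle over the l1-ball: the signed
        # vertex along the coordinate of largest gradient magnitude.
        i = np.argmax(np.abs(g))
        s = np.zeros(dim)
        s[i] = -radius * np.sign(g[i])
        gamma = 2.0 / (k + 2)  # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * s
    return x

# Toy objective f(x) = 0.5 * ||x - b||^2, with gradient x - b.
b = np.array([3.0, -0.5, 0.0, 0.1])
x = frank_wolfe_l1(lambda v: v - b, radius=1.0, dim=4, n_steps=200)
# x lies inside the l1-ball and is sparse by construction.
```

The same projection-free template applies with sparse or low-rank vertex sets [Jaggi 13], which is what makes FW a natural fit for training constrained implicit models.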

### Links

- geoffrey_negiar@berkeley.edu
- https://github.com/openopt/constopt-pytorch
- https://arxiv.org/abs/1908.06315
- https://arxiv.org/abs/2002.11860

### References

- Jaggi, M. (2013). Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. ICML 2013.
- Négiar, G., Askari, A., Pedregosa, F. & El Ghaoui, L. (2017). Lifted Neural Networks for Weight Initialization. OPT-ML Workshop at NeurIPS 2017.
- Amos, B., & Kolter, J.Z. (2017). OptNet: Differentiable Optimization as a Layer in Neural Networks. ICML 2017.
- Askari, A.*, Négiar, G.*, Sambharya, R., & El Ghaoui, L. (2018). Lifted Neural Networks. ArXiv, abs/1805.01532.
- Gu, F., Askari, A., & El Ghaoui, L. (2018). Fenchel Lifted Networks: A Lagrange Relaxation of Neural Network Training. AISTATS 2020.
- Pedregosa, F., Négiar, G., Askari, A., & Jaggi, M. (2018). Linearly Convergent Frank-Wolfe with Backtracking Line-Search. AISTATS 2020.
- El Ghaoui, L., Gu, F., Travacca, B., & Askari, A. (2019). Implicit Deep Learning. ArXiv, abs/1908.06315.
- Dong, Z., Yao, Z., Cai, Y., Arfeen, D., Gholami, A., Mahoney, M.W., & Keutzer, K. (2019). HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks. ArXiv, abs/1911.03852.
- Bai, S., Kolter, J.Z., & Koltun, V. (2019). Deep Equilibrium Models. NeurIPS 2019.
- Agrawal, A., Amos, B., Barratt, S.T., Boyd, S., Diamond, S., & Kolter, J.Z. (2019). Differentiable Convex Optimization Layers. NeurIPS 2019.
- Shen, S., Dong, Z., Ye, J., Ma, L., Yao, Z., Gholami, A., Mahoney, M.W., & Keutzer, K. (2019). Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. AAAI 2020.
- Négiar, G., Dresdner, G., Tsai, A., El Ghaoui, L., Locatello, F., Freund, R.M. & Pedregosa, F. (2020). Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization. ICML 2020.
- Choudhary, T., Mishra, V., Goswami, A. & Sarangapani, J. (2020). A comprehensive survey on model compression and acceleration. Artificial Intelligence Review 2020.