The need to reduce the parameter footprint and inference latency of machine learning models is driven by diverse applications such as mobile vision and on-device intelligence [Choudary 20], and it grows more pressing as models become larger. In the proposed work, we will develop an alternative to the current train-then-compress paradigm: rather than pruning a dense model after training, we will train sparse, high-capacity models from scratch, simultaneously achieving low training cost and high sparsity. We will also explore the robustness of such models.
To do so, we are building an optimization...
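The proposal does not yet spell out the optimization method, so the following is only a minimal sketch of the general from-scratch sparse-training idea: fix a random binary mask over each weight tensor before training begins and keep it in force throughout, so the model is sparse at every step rather than densified and pruned afterward. The layer name `SparseLinear`, the `sparsity=0.9` level, and the static-mask choice are all illustrative assumptions, not the authors' method.

```python
# Minimal sketch (assumed, not the proposal's method): a linear layer that is
# sparse from step 0 because a fixed random mask zeroes most weights and is
# re-applied after every optimizer step.
import torch
import torch.nn as nn

class SparseLinear(nn.Linear):
    """Linear layer whose weight matrix stays sparse for the whole run."""

    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__(in_features, out_features)
        # Fixed random mask: a `sparsity` fraction of weights is zero from the start.
        mask = (torch.rand_like(self.weight) > sparsity).float()
        self.register_buffer("mask", mask)
        with torch.no_grad():
            self.weight.mul_(self.mask)

    def forward(self, x):
        # Masking inside forward() also zeroes the gradients of pruned weights.
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

model = nn.Sequential(SparseLinear(784, 256), nn.ReLU(), SparseLinear(256, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()

# Momentum or weight decay can nudge masked weights away from zero,
# so re-apply the mask after each step to guarantee the sparsity level.
with torch.no_grad():
    for m in model:
        if isinstance(m, SparseLinear):
            m.weight.mul_(m.mask)
```

A static mask is only the simplest instantiation; methods in this space typically also redistribute the nonzero budget during training (growing and pruning connections), which is presumably where the proposed optimization framework would differ.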