Hardware Software Co-Design for NLP and Recommendation Systems

This project investigates the co-design of Deep Neural Nets and their hardware support in Neural Net Accelerators.

Project Updates


  • Joseph Hassoun, Sr. Director, Neural Processor Architecture, Samsung

  • Sheng Shen, University of California, Berkeley

  • Professor Kurt Keutzer, University of California, Berkeley


Concerns about user privacy, coupled with the need for low latency and reliability, are creating interest in moving Natural Language Understanding (NLU) and even Recommendation Systems (RecSys) onto edge hardware. Because NLU and RecSys models are very large, significant innovation will be required to create models suitable for the edge. Our exploration of this topic will have three elements:

  • The design of novel Deep Neural Nets for NLU and RecSys, particularly tailored for edge deployment

  • The further development of aggressive DNN optimization methods, particularly for quantization, distillation, and pruning. 

  • The co-design of Neural Net Accelerator hardware that is highly tuned to the requirements of these DNNs. 

This work will build on our existing research in each of these areas, in particular the further evolution of our SqueezeBERT model [SqueezeBERT] for NLU, our quantization work in Q-BERT [Q-BERT], and recent developments in HAWQ [HAWQ-V2]. Investigating optimized hardware support for these models will be an entirely new avenue of research.
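As a rough illustration of two of the optimization methods named in the list above, the sketch below shows uniform symmetric quantization and magnitude pruning on a small list of weights. It is a generic textbook-style example, not the Q-BERT or HAWQ method (those use Hessian information to guide precision choices); the function names, the example weights, and the 4-bit and 50% settings are all assumptions chosen for illustration.

```python
def quantize_symmetric(weights, num_bits=8):
    """Map float weights to signed integers sharing one scale (hypothetical helper)."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 7 for 4 bits, 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from integer levels."""
    return [v * scale for v in q]


def prune_by_magnitude(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


# Illustrative values only; not taken from any real model.
weights = [0.5, -1.0, 0.25, 0.03, -0.6, 0.002]
q, scale = quantize_symmetric(weights, num_bits=4)
restored = dequantize(q, scale)
pruned = prune_by_magnitude(weights, sparsity=0.5)
```

With fewer bits the scale grows and so does the rounding error of each restored weight, which is the accuracy cost that more careful, sensitivity-aware schemes try to limit.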


[PowerNorm] Sheng Shen, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. PowerNorm: Rethinking Batch Normalization in Transformers. ICML 2020. Also, arXiv:2003.07845, 2020.

[Q-BERT] Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. AAAI Conference on Artificial Intelligence (AAAI) 2020. Also, arXiv:1909.05840, 2019.

[SqueezeBERT] Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer. SqueezeBERT: What can computer vision teach NLP about efficient neural networks? SustaiNLP Workshop 2020. Also, arXiv:2006.11316, 2020.

[HAWQ-V2] Zhen Dong, Zhewei Yao, Yaohui Cai, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks. NeurIPS 2020. Also, arXiv:1911.03852, 2019.