Overview
Large neural network models have improved accuracy and generalization across many domains. However, this trend toward ever-larger models cannot continue indefinitely, because hardware memory is limited. As a result, researchers have devised a number of memory-saving techniques to alleviate the memory bottleneck, such as checkpointing, quantization, and swapping.
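To make one of these techniques concrete, the following is a minimal sketch of activation checkpointing, assuming PyTorch and a toy layer stack rather than the actual setup used in this project. Activations inside each checkpointed segment are discarded during the forward pass and recomputed during the backward pass, trading extra compute for lower peak memory.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy stack of 12 blocks standing in for a transformer encoder.
model = nn.Sequential(
    *[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(12)]
)
x = torch.randn(32, 1024, requires_grad=True)

# Split the 12 blocks into 4 checkpointed segments: only the activations at
# segment boundaries are kept; everything else is recomputed on backward.
out = checkpoint_sequential(model, 4, x)
out.sum().backward()
```

The other two strategies mentioned above follow the same trade-off pattern: quantization stores tensors at reduced precision at the cost of accuracy or extra conversion work, and swapping offloads tensors to host memory at the cost of transfer latency.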
In this project, we first conduct a case study on the BERT model to evaluate how effective these memory-saving solutions are.
Surprisingly, we find that although these strategies indeed lower peak memory usage, the associated overhead (e.g.,...