Alvin Cheung

Pretraining Tree-structured Large Language Models for Code

Large language models (LLMs) have made significant advancements in code-related tasks, yet many LLMs treat code as simple sequences, neglecting its structured nature. We introduce AST-T5, a novel pretraining paradigm that leverages the Abstract Syntax Tree (AST) for enhanced code generation, transpilation, and understanding. Using dynamic programming, our AST-Aware Segmentation retains code structure, while our AST-Aware Span Corruption objective equips the model to reconstruct various code structures. Unlike other models, AST-T5 avoids intricate program analyses or architectural changes,...
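The structure-aware segmentation idea can be illustrated with a small sketch. The snippet below uses Python's standard ast module to split source code only at top-level statement boundaries, so no chunk cuts through the middle of a statement. This is a hypothetical, greedy illustration of the general idea, not AST-T5's dynamic-programming segmentation; the function name and the line budget are made up for the example.

```python
import ast

def segment_at_statement_boundaries(source: str, max_lines: int = 20):
    """Split `source` into chunks that break only at top-level statement
    boundaries (a greedy, hypothetical sketch of structure-aware
    segmentation; AST-T5's actual method uses dynamic programming over
    the full AST rather than this line-based pass)."""
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    # Start/end line of each top-level statement (1-indexed, per the ast module).
    spans = [(node.lineno, node.end_lineno) for node in tree.body]

    chunks, current, current_len = [], [], 0
    for start, end in spans:
        stmt_lines = lines[start - 1:end]
        # Close the current chunk if adding this statement would exceed the budget.
        if current and current_len + len(stmt_lines) > max_lines:
            chunks.append("".join(current))
            current, current_len = [], 0
        current.extend(stmt_lines)
        current_len += len(stmt_lines)
    if current:
        chunks.append("".join(current))
    return chunks
```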

Papaya: An Evaluation of Memory Optimization Strategies for Model Training

Overview

Large neural network models have improved accuracy and generalization in various domains. However, this trend cannot continue indefinitely because hardware memory is limited. As a result, researchers have devised a number of memory-saving algorithms to alleviate the memory bottleneck, such as checkpointing, quantization, and swapping.
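As one concrete illustration of these strategies, the PyTorch sketch below applies activation checkpointing to a toy layer stack with torch.utils.checkpoint.checkpoint_sequential: activations inside each segment are discarded during the forward pass and recomputed during backward, trading extra compute for lower peak memory. The layer stack and tensor sizes are hypothetical stand-ins for illustration, not the BERT setup used in this study.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy stack of layers standing in for a deep encoder
# (hypothetical example, not the BERT model from the case study).
model = nn.Sequential(*[
    nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(24)
])

x = torch.randn(32, 1024, requires_grad=True)

# Checkpointing: only segment-boundary activations are kept; the rest are
# recomputed in the backward pass. (use_reentrant requires a recent PyTorch.)
out = checkpoint_sequential(model, 4, x, use_reentrant=False)
loss = out.sum()
loss.backward()
```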

In this project, we first conduct a case study using the BERT model to see how effective such memory-saving solutions are.
Surprisingly, we find that although these strategies indeed lower peak memory usage, the associated overhead (e.g.,...