Anant Sahai

Data Curation for Web-Scale Datasets

Abstract

Data curation is a promising direction for improving the efficiency and performance of large-scale models. Current curation efforts are ad hoc and disconnected. We propose to develop new principled approaches for data curation inspired by Sorscher et al...