Self-Supervised Learning (SSL) techniques have proved highly effective for representation learning across modalities such as text, images, and, more recently, speech. In the speech domain, SSL pretraining has produced state-of-the-art results in several downstream applications, including speech recognition (wav2vec, wav2vec 2.0), spoken language modeling (GSLM), and speech resynthesis (HuBERT). However, this approach requires massive amounts of speech data (thousands of hours) and substantial computational resources to train such large models. Also, while...