Towards Learning and Auditing Private Foundation Models


Foundation models (e.g., DALL-E, GPT-3, CLIP, MAE [HCX+2022]) – pre-trained on vast amounts of diverse data through self-supervised learning – have emerged as an important building block for artificial intelligence (AI) systems [BHA+2021]. These models can be readily adapted to various downstream applications (e.g., language, vision, robotics) via fine-tuning, prompting, linear probing, etc. Although foundation models have been extensively deployed, there is a significant lack of understanding of the privacy risks associated with training foundation models on sensitive user data, and of whether it is possible to train foundation models that safeguard privacy.


  • BAIR Faculty Member: Yi Ma, UC Berkeley
  • BAIR PhD student/postdoc: Yaodong Yu, UC Berkeley
  • Meta-AI researcher: Chuan Guo, Fundamental AI Research (FAIR) team at Meta


Existing work demonstrates that machine learning models trained on sensitive private data can unintentionally leak their training data [SSS+2017, BCH2022]. More recently, researchers have shown that foundation models leak training-data details [CTW+2021, SSG+2021, CHN+2023]: for example, personally identifiable information such as Social Security numbers can be extracted from GPT-2 [CTW+2021], and training images can be reconstructed from diffusion models [SSG+2021, CHN+2023]. Moreover, model capacity (i.e., the number of parameters) is detrimental to privacy: larger models memorize more of their training data, resulting in greater privacy leakage [CTW+2021]. However, these studies only scratch the surface of the kinds of privacy leakage possible from foundation models, and further research is needed to understand their full extent.
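
A minimal sketch of the attack surface described above is loss-thresholding membership inference in the spirit of [SSS+2017]: because memorized training examples tend to receive lower loss, an adversary who can query per-example losses can flag low-loss examples as likely training-set members. The function name and the synthetic loss distributions below are illustrative assumptions, not drawn from any of the cited papers.

```python
import numpy as np

def loss_threshold_mia(member_losses, nonmember_losses, threshold):
    """Toy membership-inference attack: predict "member" whenever an
    example's loss falls below `threshold` (low loss suggests the model
    has memorized the example)."""
    tpr = float((member_losses < threshold).mean())     # members correctly flagged
    fpr = float((nonmember_losses < threshold).mean())  # non-members wrongly flagged
    return tpr, fpr

# Synthetic losses: members get lower loss on average, mimicking memorization.
rng = np.random.default_rng(0)
member_losses = rng.normal(loc=0.5, scale=0.2, size=1000)
nonmember_losses = rng.normal(loc=1.0, scale=0.3, size=1000)
tpr, fpr = loss_threshold_mia(member_losses, nonmember_losses, threshold=0.75)
```

The gap between TPR and FPR quantifies how much the model's loss distinguishes members from non-members; a larger gap indicates more memorization and thus more leakage.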

On the flip side, although foundation models pose privacy risks, comparatively little effort has gone into developing privacy-preserving foundation models. Limiting our scope to differential privacy (DP)—the only theoretically rigorous notion of privacy in ML at this time—most existing works either train differentially private models on small-scale datasets [TB2020, DBH+2022], or fine-tune non-private foundation models for downstream tasks via differentially private SGD (DP-SGD) [ACG+2016, LTL+2021, DBH+2022]. It is unclear whether DP training/fine-tuning suffices for training privacy-preserving foundation models, or whether other techniques such as private prediction [DF2018], privacy auditing [JUO2020], and alternative definitions of privacy [HGvdM2021] are needed to advance the field.
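
The core mechanism in DP-SGD [ACG+2016] is per-example gradient clipping followed by calibrated Gaussian noise. The sketch below shows one such update step in plain NumPy; the function name, argument names, and the per-example-gradient representation are our own illustrative assumptions, not an API from any DP library.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD update [ACG+2016]: clip each per-example gradient to L2
    norm at most `clip_norm`, sum, add Gaussian noise with standard
    deviation `noise_multiplier * clip_norm`, average, and step."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    batch_size = len(clipped)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / batch_size
    return params - lr * noisy_mean
```

Clipping bounds each example's influence on the update, and the noise scale (relative to that bound) is what the moments-accountant analysis of [ACG+2016] converts into an (ε, δ) guarantee; scaling this recipe to billion-scale pre-training is precisely where the privacy-utility trade-off becomes challenging.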

In this project, we take a two-pronged approach that aims to tackle the privacy threat of foundation models from both sides: (1) we aim to push the limits of DP training by scaling up existing techniques to billion-scale datasets and examining their privacy-utility trade-off at scale; (2) we aim to design effective privacy auditing tools that extract information from foundation models.
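
For the auditing prong, one standard recipe [JUO2020] turns a distinguishing attack's empirical performance into a lower bound on the DP parameter ε: any (ε, 0)-DP mechanism satisfies TPR ≤ e^ε · FPR (and symmetrically for the complements), so an attack's measured TPR/FPR ratio certifies a minimum ε. The helper below is a simplified sketch of that conversion (ignoring δ and statistical confidence intervals, which a real audit must account for); the function name is our own.

```python
import math

def empirical_epsilon_lower_bound(tpr, fpr):
    """Lower-bound epsilon from an attack's true/false positive rates,
    following the auditing idea of [JUO2020]. For (eps, 0)-DP:
        TPR <= e^eps * FPR   and   (1 - FPR) <= e^eps * (1 - TPR),
    so eps >= max(log(TPR/FPR), log((1-FPR)/(1-TPR))) when TPR >= FPR."""
    if fpr <= 0.0 or tpr >= 1.0:
        return float("inf")  # attack separates perfectly; no finite bound
    return max(math.log(tpr / fpr), math.log((1.0 - fpr) / (1.0 - tpr)))
```

If the certified lower bound exceeds the ε claimed by the training procedure, the audit has exposed a bug or an analysis gap; applying such audits to foundation-model-scale training is one goal of this project.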


[BCH2022] Balle, B., Cherubin, G. and Hayes, J., 2022, May. Reconstructing training data with informed adversaries. In 2022 IEEE Symposium on Security and Privacy (SP) (pp. 1138-1156). IEEE.

[CTW+2021] Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T.B., Song, D., Erlingsson, U. and Oprea, A., 2021, August. Extracting Training Data from Large Language Models. In USENIX Security Symposium (Vol. 6).

[SSS+2017] Shokri, R., Stronati, M., Song, C. and Shmatikov, V., 2017, May. Membership inference attacks against machine learning models. In 2017 IEEE symposium on security and privacy (SP) (pp. 3-18). IEEE.

[BHA+2021] Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E. and Brynjolfsson, E., 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.

[CHN+2023] Carlini, N., Hayes, J., Nasr, M., Jagielski, M., Sehwag, V., Tramèr, F., Balle, B., Ippolito, D. and Wallace, E., 2023. Extracting training data from diffusion models. arXiv preprint arXiv:2301.13188.

[SSG+2021] Somepalli, G., Singla, V., Goldblum, M., Geiping, J. and Goldstein, T., 2022. Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models. arXiv preprint arXiv:2212.03860.

[HCX+2022] He, K., Chen, X., Xie, S., Li, Y., Dollár, P. and Girshick, R., 2022. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 16000-16009).

[TB2020] Tramer, F. and Boneh, D., 2020. Differentially private learning needs better features (or much more data). arXiv preprint arXiv:2011.11660.

[DBH+2022] De, S., Berrada, L., Hayes, J., Smith, S.L. and Balle, B., 2022. Unlocking high-accuracy differentially private image classification through scale. arXiv preprint arXiv:2204.13650.

[LTL+2021] Li, X., Tramer, F., Liang, P. and Hashimoto, T., 2021. Large language models can be strong differentially private learners. arXiv preprint arXiv:2110.05679.

[ACG+2016] Abadi, M., Chu, A., Goodfellow, I., McMahan, H.B., Mironov, I., Talwar, K. and Zhang, L., 2016, October. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC conference on computer and communications security (pp. 308-318).

[GKC+2022] Guo, C., Karrer, B., Chaudhuri, K. and van der Maaten, L., 2022, June. Bounding training data reconstruction in private (deep) learning. In International Conference on Machine Learning (pp. 8056-8071). PMLR.

[HMB2023] Hayes, J., Mahloujifar, S. and Balle, B., 2023. Bounding Training Data Reconstruction in DP-SGD. arXiv preprint arXiv:2302.07225.

[JUO2020] Jagielski, M., Ullman, J. and Oprea, A., 2020. Auditing Differentially Private Machine Learning: How Private is Private SGD? NeurIPS 2020.

[DF2018] Dwork, C. and Feldman, V., 2018. Privacy-preserving Prediction. COLT 2018.

[HGvdM2021] Hannun, A., Guo, C. and van der Maaten, L., 2021. Measuring Data Leakage in Machine-Learning Models with Fisher Information. UAI 2021.