Secure and Privacy-Preserving Federated Learning

Threat model in federated learning

Federated learning (FL) is a powerful distributed learning paradigm that has grown into an active research field with large-scale real-world deployments over the last several years. In FL, participants collaboratively train a model while all data is held locally to preserve data privacy. Despite its success, FL still faces a variety of security challenges, among which inference attacks and poisoning attacks are the two most notable categories. How to ensure privacy and model integrity under these attacks remains an open question of critical importance.

We propose to further explore inference and poisoning attacks in FL and to design countermeasures. Specifically, we plan to study the attack landscape under novel, stronger threat models (for example, malicious participants mounting inference attacks) and to design and develop new defenses against these attacks.


Overview

The technical objective is two-fold. First, we plan to further explore the landscape of new attacks on FL under novel threat models. Second, we plan to design improved defenses for both the existing attacks and the proposed attacks.

Attacks. In inference attacks, the adversary aims to infer information about the participants’ data (e.g., membership, features) by querying or inspecting the published FL model and gradient updates. In poisoning attacks, participants upload contaminated updates either to prevent the training process from converging or to insert backdoor functionality into the FL model. Differential privacy and Byzantine-robust aggregators have been proposed to defend against these two attacks separately.
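As a concrete reference point, the following minimal numpy sketch shows the two separately proposed defenses in their simplest form: a coordinate-wise median aggregator as one common Byzantine-robust rule, and clipping plus Gaussian noise as the standard mechanism behind differentially private averaging. The function names and parameter values are illustrative, not taken from any specific protocol.

```python
# Minimal sketch of the two defenses, treated separately (illustrative only).
import numpy as np

def median_aggregate(updates):
    """Byzantine-robust aggregation: coordinate-wise median of client updates.

    updates: array of shape (n_clients, dim).
    """
    return np.median(updates, axis=0)

def dp_average(updates, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Differentially private aggregation: clip each update to bound its
    sensitivity, average, then add Gaussian noise (the Gaussian mechanism)."""
    rng = np.random.default_rng() if rng is None else rng
    n, dim = updates.shape
    norms = np.linalg.norm(updates, axis=1, keepdims=True)
    clipped = updates * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    avg = clipped.mean(axis=0)
    # Placeholder calibration: noise scale clip_norm * noise_multiplier / n.
    sigma = clip_norm * noise_multiplier / n
    return avg + rng.normal(0.0, sigma, size=dim)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    honest = rng.normal(0.0, 0.1, size=(9, 5))   # benign updates around zero
    poisoned = np.full((1, 5), 10.0)             # one large malicious update
    updates = np.vstack([honest, poisoned])
    print("plain mean   :", updates.mean(axis=0))
    print("coord. median:", median_aggregate(updates))
    print("DP average   :", dp_average(honest, rng=rng))
```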

However, under a stronger threat model with malicious participants, the two attacks can be combined into one. Concretely, an adversary can poison the training process by manipulating its own update to aid a subsequent inference attack. For example, the adversary may craft its update such that, if a certain data point is included in another participant's update in the same round, the two updates together plant a backdoor into the FL model. Afterwards, the adversary only needs to activate the backdoor to infer whether that data point was used in training.
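The toy simulation below illustrates the general principle behind such a combined attack: a crafted malicious update can make the aggregate reveal whether a target point participated in a round. It deliberately replaces the backdoor mechanism described above with a simple inner-product test on synthetic vectors, and every quantity (the target's gradient signature, the adversary's estimate of the benign update, the two-party sum as the "aggregate") is a hypothetical simplification.

```python
# Toy illustration of the combined threat: a crafted malicious update makes the
# aggregate reveal whether a target point participated in the round. This is a
# two-participant simplification with synthetic vectors, not the backdoor
# mechanism described above.
import numpy as np

rng = np.random.default_rng(1)
dim = 50
g_target = rng.normal(0.0, 1.0, size=dim)   # gradient signature of the target point (assumed known)
base = rng.normal(0.0, 1.0, size=dim)       # adversary's estimate of the victim's benign update

def victim_update(member):
    noise = rng.normal(0.0, 0.1, size=dim)
    return base + noise + (g_target if member else 0.0)

# The adversary uploads the negation of its estimate, so the plain sum of the
# two updates approximately isolates the target's contribution when present.
adversary_update = -base

for member in (True, False):
    aggregate = victim_update(member) + adversary_update
    score = aggregate @ g_target / (np.linalg.norm(g_target) ** 2)
    print(f"member={member}: inference score = {score:+.2f}")
# The score is close to 1 when the target point participated and close to 0 otherwise.
```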

Defenses. To design effective and efficient defenses, we plan to focus on the following goals.

  • Privacy: The privacy of the participants’ data must be preserved under inference attacks.

  • Utility: The utility of the output (e.g., accuracy of an FL model, MSE of an estimated mean). Utility is affected by two factors: (1) the intrinsic utility loss of the protocol (e.g., compression for lower communication cost, noise for differential privacy), and (2) the influence of malicious updates in poisoning attacks; hence the utility goal also captures the Byzantine-robustness requirement. A toy measurement is sketched after this list.

  • Communication Cost & Computation Cost: The communication and computation cost of the protocol should be as low as possible for efficiency. Sub-linear, linear, or close-to-linear complexity in the update dimension and the number of participants is preferred.
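The sketch below measures the utility goal in the simplest setting (mean estimation): the MSE of the aggregate against the true mean, under (1) the intrinsic loss from differential-privacy noise and (2) the influence of poisoned updates, with and without a robust aggregator. All parameter values are illustrative.

```python
# Toy measurement of the utility goal for mean estimation (illustrative values).
import numpy as np

rng = np.random.default_rng(2)
n, dim, true_mean = 100, 20, 0.5
honest = true_mean + rng.normal(0.0, 0.2, size=(n, dim))

def mse(estimate):
    return float(np.mean((estimate - true_mean) ** 2))

# (1) Intrinsic utility loss: Gaussian noise added for differential privacy.
sigma = 0.05
dp_estimate = honest.mean(axis=0) + rng.normal(0.0, sigma, size=dim)

# (2) Utility loss from poisoning: 10% of updates replaced by large outliers.
poisoned = honest.copy()
poisoned[: n // 10] = 10.0
plain_estimate = poisoned.mean(axis=0)
robust_estimate = np.median(poisoned, axis=0)  # Byzantine-robust alternative

print("clean mean       MSE:", mse(honest.mean(axis=0)))
print("DP-noised mean   MSE:", mse(dp_estimate))
print("poisoned, mean   MSE:", mse(plain_estimate))
print("poisoned, median MSE:", mse(robust_estimate))
```

Both the plain mean and the coordinate-wise median run in time close to linear in the number of participants times the update dimension, which is consistent with the efficiency goal above.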

To defend against the aforementioned attacks, we plan to design an FL protocol that combines Byzantine robustness and differential privacy; one point in this design space is sketched below. We have done preliminary work on new approaches for differentially private FL and Byzantine-robust FL, and we plan to further explore these directions for new defenses. Furthermore, we plan to study the connections and separations between differential privacy and robustness in FL, extending their known connections in statistics, adversarial learning, and streaming algorithms.
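As an illustration of this design space, the sketch below combines the two ingredients in the most direct way: clip client updates to bound per-client sensitivity, aggregate with a coordinate-wise trimmed mean for Byzantine robustness, and add Gaussian noise for differential privacy. This is only an assumption-laden starting point rather than the protocol we intend to propose; clip_norm, trim_k, and noise_multiplier are placeholder parameters, and the noise calibration in particular would require a careful sensitivity analysis.

```python
# Sketch of one way to combine Byzantine robustness and differential privacy:
# clip -> coordinate-wise trimmed mean -> Gaussian noise. Illustrative only.
import numpy as np

def robust_dp_aggregate(updates, clip_norm=1.0, trim_k=1, noise_multiplier=1.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n, dim = updates.shape
    # 1) Clip each update to L2 norm <= clip_norm (bounds per-client sensitivity).
    norms = np.linalg.norm(updates, axis=1, keepdims=True)
    clipped = updates * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # 2) Coordinate-wise trimmed mean: drop the trim_k largest and smallest
    #    values in every coordinate before averaging.
    ordered = np.sort(clipped, axis=0)
    trimmed = ordered[trim_k : n - trim_k].mean(axis=0)
    # 3) Gaussian mechanism; this noise scale is a placeholder calibration, a
    #    real protocol would derive it from the trimmed-mean sensitivity.
    sigma = clip_norm * noise_multiplier / (n - 2 * trim_k)
    return trimmed + rng.normal(0.0, sigma, size=dim)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    updates = rng.normal(0.0, 0.1, size=(10, 4))
    updates[0] = 5.0  # one Byzantine client
    print(robust_dp_aggregate(updates, clip_norm=1.0, trim_k=1, noise_multiplier=0.5, rng=rng))
```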

This project is generously supported by compute resources from Azure.