Robustness is steadily becoming a real concern for machine learning models, especially when they are deployed in security- or safety-critical settings. A significant body of research has demonstrated the fragility of neural networks when the i.i.d. assumption does not hold, e.g., under natural corruptions and adversarial perturbations. Many application domains are already at risk, such as autonomous driving, malware detection, and copyright detection. Most recent methods for training robust models rely on expensive and aggressive data augmentation as well as various forms of pre-training. Nevertheless, current state-of-the-art techniques still do not scale well and yield only limited robustness improvements.
We believe that a more directed inductive bias is necessary to learn robust and human-aligned representations efficiently. Contrary to existing work that applies fewer and fewer priors, we argue that selectively adding a few important priors can yield significant robustness benefits. We propose a new paradigm for robust visual models consisting of two steps: (1) learn a disentangled and semantically aligned representation space, and (2) train a classifier on top of these representations. Our preliminary design uses part-based disentanglement for representation learning and a robust classifier for classification; a rough sketch of this two-step pipeline is given below.
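To make the two-step paradigm concrete, the following is a minimal PyTorch sketch. The PartEncoder, its soft part masks, and the FGSM-style adversarial training of the classifier are all illustrative assumptions introduced here for exposition; they are not the proposed system's actual components or training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PartEncoder(nn.Module):
    """Hypothetical stand-in for step (1): a small CNN predicts K soft part
    masks and pools image features within each part, giving one embedding
    per part as a crude form of part-based disentanglement."""

    def __init__(self, num_parts=4, feat_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        self.part_head = nn.Conv2d(feat_dim, num_parts, 1)  # per-pixel part logits

    def forward(self, x):
        feats = self.backbone(x)                         # (B, D, H, W)
        parts = self.part_head(feats).softmax(dim=1)     # (B, K, H, W) soft masks
        # Average-pool features inside each soft part mask -> (B, K, D)
        pooled = torch.einsum("bkhw,bdhw->bkd", parts, feats)
        pooled = pooled / (parts.sum(dim=(2, 3)).unsqueeze(-1) + 1e-6)
        return pooled.flatten(1), parts                  # (B, K*D) embedding


def train_robust_classifier(encoder, classifier, loader, epochs=1, eps=4 / 255):
    """Step (2): train a classifier on the frozen part embeddings. Here a
    single-step FGSM perturbation is used as one assumed way to make the
    classifier robust; any robust training objective could be substituted."""
    encoder.eval()  # representations are kept fixed in this stage
    opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            # Craft an adversarial input w.r.t. the current classifier.
            x_adv = x.clone().requires_grad_(True)
            z, _ = encoder(x_adv)
            loss = F.cross_entropy(classifier(z), y)
            (grad,) = torch.autograd.grad(loss, x_adv)
            x_adv = (x + eps * grad.sign()).clamp(0, 1).detach()
            # Update only the classifier on the perturbed embeddings.
            with torch.no_grad():
                z_adv, _ = encoder(x_adv)
            opt.zero_grad()
            F.cross_entropy(classifier(z_adv), y).backward()
            opt.step()
```

In a full version of this sketch, step (1) would first pretrain the encoder with a part-based disentanglement objective (e.g., using part annotations or an unsupervised part-discovery loss); the classifier, for example `nn.Linear(4 * 64, num_classes)` on the pooled embeddings, is then trained separately in step (2).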
Acknowledgment
This project is supported by Google through Google Cloud credits.