Disentangling Input Signals for Robust Computer Vision

High-dimensional, real-world imagery presents an embarrassment of riches to powerful, overparameterized neural networks: it is possible to train image classification models to surprisingly high accuracy on high-frequency or low-frequency features alone. The dominant paradigm of training and evaluating deep neural networks on independent and identically distributed (IID) data splits has obscured a significant weakness of current models, namely a lack of robustness to distribution shift. One class of explanations posits that powerful models learn spurious correlations, including a bias toward texture and other surface statistics that may not hold under shifts in the test distribution.

August 24, 2021

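The claim above, that classifiers can reach high accuracy on low- or high-frequency features alone, presupposes that an image decomposes cleanly into such bands. A minimal sketch of that decomposition, using a hard circular low-pass mask in the Fourier domain (the cutoff `radius` is an illustrative choice, not a value from any cited work):

```python
import numpy as np

def frequency_split(image, radius=8):
    """Split a grayscale image into low- and high-frequency components
    via a hard circular mask in the Fourier domain.

    `radius` is the low-pass cutoff in frequency bins; it is an
    illustrative hyperparameter, not taken from the cited papers.
    """
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(f * ~mask)).real
    return low, high

# The two components sum back (up to float error) to the original image,
# so training on `low` or `high` alone discards one band of information.
img = np.random.rand(32, 32)
low, high = frequency_split(img)
```

Either band can then be fed to a classifier in place of the original image to probe which frequencies a model relies on.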
Recently, aggressive data augmentation methods such as style transfer from artwork [1] and compositions of simple augmentations [2] have succeeded in improving robustness to distribution shift for image classification. Our work seeks to develop new training methods that disentangle the rich input signals within an image and leverage these signals simultaneously to perform more robust inference.
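To make the "compositions of simple augmentations" idea concrete, the sketch below follows the mixing scheme of AugMix [2]: several short chains of augmentations are combined with Dirichlet weights, then interpolated with the clean image via a Beta-sampled coefficient. The three toy ops are stand-ins chosen for brevity; the paper uses a richer op set (shear, posterize, etc.).

```python
import numpy as np

# Toy augmentation ops on images with values in [0, 1].
# These are illustrative stand-ins, not the ops used in [2].
def flip(x):
    return x[:, ::-1]

def roll(x):
    return np.roll(x, shift=2, axis=0)

def invert(x):
    return 1.0 - x

OPS = [flip, roll, invert]

def augmix(image, width=3, depth=2, alpha=1.0, rng=None):
    """Sketch of AugMix-style mixing [2]: convexly combine `width`
    augmentation chains of length `depth`, then interpolate the
    mixture with the clean image."""
    rng = rng or np.random.default_rng(0)
    ws = rng.dirichlet([alpha] * width)   # convex weights over chains
    m = rng.beta(alpha, alpha)            # clean/augmented interpolation
    mixed = np.zeros_like(image)
    for w in ws:
        chain = image
        for _ in range(depth):
            chain = OPS[rng.integers(len(OPS))](chain)
        mixed += w * chain
    return (1 - m) * image + m * mixed

img = np.random.rand(8, 8)
out = augmix(img)
```

Because every step is a convex combination of range-preserving ops, the output stays in the same value range and shape as the input, which makes the augmentation drop-in for a standard training loop.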
[1]: R. Geirhos, P. Rubisch, C. Michaelis, M. Bethge, F. A. Wichmann, and W. Brendel. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv preprint arXiv:1811.12231, 2018.

[2]: D. Hendrycks, N. Mu, E. D. Cubuk, B. Zoph, J. Gilmer, and B. Lakshminarayanan. AugMix: A simple data processing method to improve robustness and uncertainty. arXiv preprint arXiv:1912.02781, 2019.