Unsupervised Learning of Visual Context from Instance Segmentation

Unsupervised representation learning aims to extract latent information from data that reflects its semantic categories. Contrastive learning has emerged as a leading approach to self-supervised learning. We take a step further and propose to extract latent information that also reflects the visual context of the data.


Instead of learning from annotated object relationships (e.g., a person riding a motorcycle), we leverage our expertise in unsupervised image classification [1, 2], segmentation [3, 4, 5], and open long-tailed recognition [6, 7]. We bootstrap from instance segmentation and apply multi-level contrastive learning to discover and encode visual context automatically in the feature space. We would then be able to use the feature of an instance segmentation to retrieve instance segmentations with similar contexts (e.g., the back view of a rider on a motorcycle next to a car).
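The retrieval step described above can be illustrated with a minimal sketch. It assumes each instance segmentation has already been encoded as a fixed-length feature vector and ranks a gallery by cosine similarity to a query feature. The function names, the gallery keys, and the 3-D feature values are all hypothetical and chosen only for illustration; this is not the project's implementation, and the similarity measure actually used may differ.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve_similar(query, gallery, k=2):
    """Return the names of the k gallery features closest to the query, best first."""
    scored = [(cosine_similarity(query, feat), name) for name, feat in gallery.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical features: in practice these would come from the learned encoder.
gallery = {
    "rider_back_view_near_car": [0.9, 0.1, 0.4],
    "rider_front_view_on_road": [0.2, 0.9, 0.1],
    "parked_motorcycle": [0.1, 0.2, 0.9],
}
query = [0.8, 0.2, 0.5]  # hypothetical feature of the query instance segmentation

top = retrieve_similar(query, gallery, k=2)
print(top)
```

If the learned features encode context as intended, instances that share a context with the query should rank ahead of instances that only share its semantic category.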


Updated 09/15/2021


  • Tsung-Wei Ke, UC Berkeley, link.

  • Stella Yu, UC Berkeley, link.

  • Alex Berg, Facebook, link.


[1] Z. Wu, Y. Xiong, S. X. Yu, and D. Lin. Unsupervised feature learning via non-parametric instance discrimination. In CVPR, 2018.

[2] X. Wang, Z. Liu, and S. X. Yu. Unsupervised feature learning by cross-level discrimination between instances and groups. In CVPR, 2021.

[3] J.-J. Hwang, S. X. Yu, J. Shi, M. D. Collins, T.-J. Yang, X. Zhang, and L.-C. Chen. SegSort: Segmentation by discriminative sorting of segments. In ICCV, 2019.

[4] T.-W. Ke, J.-J. Hwang, and S. X. Yu. Contextual image parsing via panoptic segment sorting. In ACM Multimedia 2021 Workshop on Multimedia Understanding with Less Labeling, 2021.

[5] T.-W. Ke, J.-J. Hwang, and S. X. Yu. Universal weakly supervised segmentation by pixel-to-segment contrastive learning. In ICLR, 2021.

[6] Z. Liu, Z. Miao, X. Pan, X. Zhan, D. Lin, S. X. Yu, and B. Gong. Open compound domain adaptation. In CVPR, 2020.

[7] Z. Liu, Z. Miao, X. Zhan, J. Wang, B. Gong, and S. X. Yu. Large-scale long-tailed recognition in an open world. In CVPR, 2019.