Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization

    Luke Melas-Kyriazi
    Oxford University

Abstract

Unsupervised localization and segmentation are long-standing computer vision challenges that involve decomposing an image into semantically-meaningful segments without any labeled data. These tasks are particularly interesting in an unsupervised setting due to the difficulty and cost of obtaining dense image annotations, but existing unsupervised approaches struggle with complex scenes containing multiple objects. Differently from existing methods, which are purely based on deep learning, we take inspiration from traditional spectral segmentation methods by reframing image decomposition as a graph partitioning problem. Specifically, we examine the eigenvectors of the Laplacian of a feature affinity matrix from self-supervised networks. We find that these eigenvectors already decompose an image into meaningful segments, and can be readily used to localize objects in a scene. Furthermore, by clustering the features associated with these segments across a dataset, we can obtain well-delineated, nameable regions, i.e. semantic segmentations. Experiments on complex datasets (Pascal VOC, MS-COCO) demonstrate that our simple spectral method outperforms the state-of-the-art in unsupervised localization and segmentation by a significant margin. Furthermore, our method can be readily used for a variety of complex image editing tasks, such as background removal and compositing.

Examples

We present a simple approach based on spectral methods that decomposes an image using the eigenvectors of a Laplacian matrix constructed from a combination of color information and unsupervised deep features. Above, we show examples of these eigenvectors along with the results of our method on unsupervised object localization and semantic segmentation.

Method

Our method first utilizes a self-supervised network to extract dense features corresponding to image patches. We then construct a weighted graph over patches, where edge weights give the semantic affinity of pairs of patches, and we consider the eigendecomposition of this graph's Laplacian matrix. We find that without imposing any additional structure, the eigenvectors of the Laplacian of this graph directly correspond to semantically meaningful image regions.

Citation

@inproceedings{
    melaskyriazi2022deep,
    title={Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization}
    author={Luke Melas-Kyriazi and Christian Rupprecht and Iro Laina and Andrea Vedaldi}
    year={2022}
    booktitle={CVPR}
}

Acknowledgements

L. M. K. acknowledges the generous support of the Rhodes Trust. C. R. is supported by Innovate UK (project 71653) on behalf of UK Research and Innovation (UKRI) and by the European Research Council (ERC) IDIU-638009. I. L. and A. V. are supported by the VisualAI EPSRC programme grant (EP/T028572/1).