We propose a new method for LUSS, namely PASS, which consists of four steps (a minimal sketch of the pipeline is given below):
1) A randomly initialized model is trained with self-supervised pretext tasks (i.e., our proposed Non-contrastive pixel-to-pixel representation alignment and Deep-to-shallow supervision) to learn shape and category representations. After representation learning, we extract the feature set for all training images.
2) A pixel-attention-based clustering scheme is applied to obtain pseudo categories, and the generated categories are assigned to every image pixel.
3) The pre-trained model is fine-tuned with the generated pseudo labels to improve segmentation quality.
4) During inference, the LUSS model assigns the generated labels to each pixel of an image, just as a supervised segmentation model would.
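The sketch below walks through the four stages on toy tensors, purely for illustration. All components here (`ToyEncoder`, the plain k-means clustering, the 1x1-conv segmentation head, the loop sizes) are simplified, hypothetical stand-ins for the actual PASS modules, not the real implementation.

```python
# Illustrative sketch of the four PASS stages on toy tensors. ToyEncoder,
# the plain k-means clustering, and the 1x1-conv head are hypothetical
# stand-ins, not the actual PASS implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for the self-supervised backbone (e.g. ResNet-50 in PASS)."""
    def __init__(self, dim: int = 32):
        super().__init__()
        self.conv = nn.Conv2d(3, dim, 3, padding=1)

    def forward(self, x):
        # Per-pixel embeddings, L2-normalized for cosine-based clustering.
        return F.normalize(self.conv(x), dim=1)

# 1) Pretext-task training (pixel-to-pixel alignment, deep-to-shallow
#    supervision) would happen here; we skip it and use a random encoder.
encoder = ToyEncoder()
images = torch.rand(4, 3, 64, 64)
with torch.no_grad():
    feats = encoder(images)                              # (B, C, H, W)

# 2) Cluster pixel embeddings into K pseudo categories. The real method
#    uses a pixel-attention-based scheme; here a few k-means steps suffice.
B, C, H, W = feats.shape
pixels = feats.permute(0, 2, 3, 1).reshape(-1, C)        # (B*H*W, C)
K = 8
centroids = pixels[torch.randperm(pixels.shape[0])[:K]].clone()
for _ in range(10):
    assign = (pixels @ centroids.t()).argmax(dim=1)      # cosine similarity
    for k in range(K):
        members = pixels[assign == k]
        if len(members) > 0:
            centroids[k] = F.normalize(members.mean(dim=0), dim=0)
pseudo_labels = assign.reshape(B, H, W)                  # per-pixel pseudo label

# 3) Fine-tune a segmentation head on the generated pseudo labels.
head = nn.Conv2d(C, K, kernel_size=1)
opt = torch.optim.SGD(head.parameters(), lr=0.1)
for _ in range(5):
    logits = head(encoder(images))                       # (B, K, H, W)
    loss = F.cross_entropy(logits, pseudo_labels)
    opt.zero_grad()
    loss.backward()
    opt.step()

# 4) Inference: assign a generated label to every pixel, just like a
#    supervised segmentation model.
pred = head(encoder(images)).argmax(dim=1)               # (B, H, W)
print(pred.shape)
```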
More details about the LUSS task and the ImageNet-S dataset are available on the project page and in the paper.
We give the training and inference details in USAGE.
Fully Unsupervised Evaluation Protocol
Dataset | Arch | Val (mIoU) | Test (mIoU) | Args | Pretrained | Pixelatt | Centroid | Finetuned |
---|---|---|---|---|---|---|---|---|
ImageNet-S | ResNet50 | 11.5 | 11.0 | bash | model | model | centroid | model |
ImageNet-S 300 | ResNet50 | 18.0 | 18.1 | bash | model | model | centroid | model |
ImageNet-S 50 | ResNet18 | 29.2 | 29.3 | bash | model | model | centroid | model |
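Under the fully unsupervised protocol, the generated categories carry no names, so they have to be matched to the ground-truth categories before mIoU can be computed. The snippet below is a minimal sketch of one common way to do this (Hungarian assignment on a class-wise IoU matrix); the exact matching implemented by the official evaluation scripts linked above may differ.

```python
# Hypothetical sketch: match generated categories to ground-truth categories
# by maximizing total IoU with the Hungarian algorithm, then report the mean
# IoU of the matched pairs. Not necessarily identical to the official script.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_and_miou(pred: np.ndarray, gt: np.ndarray, num_classes: int):
    """pred and gt are integer label maps of the same shape."""
    iou = np.zeros((num_classes, num_classes))
    for p in range(num_classes):
        for g in range(num_classes):
            inter = np.logical_and(pred == p, gt == g).sum()
            union = np.logical_or(pred == p, gt == g).sum()
            iou[p, g] = inter / union if union > 0 else 0.0
    rows, cols = linear_sum_assignment(-iou)             # maximize total IoU
    mapping = dict(zip(rows.tolist(), cols.tolist()))    # generated id -> gt id
    return mapping, iou[rows, cols].mean()

# Toy usage with random label maps.
pred = np.random.randint(0, 4, size=(2, 8, 8))
gt = np.random.randint(0, 4, size=(2, 8, 8))
mapping, miou = match_and_miou(pred, gt, num_classes=4)
print(mapping, round(float(miou), 3))
```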
Distance Matching Evaluation Protocol
Dataset | Arch | Val (mIoU) | Test (mIoU) | Pretrained |
---|---|---|---|---|
ImageNet-S | ResNet50 | 15.6 | 15.6 | model |
ImageNet-S 300 | ResNet50 | 25.1 | 25.2 | model |
ImageNet-S 50 | ResNet18 | 39.6 | 40.4 | model |
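The distance matching protocol instead matches generated categories to ground-truth classes through feature distances. Below is a rough, hypothetical sketch assuming each generated category is summarized by a centroid embedding and assigned the ground-truth class whose reference embedding is nearest in cosine distance; consult the paper for the exact protocol.

```python
# Hypothetical sketch of distance-based matching: each generated category is
# represented by a centroid embedding and mapped to the ground-truth class
# with the nearest reference embedding (cosine similarity). The official
# protocol may differ, e.g. in how reference embeddings are constructed.
import torch
import torch.nn.functional as F

def distance_match(pseudo_centroids: torch.Tensor,
                   reference_embeddings: torch.Tensor) -> torch.Tensor:
    """
    pseudo_centroids:     (K, D) centroids of the K generated categories.
    reference_embeddings: (C, D) one embedding per ground-truth category,
                          e.g. averaged features of a few labeled pixels.
    Returns a (K,) tensor mapping each generated category to a gt class id.
    """
    sim = F.normalize(pseudo_centroids, dim=1) @ \
          F.normalize(reference_embeddings, dim=1).t()
    return sim.argmax(dim=1)

# Toy usage with random embeddings.
print(distance_match(torch.randn(8, 32), torch.randn(5, 32)))
```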
Semi-supervised Evaluation Protocol
We provide a new codebase for semi-supervised evaluation in ImageNetSegModel.
```
@article{gao2022luss,
  title={Large-scale Unsupervised Semantic Segmentation},
  author={Gao, Shanghua and Li, Zhong-Yu and Yang, Ming-Hsuan and Cheng, Ming-Ming and Han, Junwei and Torr, Philip},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year={2022}
}
```
This codebase is built on top of the SwAV codebase.
If you have any questions, please open an issue or email us at [email protected]