Pyramid Scene Parsing Network
Scene parsing is challenging for unrestricted open vocabulary and diverse scenes. In this paper, we exploit the capability of global context information through different-region-based context aggregation, using our pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). Our global prior representation is effective in producing good-quality results on the scene parsing task, while PSPNet provides a superior framework for pixel-level prediction tasks. The proposed approach achieves state-of-the-art performance on various datasets. It came first in the ImageNet Scene Parsing Challenge 2016, the PASCAL VOC 2012 benchmark and the Cityscapes benchmark. A single PSPNet yields a new record mIoU of 85.4% on PASCAL VOC 2012 and 80.2% on Cityscapes.
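The pyramid pooling module pools the backbone feature map at several grid sizes, compresses each pooled map with a 1x1 convolution, upsamples it back to the feature resolution and concatenates all branches with the original features. The NumPy sketch below illustrates only that data flow, under stated assumptions: bin sizes (1, 2, 3, 6) and a reduction to 1/N of the channels per branch as described in the PSPNet paper, random weights standing in for the learned 1x1 convolutions, no batch normalization, and nearest-neighbour instead of bilinear upsampling. All function names are hypothetical and are not MMSegmentation's API.

```python
import numpy as np


def adaptive_avg_pool2d(x, out_size):
    """Average-pool a (C, H, W) array into (C, out_size, out_size) bins.

    Bin edges use floor for the start and ceil for the end (the same rule as
    PyTorch's adaptive average pooling), so neighbouring bins may overlap by
    one pixel when H or W is not divisible by out_size.
    """
    c, h, w = x.shape
    out = np.empty((c, out_size, out_size), dtype=x.dtype)
    for i in range(out_size):
        h0, h1 = (i * h) // out_size, -(-((i + 1) * h) // out_size)
        for j in range(out_size):
            w0, w1 = (j * w) // out_size, -(-((j + 1) * w) // out_size)
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out


def upsample_nearest(x, h, w):
    """Nearest-neighbour resize of a (C, h_in, w_in) array to (C, h, w)."""
    _, h_in, w_in = x.shape
    rows = (np.arange(h) * h_in) // h
    cols = (np.arange(w) * w_in) // w
    return x[:, rows[:, None], cols[None, :]]


def pyramid_pooling(feat, bins=(1, 2, 3, 6), seed=0):
    """Concatenate `feat` with one pooled, reduced and upsampled branch per bin size.

    Random weights stand in for the learned 1x1 convolutions; batch norm is omitted.
    """
    c, h, w = feat.shape
    reduced = c // len(bins)  # each branch keeps 1/N of the input channels
    rng = np.random.default_rng(seed)
    branches = [feat]
    for b in bins:
        pooled = adaptive_avg_pool2d(feat, b)                           # (C, b, b)
        weight = rng.standard_normal((reduced, c)) / np.sqrt(c)         # 1x1 conv kernel
        proj = np.maximum(np.einsum("oc,cij->oij", weight, pooled), 0)  # conv + ReLU
        branches.append(upsample_nearest(proj, h, w))                   # back to (h, w)
    return np.concatenate(branches, axis=0)


if __name__ == "__main__":
    feat = np.random.default_rng(1).standard_normal((64, 60, 60))
    out = pyramid_pooling(feat)
    print(out.shape)  # 64 input channels + 4 branches x 16 channels
```

The 1x1 bin branch is a global average, so after upsampling it is constant over the whole map; that is the "global prior" the abstract refers to, while the 2x2, 3x3 and 6x6 branches add progressively more local sub-region context.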
[Figure: PSPNet-R50-D8 model structure]
Cityscapes

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ------- | -------- | -------------- | ------ | ---- | ------------- | ------ | -------- |
| PSPNet | R-50-D8 | 512x1024 | 40000 | 6.1 | 4.07 | V100 | 77.85 | 79.18 | config | model \| log |
| PSPNet | R-101-D8 | 512x1024 | 40000 | 9.6 | 2.68 | V100 | 78.34 | 79.74 | config | model \| log |
| PSPNet | R-50-D8 | 769x769 | 40000 | 6.9 | 1.76 | V100 | 78.26 | 79.88 | config | model \| log |
| PSPNet | R-101-D8 | 769x769 | 40000 | 10.9 | 1.15 | V100 | 79.08 | 80.28 | config | model \| log |
| PSPNet | R-18-D8 | 512x1024 | 80000 | 1.7 | 15.71 | V100 | 74.87 | 76.04 | config | model \| log |
| PSPNet | R-50-D8 | 512x1024 | 80000 | - | - | V100 | 78.55 | 79.79 | config | model \| log |
| PSPNet | R-50b-D8 rsb | 512x1024 | 80000 | 6.2 | 3.82 | V100 | 78.47 | 79.45 | config | model \| log |
| PSPNet | R-101-D8 | 512x1024 | 80000 | - | - | V100 | 79.76 | 81.01 | config | model \| log |
| PSPNet (FP16) | R-101-D8 | 512x1024 | 80000 | 5.34 | 8.77 | V100 | 79.46 | - | config | model \| log |
| PSPNet | R-18-D8 | 769x769 | 80000 | 1.9 | 6.20 | V100 | 75.90 | 77.86 | config | model \| log |
| PSPNet | R-50-D8 | 769x769 | 80000 | - | - | V100 | 79.59 | 80.69 | config | model \| log |
| PSPNet | R-101-D8 | 769x769 | 80000 | - | - | V100 | 79.77 | 81.06 | config | model \| log |
| PSPNet | R-18b-D8 | 512x1024 | 80000 | 1.5 | 16.28 | V100 | 74.23 | 75.79 | config | model \| log |
| PSPNet | R-50b-D8 | 512x1024 | 80000 | 6.0 | 4.30 | V100 | 78.22 | 79.46 | config | model \| log |
| PSPNet | R-101b-D8 | 512x1024 | 80000 | 9.5 | 2.76 | V100 | 79.69 | 80.79 | config | model \| log |
| PSPNet | R-18b-D8 | 769x769 | 80000 | 1.7 | 6.41 | V100 | 74.92 | 76.90 | config | model \| log |
| PSPNet | R-50b-D8 | 769x769 | 80000 | 6.8 | 1.88 | V100 | 78.50 | 79.96 | config | model \| log |
| PSPNet | R-101b-D8 | 769x769 | 80000 | 10.8 | 1.17 | V100 | 78.87 | 80.04 | config | model \| log |
| PSPNet | R-50-D32 | 512x1024 | 80000 | 3.0 | 15.21 | V100 | 73.88 | 76.85 | config | model \| log |
| PSPNet | R-50b-D32 rsb | 512x1024 | 80000 | 3.1 | 16.08 | V100 | 74.09 | 77.18 | config | model \| log |
| PSPNet | R-50b-D32 | 512x1024 | 80000 | 2.9 | 15.41 | V100 | 72.61 | 75.51 | config | model \| log |
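The mIoU columns report mean intersection over union across classes. A minimal sketch of that metric, assuming the usual definition IoU = TP / (TP + FP + FN) computed from a confusion matrix accumulated over the whole evaluation set, 255 as the ignored label, and classes absent from both prediction and ground truth left out of the mean. The function names are illustrative; this is not MMSegmentation's evaluation code.

```python
import numpy as np


def confusion_matrix(pred, label, num_classes, ignore_index=255):
    """Count (ground truth, prediction) pairs; rows are ground truth, columns predictions."""
    valid = label != ignore_index
    idx = num_classes * label[valid].astype(np.int64) + pred[valid].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Return (mIoU, per-class IoU); classes with an empty union are NaN and skipped."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp  # TP + FP + FN
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / union
    return float(np.nanmean(iou)), iou


if __name__ == "__main__":
    # Toy 2x3 image; the 255 pixel is ignored.
    label = np.array([[0, 0, 1], [1, 2, 255]])
    pred = np.array([[0, 1, 1], [1, 2, 0]])
    conf = confusion_matrix(pred, label, num_classes=3)
    miou, per_class = mean_iou(conf)
    print(per_class, miou)
```

For a full dataset, the per-image confusion matrices would be summed before calling `mean_iou`, so large and small images contribute in proportion to their pixel counts.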
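ADE20K

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ------- | -------- | -------------- | ------ | ---- | ------------- | ------ | -------- |
| PSPNet | R-50-D8 | 512x512 | 80000 | 8.5 | 23.53 | V100 | 41.13 | 41.94 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 12 | 15.30 | V100 | 43.57 | 44.35 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 160000 | - | - | V100 | 42.48 | 43.44 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 160000 | - | - | V100 | 44.39 | 45.35 | config | model \| log |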
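The mIoU(ms+flip) column refers to multi-scale plus horizontal-flip test-time augmentation: each image is evaluated at several scales, with and without flipping, and the per-pixel class scores are combined before taking the argmax. The sketch below assumes a `model` callable that maps a (C, H, W) array to (K, H, W) logits and averages softmax probabilities; the default scale list, the nearest-neighbour resizing and all names are illustrative assumptions, not the exact settings behind the reported numbers.

```python
import numpy as np


def resize_nearest(x, h, w):
    """Nearest-neighbour resize of a (C, H, W) array to (C, h, w)."""
    _, h_in, w_in = x.shape
    rows = (np.arange(h) * h_in) // h
    cols = (np.arange(w) * w_in) // w
    return x[:, rows[:, None], cols[None, :]]


def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def ms_flip_predict(model, image, scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75), flip=True):
    """Return an (H, W) label map from probabilities summed over scales and flips."""
    _, h, w = image.shape
    prob_sum = np.zeros(1)
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        scaled = resize_nearest(image, sh, sw)
        views = [(scaled, False)]
        if flip:
            views.append((scaled[:, :, ::-1], True))  # horizontal flip
        for view, flipped in views:
            logits = model(view)
            if flipped:
                logits = logits[:, :, ::-1]  # undo the flip before merging
            # Resize scores back to the original resolution, then accumulate.
            prob_sum = prob_sum + softmax(resize_nearest(logits, h, w))
    # Summing and averaging give the same argmax, so no division is needed.
    return prob_sum.argmax(axis=0)
```

Because every view is run through the full network, the cost grows with the number of scales times two, which is one reason the tables report single-scale speed only.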
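Pascal VOC 2012 + Aug

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ------- | -------- | -------------- | ------ | ---- | ------------- | ------ | -------- |
| PSPNet | R-50-D8 | 512x512 | 20000 | 6.1 | 23.59 | V100 | 76.78 | 77.61 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 20000 | 9.6 | 15.02 | V100 | 78.47 | 79.25 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 40000 | - | - | V100 | 77.29 | 78.48 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 40000 | - | - | V100 | 78.52 | 79.57 | config | model \| log |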
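Pascal Context

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ------- | -------- | -------------- | ------ | ---- | ------------- | ------ | -------- |
| PSPNet | R-101-D8 | 480x480 | 40000 | 8.8 | 9.68 | V100 | 46.60 | 47.78 | config | model \| log |
| PSPNet | R-101-D8 | 480x480 | 80000 | - | - | V100 | 46.03 | 47.15 | config | model \| log |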
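Pascal Context 59

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ------- | -------- | -------------- | ------ | ---- | ------------- | ------ | -------- |
| PSPNet | R-101-D8 | 480x480 | 40000 | - | - | V100 | 52.02 | 53.54 | config | model \| log |
| PSPNet | R-101-D8 | 480x480 | 80000 | - | - | V100 | 52.47 | 53.99 | config | model \| log |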
Dark Zurich and Nighttime Driving
We provide evaluation results on these two datasets using the models above, which were trained on the Cityscapes training set.
| Method | Backbone | Training Dataset | Test Dataset | mIoU | config | evaluation checkpoint |
| ------ | -------- | ---------------- | ------------ | ---- | ------ | --------------------- |
| PSPNet | R-50-D8 | Cityscapes Training set | Dark Zurich | 10.91 | config | model \| log |
| PSPNet | R-50-D8 | Cityscapes Training set | Nighttime Driving | 23.02 | config | model \| log |
| PSPNet | R-50-D8 | Cityscapes Training set | Cityscapes Validation set | 77.85 | config | model \| log |
| PSPNet | R-101-D8 | Cityscapes Training set | Dark Zurich | 10.16 | config | model \| log |
| PSPNet | R-101-D8 | Cityscapes Training set | Nighttime Driving | 20.25 | config | model \| log |
| PSPNet | R-101-D8 | Cityscapes Training set | Cityscapes Validation set | 78.34 | config | model \| log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Dark Zurich | 15.54 | config | model \| log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Nighttime Driving | 22.25 | config | model \| log |
| PSPNet | R-101b-D8 | Cityscapes Training set | Cityscapes Validation set | 79.69 | config | model \| log |
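To make the domain gap in the Dark Zurich and Nighttime Driving results explicit, the short snippet below subtracts each out-of-domain mIoU from the same checkpoint's Cityscapes validation mIoU. The values are copied from that table; the snippet only restates them as differences in mIoU points and does not involve any additional evaluation.

```python
# mIoU (%) copied from the Dark Zurich / Nighttime Driving table:
# (Cityscapes validation set, Dark Zurich, Nighttime Driving)
RESULTS = {
    "R-50-D8": (77.85, 10.91, 23.02),
    "R-101-D8": (78.34, 10.16, 20.25),
    "R-101b-D8": (79.69, 15.54, 22.25),
}


def domain_gap(results):
    """Return {backbone: (drop on Dark Zurich, drop on Nighttime Driving)} in mIoU points."""
    return {
        name: (round(city - dz, 2), round(city - night, 2))
        for name, (city, dz, night) in results.items()
    }


if __name__ == "__main__":
    for name, (dz_drop, night_drop) in domain_gap(RESULTS).items():
        print(f"{name}: -{dz_drop:.2f} on Dark Zurich, -{night_drop:.2f} on Nighttime Driving")
```

Every checkpoint loses more than 54 mIoU points on both night-time test sets, so the ranking among backbones on Cityscapes says little about their robustness to this shift.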
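COCO-Stuff 10k

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ------- | -------- | -------------- | ------ | ---- | ------------- | ------ | -------- |
| PSPNet | R-50-D8 | 512x512 | 20000 | 9.6 | 20.5 | V100 | 35.69 | 36.62 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 20000 | 13.2 | 11.1 | V100 | 37.26 | 38.52 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 40000 | - | - | V100 | 36.33 | 37.24 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 40000 | - | - | V100 | 37.76 | 38.86 | config | model \| log |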
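COCO-Stuff 164k

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ------- | -------- | -------------- | ------ | ---- | ------------- | ------ | -------- |
| PSPNet | R-50-D8 | 512x512 | 80000 | 9.6 | 20.5 | V100 | 38.80 | 39.19 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 13.2 | 11.1 | V100 | 40.34 | 40.79 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 160000 | - | - | V100 | 39.64 | 39.97 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 160000 | - | - | V100 | 41.28 | 41.66 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 320000 | - | - | V100 | 40.53 | 40.75 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 320000 | - | - | V100 | 41.95 | 42.42 | config | model \| log |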
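LoveDA

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ------- | -------- | -------------- | ------ | ---- | ------------- | ------ | -------- |
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.45 | 26.87 | V100 | 48.62 | 47.57 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 6.60 | V100 | 50.46 | 50.19 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 4.58 | V100 | 51.86 | 51.34 | config | model \| log |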
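Potsdam

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ------- | -------- | -------------- | ------ | ---- | ------------- | ------ | -------- |
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.50 | 85.12 | V100 | 77.09 | 78.30 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 30.21 | V100 | 78.12 | 78.98 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 19.40 | V100 | 78.62 | 79.47 | config | model \| log |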
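Vaihingen

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ------- | -------- | -------------- | ------ | ---- | ------------- | ------ | -------- |
| PSPNet | R-18-D8 | 512x512 | 80000 | 1.45 | 85.06 | V100 | 71.46 | 73.36 | config | model \| log |
| PSPNet | R-50-D8 | 512x512 | 80000 | 6.14 | 30.29 | V100 | 72.36 | 73.75 | config | model \| log |
| PSPNet | R-101-D8 | 512x512 | 80000 | 9.61 | 19.97 | V100 | 72.61 | 74.18 | config | model \| log |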
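iSAID

| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | Device | mIoU | mIoU(ms+flip) | config | download |
| ------ | -------- | --------- | ------- | -------- | -------------- | ------ | ---- | ------------- | ------ | -------- |
| PSPNet | R-18-D8 | 896x896 | 80000 | 4.52 | 26.91 | V100 | 60.22 | 61.25 | config | model \| log |
| PSPNet | R-50-D8 | 896x896 | 80000 | 16.58 | 8.88 | V100 | 65.36 | 66.48 | config | model \| log |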
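The Crop Size column gives the spatial size of the training patches sampled from each image (896x896 for iSAID, 512x512 or 512x1024 elsewhere). The sketch below shows one common way such patches are produced: pad the image and label if they are smaller than the crop, then cut both at the same random offset. Padding the label with 255 assumes that value is the ignore index; the function is hypothetical and does not reproduce any extra checks MMSegmentation's own crop transform may apply.

```python
import numpy as np


def random_crop(image, label, crop_size, ignore_index=255, rng=None):
    """Crop an (H, W, 3) image and its (H, W) label map at the same random window.

    Inputs smaller than the crop are padded first: the image with 0 and the
    label with `ignore_index`, so padded pixels do not count in the loss.
    """
    rng = np.random.default_rng() if rng is None else rng
    crop_h, crop_w = crop_size
    pad_h = max(crop_h - label.shape[0], 0)
    pad_w = max(crop_w - label.shape[1], 0)
    if pad_h or pad_w:
        image = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), constant_values=0)
        label = np.pad(label, ((0, pad_h), (0, pad_w)), constant_values=ignore_index)
    h, w = label.shape
    top = int(rng.integers(0, h - crop_h + 1))   # inclusive range of valid offsets
    left = int(rng.integers(0, w - crop_w + 1))
    window = (slice(top, top + crop_h), slice(left, left + crop_w))
    return image[window], label[window]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(800, 1000, 3), dtype=np.uint8)
    lab = rng.integers(0, 15, size=(800, 1000), dtype=np.uint8)
    crop_img, crop_lab = random_crop(img, lab, (896, 896), rng=rng)
    print(crop_img.shape, crop_lab.shape)
```

Using one window for both arrays keeps every pixel aligned with its label, which is the only hard requirement; larger crops give the pyramid pooling bins more context but raise memory, as the Mem (GB) column of the iSAID rows suggests.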
Note:

- FP16 means that mixed-precision (FP16) training is used.
- 896x896 is the crop size used for the iSAID dataset, following the implementation of PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation.
- rsb is short for "ResNet strikes back".
- The b in R-50b means ResNetV1b, which is a standard ResNet backbone. In MMSegmentation, the default backbone is ResNetV1c, which usually performs better on semantic segmentation tasks.
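The backbone names in the tables also end in D8 or D32. Following the usual MMSegmentation naming convention (an assumption here, since the notes above do not define the suffix), it denotes the output stride: D8 backbones replace the strides of the last stages with dilated convolutions so the features stay at 1/8 of the input resolution, while D32 keeps the plain ResNet stride of 32. The arithmetic below, with a hypothetical helper name, shows the size of the feature map the decode head processes for each crop size in the tables.

```python
def feature_map_size(crop_size, output_stride):
    """Height and width of the last backbone feature map for a given crop.

    Assumes every stride-2 layer maps n -> ceil(n / 2), which holds for the
    3x3 / padding-1 convolutions and pooling of a ResNet-style network;
    repeated ceil-halving equals one ceil division by the total stride.
    """
    h, w = crop_size
    return -(-h // output_stride), -(-w // output_stride)


if __name__ == "__main__":
    for crop in [(512, 1024), (769, 769), (512, 512), (480, 480), (896, 896)]:
        d8 = feature_map_size(crop, 8)
        d32 = feature_map_size(crop, 32)
        ratio = (d8[0] * d8[1]) / (d32[0] * d32[1])
        print(f"{crop}: D8 -> {d8}, D32 -> {d32}, {ratio:.1f}x more positions at D8")
```

At 512x1024 a D8 backbone yields a 64x128 map versus 16x32 for D32, 16 times as many positions. That is consistent with, though not the sole cause of, the R-50-D32 rows needing about half the memory and running several times faster than R-50-D8, at a cost of several mIoU points.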
@inproceedings{zhao2017pspnet,
  title={Pyramid Scene Parsing Network},
  author={Zhao, Hengshuang and Shi, Jianping and Qi, Xiaojuan and Wang, Xiaogang and Jia, Jiaya},
  booktitle={CVPR},
  year={2017}
}

@article{wightman2021resnet,
  title={Resnet strikes back: An improved training procedure in timm},
  author={Wightman, Ross and Touvron, Hugo and J{\'e}gou, Herv{\'e}},
  journal={arXiv preprint arXiv:2110.00476},
  year={2021}
}