An image semantic segmentation toolbox (single GPU) containing several common semantic segmentation algorithms. The code is implemented in PyTorch.
- pytorch >= 1.0.0
- python 3.x
python train.py --option1 value1 --option2 value2 ...
For the full list of options, see train.py.
Algorithm | Backbone | Norm | Dataset | Batch size | Image size | Epochs | pixAcc (%) | mIoU (%) |
---|---|---|---|---|---|---|---|---|
PSPNet [1] | resnet50 | bn | ade20k | 16 | 473 | 120 | 80.04 | 41.68 |
PSPNet | resnet50 | bn | ade20k | 12 | 384 | 30 | 77.1 | 38.6 |
PSPNet | resnet50 | bn | ade20k+bk | 12 | 384 | 30 | 72.19 | 35.3 |
EncNet [2] | resnet50 | bn | ade20k | 16 | 480 | 120 | 79.73 | 41.11 |
EncNet | resnet50 | bn | ade20k | 8 | 400 | 50 | 77.7 | 40.3 |
DeeplabV3 [3] | xception | bn | ade20k | 8 | 384 | 50 | 77.6 | 39.5 |
DeeplabV3+ [4] | xception | bn | ade20k | 8 | 384 | 50 | 77.9 | 39.8 |
FCN32s [5] | vgg19bn | bn | ade20k | 12 | 384 | 50 | 73.0 | 31.2 |
The items with hyperlinks are the experimental results reported in the original papers.
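The pixAcc and mIoU columns above can be read as follows. This is a minimal sketch in plain Python, assuming the usual confusion-matrix definitions of the two metrics; the function names are hypothetical and are not taken from this repository, whose evaluation code may accumulate the statistics differently (for example per batch, or with an ignored label).

```python
def confusion_matrix(pred, target, num_classes, ignore_index=None):
    """Count (target, pred) label pairs over flat sequences of class ids."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip pixels that should not be evaluated
        matrix[t][p] += 1
    return matrix


def pixel_accuracy(matrix):
    """Fraction of counted pixels whose prediction matches the label."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def mean_iou(matrix):
    """Mean of per-class IoU = TP / (TP + FP + FN).

    Classes that appear in neither the predictions nor the labels are skipped.
    """
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp
        fp = sum(matrix[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        if union:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and six pixels.
pred = [0, 1, 1, 2, 2, 2]
target = [0, 1, 2, 2, 2, 1]
cm = confusion_matrix(pred, target, num_classes=3)
print(pixel_accuracy(cm), mean_iou(cm))
```

The table reports both values as percentages, so the fractions returned here would be multiplied by 100. For real images, the same counts are normally computed with vectorized NumPy or PyTorch operations rather than a Python loop.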
In the original paper, the authors ran their experiments on the standard ADE20k (150 classes, without background). I instead treat the background (i.e. pixels labeled 0 in the original mask) as a category of its own, so the output dimensionality of PSPNet is 151 in my code. The performance gap therefore mainly comes from three aspects:
- I add the background class to the dataset, which may lead to category imbalance problems and increase the complexity of the model.
- Due to limited video memory on a single GPU, I set the batch_size to 12/8 and image_size to 384/400 instead of the parameter settings in the original paper.
- In addition, the experiments in the original paper used multiple GPUs, which allows a larger batch_size and makes Synchronized Batch Normalization layers more effective.
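The difference between the 150-class and 151-class setups comes down to how label 0 in the mask is handled. The sketch below illustrates one common convention and is not this repository's actual data loader: the helper names and the `IGNORE_INDEX` value are hypothetical, and masks are shown as nested lists for brevity.

```python
IGNORE_INDEX = -1  # hypothetical marker for pixels excluded from loss and metrics


def labels_without_background(mask):
    """Standard 150-class setup: label 0 is ignored, labels 1..150 become 0..149."""
    return [[v - 1 if v > 0 else IGNORE_INDEX for v in row] for row in mask]


def labels_with_background(mask):
    """151-class setup described above: label 0 is kept as a class of its own."""
    return [list(row) for row in mask]


def num_output_channels(include_background):
    """Size of the model's final classification layer for each setup."""
    return 151 if include_background else 150


mask = [[0, 1], [150, 3]]
print(labels_without_background(mask))  # background pixels are marked as ignored
print(labels_with_background(mask))     # background pixels stay as class 0
```

In the first setup, background pixels contribute nothing to the loss or to mIoU; in the second, they form an extra and often very frequent class, which is where the imbalance mentioned above comes from.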
- PSPNet
- EncNet
- EncNet+JPU
- DeeplabV3
- DeeplabV3+
- RefineNet
- FPN
- LinkNet
- SegNet
- FCN
- Unet
- Unet++
- DenseASPP
- ICNet
- BiSeNet
- PSANet
- DANet
- OCNet
- CCNet
- ENet
- DUNet
[3] Chen, Liang-Chieh, et al. "Rethinking Atrous Convolution for Semantic Image Segmentation." (2017).