- Link to project page
- Link to my homepage
- This is the repository for my paper ShelfNet for Real-time Semantic Segmentation. ShelfNet achieves both faster inference speed and higher segmentation accuracy than other real-time models such as Lightweight-RefineNet.
- This branch performs experiments on the Cityscapes dataset; see branch `master` for experiments on the PASCAL VOC dataset.
- This implementation is based on torch-encoding. The main difference is the structure of the model.
Results
- We tested ShelfNet with ResNet50 and ResNet101 backbones: they achieve 59 FPS and 42 FPS respectively on a GTX 1080Ti GPU with a 512x512 input image.
- On PASCAL VOC 2012 test set, it achieved 84.2% mIoU with ResNet101 backbone and 82.8% mIoU with ResNet50 backbone.
- It achieved 75.8% mIoU with ResNet50 backbone on Cityscapes dataset.
Differences from results reported in the paper
- The result of ShelfNet in this implementation differs slightly from the one reported in the paper (75.4% mIoU in this implementation vs. 75.8% in the paper).
- The paper trains for 500 epochs, while this implementation trains for 240 epochs.
- The paper does not use synchronized batch normalization, while this implementation uses synchronized batch normalization across multiple GPUs.
- For training on coarsely labelled data, this implementation sets the learning rate to 0.01 and keeps it constant; for the results in the paper, training on coarsely labelled data uses a poly decay schedule with the total number of epochs set to 500, but I stopped training manually at epoch 35 (so the learning rate decays very slightly instead of staying constant).
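For reference, a poly decay schedule of the kind mentioned above is commonly implemented as below. This is a minimal sketch, not code from this repository; the power of 0.9 is the usual choice in segmentation work and is an assumption here.

```python
def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    """Poly learning-rate decay: lr shrinks from base_lr to 0 over max_iter."""
    return base_lr * (1.0 - cur_iter / max_iter) ** power
```

At epoch 35 of a 500-epoch schedule, the decay factor is (1 - 35/500)^0.9 ≈ 0.94, i.e. the learning rate has decayed only slightly, consistent with the note above.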
Please cite our paper
@article{zhuang2018shelfnet,
title={ShelfNet for Real-time Semantic Segmentation},
author={Zhuang, Juntang and Yang, Junlin},
journal={arXiv preprint arXiv:1811.11254},
year={2018}
}
- Please refer to torch-encoding for the implementation of the synchronized batch-normalization layer.
Requirements
- PyTorch 0.4.1
- Python 3.6
- requests
- nose
- scipy
- tqdm
- Other dependencies required by torch-encoding.
Environment setup
- Run `python setup.py install` to install torch-encoding.
- Make sure you use the same dataset path in `/scripts/prepare_xx.py` and `/encoding/datasets/xxx.py`. The default path is `~/.encoding/data`, which is a hidden folder; press `Ctrl + h` to show it in Files.
PASCAL dataset preparation
- Run `cd scripts`.
- Run `python prepare_xx.py` to prepare the datasets, including MS COCO, PASCAL VOC, PASCAL Aug and PASCAL Context.
- Download the test dataset from the official PASCAL evaluation server, then extract it and merge it with the training data folder, e.g. `~/.encoding/data/VOCdevkit`.
Cityscapes dataset preparation
- The data preparation code is modified from fyu's implementation.
- The scripts are in the folder `scripts/prepare_citys`.
- Step 1: download the Cityscapes and Cityscapes Coarse datasets from the Cityscapes official website. You need to download `gtFine_trainvaltest.zip`, `gtCoarse.zip`, `leftImg8bit_trainvaltest.zip` and `leftImg8bit_trainextra.zip`, and unzip them into one folder.
- Step 2: prepare the fine labelled dataset:
  - Convert the original segmentation ids into 19 training ids:
python3 scripts/prepare_citys/prepare_data.py <cityscape folder>/gtFine/
  - Run `sh create_lists_citys.sh` in the Cityscapes data folder, and move `info.json` into the data folder.
- Step 3: prepare the coarse labelled dataset:
  - Convert the original segmentation ids into 19 training ids:
python3 scripts/prepare_citys/prepare_data.py <cityscape folder>/gtCoarse/
  - Run `sh create_lists_citys_coarse.sh` in the Cityscapes data folder, and move `info.json` into the data folder.
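The id conversion above maps the raw Cityscapes label ids to 19 consecutive train ids and sends everything else to an ignore value. A minimal sketch of that mapping follows; the id list is the standard Cityscapes 19-class setup, and the exact behaviour of `prepare_data.py` (including the ignore value of 255) is an assumption, not taken from this repository.

```python
# The 19 Cityscapes classes used for training (road, sidewalk, building, ...),
# listed by their raw label ids; every other id is ignored during training.
VALID_IDS = [7, 8, 11, 12, 13, 17, 19, 20, 21, 22, 23, 24,
             25, 26, 27, 28, 31, 32, 33]
ID_TO_TRAINID = {label_id: train_id for train_id, label_id in enumerate(VALID_IDS)}

def to_train_id(label_id, ignore=255):
    """Map a raw segmentation id to one of the 19 train ids, or `ignore`."""
    return ID_TO_TRAINID.get(label_id, ignore)
```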
Configurations (refer to /experiments/segmentation/option.py)
- --diflr: default value is True. If set to True, the head uses a 10x larger learning rate than the backbone; otherwise the head and backbone use the same learning rate.
- --model: which model to use. The default is `shelfnet`; other options include `pspnet`, `encnet` and `fcn`.
- --backbone: backbone of the model, `resnet50` or `resnet101`.
- --dataset: which dataset to train on: `coco` for MS COCO, `pascal_aug` for augmented PASCAL, `pascal_voc` for PASCAL VOC, `pcontext` for PASCAL Context.
- --aux: if `--aux` is passed, the model uses an auxiliary layer, which is an FCN head based on the final block of the backbone.
- --se_loss: adds a context module based on the final block of the backbone; its output has shape 1xm, where m is the number of categories. It penalizes the prediction of whether each category is present or not.
- --resume: default is None. It specifies the checkpoint to load.
- --ft: fine-tune flag. If set as True, the code resumes from the checkpoint but discards the optimizer state.
- --checkname: folder name in which to store trained weights.
- Other parameters are straightforward; please refer to /experiments/segmentation/option.py for more details.
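The effect of `--diflr` can be sketched as two optimizer parameter groups. This is a hypothetical helper, not code from this repository; `backbone_params` and `head_params` are placeholder names.

```python
def make_param_groups(base_lr, backbone_params, head_params, diflr=True):
    """Build optimizer parameter groups; with diflr the head gets 10x the lr."""
    head_lr = base_lr * 10 if diflr else base_lr
    return [
        {"params": backbone_params, "lr": base_lr},
        {"params": head_params, "lr": head_lr},
    ]
```

The returned list is the standard format accepted by PyTorch optimizers, e.g. `torch.optim.SGD(make_param_groups(...), momentum=0.9)`.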
Training scripts on PASCAL VOC
- Run `cd /experiments/segmentation`.
- Pre-train ShelfNet50 on COCO:
python train.py --backbone resnet50 --dataset coco --aux --se-loss --checkname ShelfNet50_aux
- Fine-tune ShelfNet50 on PASCAL Aug (you may need to double-check the path for --resume):
python train.py --backbone resnet50 --dataset pascal_aug --aux --se-loss --checkname ShelfNet50_aux --resume ./runs/coco/shelfnet/ShelfNet50_aux_se/model_best.pth.tar --ft
- Fine-tune ShelfNet50 on PASCAL VOC (you may need to double-check the path for --resume):
python train.py --backbone resnet50 --dataset pascal_voc --aux --se-loss --checkname ShelfNet50_aux --resume ./runs/pascal_aug/shelfnet/ShelfNet50_aux_se/model_best.pth.tar --ft
Training scripts on Cityscapes
- Run `cd /experiments/segmentation`.
- Pre-train ShelfNet50 on the coarsely labelled dataset:
python train.py --diflr False --backbone resnet50 --dataset citys_coarse --checkname ShelfNet50_citys_coarse --lr-schedule step
- Fine-tune ShelfNet50 on the fine labelled dataset (you may need to double-check the path for --resume):
python train.py --diflr False --backbone resnet50 --dataset citys --checkname citys_coarse --resume ./runs/citys_coarse/shelfnet/ShelfNet50_citys_coarse/model_best.pth.tar --ft
Test scripts on PASCAL VOC
- To test on PASCAL VOC with multi-scale inputs [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]:
python test.py --backbone resnet50 --dataset pascal_voc --resume ./runs/pascal_voc/shelfnet/ShelfNet50_aux_se/model_best.pth.tar
- To test on PASCAL VOC with single-scale input:
python test_single_scale.py --backbone resnet50 --dataset pascal_voc --resume ./runs/pascal_voc/shelfnet/ShelfNet50_aux_se/model_best.pth.tar
- Similar experiments can be performed with a ResNet101 backbone, and experiments on Cityscapes can be performed by changing the dataset flag to `--dataset citys`.
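Multi-scale testing as above typically averages the score maps produced at each input scale. A schematic sketch in pure Python follows; `predict` is a placeholder for the resize-forward-resize-back step, and the real `test.py` operates on tensors rather than flat lists.

```python
def multiscale_average(predict, image, scales=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75)):
    """Average per-pixel class scores computed at several input scales."""
    # predict(image, scale) is assumed to return a flat list of class scores
    # already resized back to the original resolution.
    scores = [predict(image, s) for s in scales]
    return [sum(vals) / len(scores) for vals in zip(*scores)]
```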
Evaluation scripts
- You can use the following script to generate ground-truth/prediction pairs on the PASCAL VOC validation set.
python evaluate_and_save.py --backbone resnet50 --dataset pascal_voc --resume ./runs/pascal_voc/shelfnet/ShelfNet50_aux_se/model_best.pth.tar --eval
Measure running speed
- Measure the running speed of ShelfNet on a 512x512 image:
python test_speed.py --model shelfnet --backbone resnet101
python test_speed.py --model pspnet --backbone resnet101
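The timing pattern behind such a speed test can be sketched generically as below. This is a hypothetical helper, not the code of `test_speed.py`; on a GPU one would also call `torch.cuda.synchronize()` before reading the clock so that queued kernels are included in the measurement.

```python
import time

def measure_fps(forward, n_warmup=10, n_iters=100):
    """Time repeated calls to `forward` and return throughput in frames/sec."""
    for _ in range(n_warmup):  # warm-up: exclude one-time setup costs
        forward()
    start = time.perf_counter()
    for _ in range(n_iters):
        forward()
    return n_iters / (time.perf_counter() - start)
```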
Pre-trained weights