
Tree-structured Kronecker Convolutional Network for Semantic Segmentation

Introduction

Most existing semantic segmentation methods employ atrous convolution to enlarge the receptive field of filters, but neglect important local contextual information. To tackle this issue, we first propose a novel Kronecker convolution, which adopts the Kronecker product to expand its kernel so as to take into account the feature vectors neglected by atrous convolutions. It can therefore capture local contextual information and enlarge the field of view of filters simultaneously, without introducing extra parameters. Second, we propose a Tree-structured Feature Aggregation (TFA) module, which follows a recursive rule to expand and form a hierarchical structure. It can thus naturally learn representations of multi-scale objects and encode hierarchical contextual information in complex scenes. Finally, we design the Tree-structured Kronecker Convolutional Network (TKCN), which employs the Kronecker convolution and the TFA module. Extensive experiments on three datasets, PASCAL VOC 2012, PASCAL-Context and Cityscapes, verify the effectiveness of the proposed approach.
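
To make the kernel-expansion idea concrete, below is a minimal PyTorch sketch of a Kronecker convolution. It is not the code from this repository: the layer name KroneckerConv2d, the factors r1 and r2, the placement of the ones-block in the binary pattern, and the padding choice are assumptions based on the description above (and torch.kron requires a more recent PyTorch than the version listed under Installation). Only the core idea is taken from the text: a small learned kernel is expanded with a fixed binary pattern via the Kronecker product, so the field of view grows without adding parameters. Intuitively, r2 = 1 reduces to an ordinary dilated kernel, while larger r2 also samples a local patch at each kernel position.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class KroneckerConv2d(nn.Module):
        """Sketch of a Kronecker convolution: a small learned kernel W is
        expanded as W (x) F, where F is a fixed r1 x r1 binary pattern whose
        top-left r2 x r2 block is ones. The expanded kernel covers a larger
        field of view but adds no parameters (r1, r2 are assumed factors)."""

        def __init__(self, in_ch, out_ch, kernel_size=3, r1=4, r2=2):
            super().__init__()
            self.weight = nn.Parameter(
                torch.randn(out_ch, in_ch, kernel_size, kernel_size) * 0.01)
            pattern = torch.zeros(r1, r1)
            pattern[:r2, :r2] = 1.0  # sample a local r2 x r2 patch per kernel position
            self.register_buffer("pattern", pattern)

        def forward(self, x):
            # torch.kron broadcasts the 2-D pattern over the (out, in) dims,
            # giving an (out_ch, in_ch, k*r1, k*r1) kernel with no new parameters.
            expanded = torch.kron(self.weight, self.pattern)
            pad = expanded.shape[-1] // 2  # rough "same" padding for this sketch
            return F.conv2d(x, expanded, padding=pad)

    if __name__ == "__main__":
        x = torch.randn(1, 16, 64, 64)
        print(KroneckerConv2d(16, 32)(x).shape)  # field of view grows from 3 to 12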

Approach



Performance

For PASCAL VOC 2012, we evaluate the proposed TKCN model on the test set without using external data such as the COCO dataset.

For Cityscapes, TKCN is trained only on the fine-labeled set.

All results are reported as mean IoU (%).

| Method | Conference | Backbone | PASCAL VOC 2012 (test set) | Cityscapes (test set) | PASCAL-Context (val set) |
| --- | --- | --- | --- | --- | --- |
| DeepLabv2 | - | ResNet-101 | 79.7 | 70.4 | 45.7 |
| RefineNet | CVPR2017 | ResNet-101 | 82.4 | 73.6 | 47.1 |
| SAC | ICCV2017 | ResNet-101 | - | 78.1 | - |
| PSPNet | CVPR2017 | ResNet-101 | 82.6 | 78.4 | 47.8 |
| DUC-HDC | WACV2018 | ResNet-101 | - | 77.6 | - |
| AAF | ECCV2018 | ResNet-101 | 82.2 | 79.1 | - |
| BiSeNet | ECCV2018 | ResNet-101 | - | 78.9 | - |
| PSANet | ECCV2018 | ResNet-101 | - | 80.1 | - |
| DeepLabv3+ | ECCV2018 | Xception | 89.0 | - | - |
| DFN | CVPR2018 | ResNet-101 | 82.7 | 79.3 | - |
| DSSPN | CVPR2018 | ResNet-101 | - | 77.8 | - |
| CCL | CVPR2018 | ResNet-101 | - | - | 51.6 |
| EncNet | CVPR2018 | ResNet-101 | 82.9 | - | 51.7 |
| DenseASPP | CVPR2018 | DenseNet | - | 80.6 | - |
| TKCN | - | ResNet-101 | 83.2 | 79.5 | 51.8 |

Note: DeepLabv3+ employs a more powerful backbone (Xception) and is pretrained on MS-COCO and JFT; DenseASPP also employs a more powerful backbone (DenseNet). "-" indicates that the approach does not report the corresponding result.

Installation

  1. Install PyTorch
  • The code is developed with Python 3.6.6 on Ubuntu 16.04 (GPU: Tesla K80; PyTorch: 0.5.0a0+a24163a; CUDA: 8.0).
  2. Clone the repository
    git clone https://github.com/wutianyiRosun/TKCN.git
    cd TKCN
    python setup.py install
  3. Pretrained model: the ImageNet-pretrained ResNet-101 model (ImageNet_ResNet-101) is available here. Put it under the folder "./TKCN/tkcn/pretrained_models".
  4. Dataset configuration: arrange the Cityscapes data as follows (a small sanity-check sketch follows the listing):
├── cityscapes_test_list.txt
├── cityscapes_train_list.txt
├── cityscapes_trainval_list.txt
├── cityscapes_val_list.txt
├── cityscapes_val.txt
├── gtCoarse
│   ├── train
│   ├── train_extra
│   └── val
├── gtFine
│   ├── test
│   ├── train
│   └── val
├── leftImg8bit
│   ├── test
│   ├── train
│   └── val
├── license.txt

  • These .txt files can be downloaded from here
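
Before training, it can help to verify that the list files and the directory layout above line up. The following is a small, hypothetical sanity check: it assumes each line of a list file holds whitespace-separated paths (an image and, for train/val lists, its label) relative to the dataset root, which may differ from the repository's actual list format.

    import os

    def check_list(root, list_file):
        """Report entries in a list file whose files are missing on disk.
        Assumes whitespace-separated paths relative to the dataset root."""
        missing = []
        with open(os.path.join(root, list_file)) as f:
            for line in f:
                for rel_path in line.split():
                    if not os.path.isfile(os.path.join(root, rel_path)):
                        missing.append(rel_path)
        print("%s: %d missing file(s)" % (list_file, len(missing)))
        return missing

    if __name__ == "__main__":
        root = "./cityscapes"  # adjust to your dataset root
        for name in ("cityscapes_train_list.txt", "cityscapes_val_list.txt"):
            check_list(root, name)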

Train your own model

For Cityscapes

  1. Training on the train+val set
    cd tkcn
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python train.py --model tkcnet --backbone resnet101
  2. Single-scale testing (on the test set)
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python eval.py --model tkcnet --backbone resnet101 --resume-dir cityscapes/model/tkcnet_model_resnet101_cityscapes_gpu6bs6epochs240/TKCNet101 --resume-file checkpoint_240.pth.tar
  3. Multi-scale testing (on the test set)
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python eval.py --model tkcnet --backbone resnet101 --multi-scales --resume-dir cityscapes/model/tkcnet_model_resnet101_cityscapes_gpu6bs6epochs240/TKCNet101 --resume-file checkpoint_240.pth.tar
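
For reference, multi-scale testing (the --multi-scales flag above) generally means running the network at several input resolutions and averaging the class probabilities. The sketch below illustrates that idea only; the scale set, the interpolation settings, and the absence of horizontal flipping are assumptions, not the repository's eval.py implementation.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def multi_scale_predict(model, image, scales=(0.75, 1.0, 1.25)):
        """Generic multi-scale inference sketch (not the repository's code):
        forward the image at several scales and average the upsampled
        per-class probabilities before taking the argmax."""
        _, _, h, w = image.shape
        prob_sum = 0.0
        for s in scales:
            scaled = F.interpolate(image, scale_factor=s, mode="bilinear",
                                   align_corners=False)
            logits = model(scaled)                     # (N, num_classes, h', w')
            logits = F.interpolate(logits, size=(h, w), mode="bilinear",
                                   align_corners=False)
            prob_sum = prob_sum + logits.softmax(dim=1)
        return prob_sum.argmax(dim=1)                  # per-pixel class labels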

Citation

If TKCN is useful for your research, please consider citing:

@article{wu2018tree,
  title={Tree-structured Kronecker Convolutional Networks for Semantic Segmentation},
  author={Wu, Tianyi and Tang, Sheng and Zhang, Rui and Li, Jintao},
  journal={arXiv preprint arXiv:1812.04945},
  year={2018}
} 

License

This code is released under the MIT License. See LICENSE for additional details.

Thanks to the Third-Party Libraries

https://github.com/zhanghang1989/PyTorch-Encoding

https://github.com/junfu1115/DANet

Note

The original TKCN code is based on Caffe and will be released later. This repository is a PyTorch implementation of TKCN.
