Lap_Rep_KD_Depth

Decomposition and Replacement: Spatial Knowledge Distillation for Monocular Depth Estimation

This repository is the official PyTorch implementation of the paper "Decomposition and Replacement: Spatial Knowledge Distillation for Monocular Depth Estimation".

Minsoo Song and Wonjun Kim*
Journal of Visual Communication and Image Representation (JVCIR)

The proposed method transfers the rich knowledge of the scene depth, which is well encoded by the teacher network, to the student network in a structured way by decomposing it into the global context and local details.
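
Conceptually, this decomposition can be pictured as a Laplacian-pyramid-style split of a dense feature map into a low-frequency global context and per-scale high-frequency local details. The following is a minimal sketch of that idea only; the function name and the number of levels are illustrative assumptions, not the repository's actual API.

    import torch
    import torch.nn.functional as F

    def decompose(x, levels=3):
        # Split a [B, C, H, W] feature map into a low-frequency global
        # context and per-level high-frequency local details
        # (Laplacian-pyramid style). Illustrative sketch only.
        details = []
        current = x
        for _ in range(levels):
            down = F.avg_pool2d(current, kernel_size=2)    # coarser scale
            up = F.interpolate(down, size=current.shape[-2:],
                               mode='bilinear', align_corners=False)
            details.append(current - up)                   # local details at this scale
            current = down
        return current, details                            # global context, local details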

We also propose a new guidance concept for knowledge distillation, called ReplaceBlock, which replaces randomly selected blocks in the decoded feature of the student network with those of the teacher network. By accounting for spatial contiguity in the feature space, ReplaceBlock smooths the student's learning of the teacher's feature distribution.
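
The sketch below illustrates one way such a block replacement could look, assuming the student and teacher decoded features share a [B, C, H, W] shape; the function name, block size, and replacement probability are illustrative assumptions rather than values from the paper.

    import torch
    import torch.nn.functional as F

    def replace_block(student_feat, teacher_feat, block_size=4, replace_prob=0.3):
        # Replace randomly selected square spatial blocks of the student's
        # decoded feature with the teacher's (detached), so the student sees
        # spatially contiguous patches of the teacher's feature distribution.
        # Illustrative sketch only; names and defaults are assumptions.
        B, C, H, W = student_feat.shape
        grid_h, grid_w = H // block_size, W // block_size
        # Sample one binary seed per coarse grid cell per sample
        seeds = (torch.rand(B, 1, grid_h, grid_w,
                            device=student_feat.device) < replace_prob).float()
        # Expand each selected cell to a block_size x block_size region
        mask = F.interpolate(seeds, size=(H, W), mode='nearest')
        return mask * teacher_feat.detach() + (1.0 - mask) * student_feat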

Environments

  • Python >= 3.7
  • PyTorch >= 1.6.0
  • Ubuntu 16.04
  • CUDA 10.2
  • cuDNN (if CUDA available)

Other required packages: geffnet, path, blessings, progressbar, tqdm

Pretrained models

We provide pre-trained MobileNetV2 and ResNet18 weights for the KITTI and NYU Depth V2 datasets. These models were trained on 4x Titan X GPUs. This is a reimplementation, so the quantitative results differ slightly from those in our original paper.

Dataset Preparation

We referred to BTS in the data preparation process.

KITTI

1. Official ground truth

  • Download the official KITTI ground truth from the link and create the KITTI dataset directory.
    $ cd ./datasets
    $ mkdir KITTI && cd KITTI
    $ mv ~/Downloads/data_depth_annotated.zip .
    $ unzip data_depth_annotated.zip

2. Raw dataset

  • Construct the raw KITTI dataset using the following commands.
    $ mv ./datasets/kitti_archives_to_download.txt ./datasets/KITTI
    $ cd ./datasets/KITTI
    $ aria2c -x 16 -i ./kitti_archives_to_download.txt
    $ parallel unzip ::: *.zip

The KITTI dataset should be organized as below:

|-- datasets
  |-- KITTI
     |-- data_depth_annotated  
        |-- 2011_xx_xx_drive_xxxx_sync
           |-- proj_depth  
               |-- groundtruth            # official ground-truth folder
        |-- ... (all drives of all days in the raw KITTI)  
     |-- 2011_09_26                      # raw RGB data folder  
        |-- 2011_09_26_drive_xxxx_sync
     |-- 2011_09_29
     |-- ... (all days in the raw KITTI)

NYU Depth V2

1. Training set
Make the NYU dataset directory:

    $ cd ./datasets
    $ mkdir NYU_Depth_V2 && cd NYU_Depth_V2
  • Construct the training data using the following steps:
    • Download Raw NYU Depth V2 dataset (450GB) from this Link.
    • Extract the raw dataset into './datasets/NYU_Depth_V2'
      (It should make './datasets/NYU_Depth_V2/raw/....').
    • Run './datasets/sync_project_frames_multi_threads.m' to get synchronized data (requires MATLAB).
      (It should create './datasets/NYU_Depth_V2/sync/....').
  • Alternatively, you can download the whole 'sync' folder directly from our Google Drive link into './datasets/NYU_Depth_V2/'.

2. Testing set
Download the official nyu_depth_v2_labeled.mat and extract the image files from the mat file.

    $ cd ./datasets
    ## Download the official labeled NYU_Depth_V2 mat file
    $ wget http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat
    ## Extract image files from the mat file
    $ python extract_official_train_test_set_from_mat.py nyu_depth_v2_labeled.mat splits.mat ./NYU_Depth_V2/official_splits/

Evaluation

Make sure you have downloaded the pre-trained model and placed it in the './pretrained/' directory before running the evaluation code.

  • Evaluation Command Line:
# Run evaluation using a pre-trained model
## KITTI
python eval.py --batch_size 1 --gpu_num 0 --student --evaluate --dataset KITTI --model MobileNetV2 --data_path /your/workspace/KITTI/ --model_dir ./MobileNet_encoder_pretrained_KITTI.pkl --decoder_dir ./decoder_pretrained_KITTI.pkl
## NYU
python eval.py --batch_size 1 --gpu_num 0 --student --evaluate --dataset NYU --model ResNet18 --data_path /your/workspace/NYU_Depth_V2/official_splits/test --model_dir ./ResNet18_encoder_pretrained_NYU.pkl --decoder_dir ./decoder_pretrained_NYU.pkl

Training

Single training (without spatial knowledge distillation)

# 4-GPU setting
# KITTI
OMP_NUM_THREADS=1 python train.py --distributed --batch_size 80 --dataset KITTI --data_path /your/workspace/KITTI --mode Single_training --epochs 50 --model MobileNetV2 --gpu_num 0,1,2,3
# NYU
OMP_NUM_THREADS=1 python train.py --distributed --batch_size 80 --dataset NYU --data_path /your/workspace/NYU_Depth_V2/sync --mode Single_training --epochs 50 --model MobileNetV2 --gpu_num 0,1,2,3

Student training (with spatial knowledge distillation)

  • We provide pre-trained teacher network weights (ResNext101).
# 4-GPU setting
# KITTI
OMP_NUM_THREADS=1 python train.py --distributed --batch_size 80 --dataset KITTI --data_path /your/workspace/KITTI --model_encoder_dir ./ResNext_single_encoder_pretrained_KITTI.pkl --model_decoder_dir ./ResNext_single_decoder_pretrained_KITTI.pkl --mode Student_training --epochs 50 --T_model ResNext101 --model MobileNetV2 --gpu_num 0,1,2,3
# NYU
OMP_NUM_THREADS=1 python train.py --distributed --batch_size 80 --dataset NYU --data_path /your/workspace/NYU_Depth_V2/sync --model_encoder_dir ./ResNext_single_encoder_pretrained_NYU.pkl --model_decoder_dir ./ResNext_single_decoder_pretrained_NYU.pkl --mode Student_training --epochs 50 --T_model ResNext101 --model MobileNetV2 --gpu_num 0,1,2,3

If you don't want distributed training, remove the '--distributed' argument.

The '--gpu_num' argument is a list of indices of the available GPUs you want to use (e.g., 0,1,2,3).
For example, to use only the 3rd GPU out of 4, pass '--gpu_num 2'.

Qualitative results of the proposed distillation scheme on the KITTI (top) and NYU Depth V2 (bottom) datasets.

Upper left: input color image, upper right: ground truth, bottom left: the proposed method, bottom right: w/o distillation. Note that the ground truth samples are interpolated for better visualization.

Reference

When using this code in your research, please cite the following paper:

M. Song and W. Kim, "Decomposition and replacement: Spatial knowledge distillation for monocular depth estimation," Journal of Visual Communication and Image Representation, vol. 85, p. 103523, May 2022.

@ARTICLE{SONG2022103523,
  author={M. {Song} and W. {Kim}},
  journal={Journal of Visual Communication and Image Representation},
  title={Decomposition and replacement: Spatial knowledge distillation for monocular depth estimation},
  month={May},
  year={2022},
  volume={85},
  pages={103523},
  issn={1047-3203},
  doi={10.1016/j.jvcir.2022.103523}}
