Lap_Rep_KD_Depth

Decomposition and Replacement: Spatial Knowledge Distillation for Monocular Depth Estimation

This repository is the official PyTorch implementation of the paper "Decomposition and Replacement: Spatial Knowledge Distillation for Monocular Depth Estimation".

Minsoo Song and Wonjun Kim*
Journal of Visual Communication and Image Representation (JVCIR)

The proposed method transfers the rich knowledge of the scene depth, which is well encoded by the teacher network, to the student network in a structured way by decomposing it into the global context and local details.
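
Conceptually, this decomposition can be pictured as a Laplacian-pyramid-style split of a dense feature map into a low-frequency global context and per-scale high-frequency local details. The following is a minimal sketch of that idea only; the function name and the number of levels are illustrative assumptions, not the repository's actual API.

    import torch
    import torch.nn.functional as F

    def decompose(x, levels=3):
        # Split a [B, C, H, W] feature map into a low-frequency global
        # context and per-level high-frequency local details
        # (Laplacian-pyramid style). Illustrative sketch only.
        details = []
        current = x
        for _ in range(levels):
            down = F.avg_pool2d(current, kernel_size=2)    # coarser scale
            up = F.interpolate(down, size=current.shape[-2:],
                               mode='bilinear', align_corners=False)
            details.append(current - up)                   # local details at this scale
            current = down
        return current, details                            # global context, local details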

We also propose a new guidance concept for knowledge distillation, called ReplaceBlock, which replaces randomly selected blocks in the decoded feature of the student network with those of the teacher network. By accounting for spatial contiguity in the feature space, ReplaceBlock smooths the student's learning of the teacher's feature distribution.
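
The sketch below illustrates one way such a block replacement could look, assuming the student and teacher decoded features share a [B, C, H, W] shape; the function name, block size, and replacement probability are illustrative assumptions rather than values from the paper.

    import torch
    import torch.nn.functional as F

    def replace_block(student_feat, teacher_feat, block_size=4, replace_prob=0.3):
        # Replace randomly selected square spatial blocks of the student's
        # decoded feature with the teacher's (detached), so the student sees
        # spatially contiguous patches of the teacher's feature distribution.
        # Illustrative sketch only; names and defaults are assumptions.
        B, C, H, W = student_feat.shape
        grid_h, grid_w = H // block_size, W // block_size
        # Sample one binary seed per coarse grid cell per sample
        seeds = (torch.rand(B, 1, grid_h, grid_w,
                            device=student_feat.device) < replace_prob).float()
        # Expand each selected cell to a block_size x block_size region
        mask = F.interpolate(seeds, size=(H, W), mode='nearest')
        return mask * teacher_feat.detach() + (1.0 - mask) * student_feat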

Environments

  • Python >= 3.7
  • PyTorch >= 1.6.0
  • Ubuntu 16.04
  • CUDA 10.2
  • cuDNN (if CUDA available)

Other required packages: geffnet, path, blessings, progressbar, tqdm

Pretrained models

We provide pre-trained MobileNetV2 and ResNet18 weights for the KITTI and NYU Depth V2 datasets. These models were trained on 4x Titan X GPUs. This is a reimplementation, so the quantitative results differ slightly from those in our original paper.

Dataset Preparation

We referred to BTS in the data preparation process.

KITTI

1. Official ground truth

  • Download the official KITTI ground truth from the link and create the KITTI dataset directory.
    $ cd ./datasets
    $ mkdir KITTI && cd KITTI
    $ mv ~/Downloads/data_depth_annotated.zip .
    $ unzip data_depth_annotated.zip

2. Raw dataset

  • Construct the raw KITTI dataset using the following commands.
    $ mv ./datasets/kitti_archives_to_download.txt ./datasets/KITTI
    $ cd ./datasets/KITTI
    $ aria2c -x 16 -i ./kitti_archives_to_download.txt
    $ parallel unzip ::: *.zip

The KITTI dataset should be organized as below:

|-- datasets
  |-- KITTI
     |-- data_depth_annotated  
        |-- 2011_xx_xx_drive_xxxx_sync
           |-- proj_depth  
               |-- groundtruth            # official ground-truth folder
        |-- ... (all drives of all days in the raw KITTI)  
     |-- 2011_09_26                      # raw RGB data folder  
        |-- 2011_09_26_drive_xxxx_sync
     |-- 2011_09_29
     |-- ... (all days in the raw KITTI)

NYU Depth V2

1. Training set
Make the NYU dataset directory:

    $ cd ./datasets
    $ mkdir NYU_Depth_V2 && cd NYU_Depth_V2
  • Construct the training data using the following steps:
    • Download Raw NYU Depth V2 dataset (450GB) from this Link.
    • Extract the raw dataset into './datasets/NYU_Depth_V2'
      (It should make './datasets/NYU_Depth_V2/raw/....').
    • Run './datasets/sync_project_frames_multi_threads.m' to get synchronized data (requires MATLAB).
      (It should create './datasets/NYU_Depth_V2/sync/....').
  • Alternatively, you can download the whole 'sync' folder directly from our Google Drive link into './datasets/NYU_Depth_V2/'.

2. Testing set
Download the official nyu_depth_v2_labeled.mat and extract the image files from the mat file.

    $ cd ./datasets
    ## Download the official labeled NYU_Depth_V2 mat file
    $ wget http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat
    ## Extract image files from the mat file
    $ python extract_official_train_test_set_from_mat.py nyu_depth_v2_labeled.mat splits.mat ./NYU_Depth_V2/official_splits/

Evaluation

Make sure you have downloaded the pre-trained model and placed it in the './pretrained/' directory before running the evaluation code.

  • Evaluation Command Line:
# Run evaluation using a pre-trained model
## KITTI
python eval.py --batch_size 1 --gpu_num 0 --student --evaluate --dataset KITTI --model MobileNetV2 --data_path /your/workspace/KITTI/ --model_dir ./MobileNet_encoder_pretrained_KITTI.pkl --decoder_dir ./decoder_pretrained_KITTI.pkl
## NYU
python eval.py --batch_size 1 --gpu_num 0 --student --evaluate --dataset NYU --model ResNet18 --data_path /your/workspace/NYU_Depth_V2/official_splits/test --model_dir ./ResNet18_encoder_pretrained_NYU.pkl --decoder_dir ./decoder_pretrained_NYU.pkl

Training

Single training (without spatial knowledge distillation)

# 4-GPU setting
# KITTI
OMP_NUM_THREADS=1 python train.py --distributed --batch_size 80 --dataset KITTI --data_path /your/workspace/KITTI --mode Single_training --epochs 50 --model MobileNetV2 --gpu_num 0,1,2,3
# NYU
OMP_NUM_THREADS=1 python train.py --distributed --batch_size 80 --dataset NYU --data_path /your/workspace/NYU_Depth_V2/sync --mode Single_training --epochs 50 --model MobileNetV2 --gpu_num 0,1,2,3

Student training (with spatial knowledge distillation)

  • We provide pre-trained teacher network weights (ResNext101).
# 4-GPU setting
# KITTI
OMP_NUM_THREADS=1 python train.py --distributed --batch_size 80 --dataset KITTI --data_path /your/workspace/KITTI --model_encoder_dir ./ResNext_single_encoder_pretrained_KITTI.pkl --model_decoder_dir ./ResNext_single_decoder_pretrained_KITTI.pkl --mode Student_training --epochs 50 --T_model ResNext101 --model MobileNetV2 --gpu_num 0,1,2,3
# NYU
OMP_NUM_THREADS=1 python train.py --distributed --batch_size 80 --dataset NYU --data_path /your/workspace/NYU_Depth_V2/sync --model_encoder_dir ./ResNext_single_encoder_pretrained_NYU.pkl --model_decoder_dir ./ResNext_single_decoder_pretrained_NYU.pkl --mode Student_training --epochs 50 --T_model ResNext101 --model MobileNetV2 --gpu_num 0,1,2,3

If you don't want distributed training, remove the '--distributed' argument.

The '--gpu_num' argument is a list of indices of the available GPUs you want to use (e.g., 0,1,2,3).
For example, to use only the 3rd GPU out of 4, pass '--gpu_num 2'.

Qualitative results of the proposed distillation scheme on the KITTI (top) and NYU Depth V2 (bottom) datasets.

Upper left: input color image, upper right: ground truth, bottom left: the proposed method, bottom right: w/o distillation. Note that the ground truth samples are interpolated for better visualization.

Reference

When using this code in your research, please cite the following paper:

M. Song and W. Kim, "Decomposition and replacement: Spatial knowledge distillation for monocular depth estimation," Journal of Visual Communication and Image Representation, vol. 85, p. 103523, May 2022.

@ARTICLE{SONG2022103523,
  author={M. {Song} and W. {Kim}},
  journal={Journal of Visual Communication and Image Representation},
  title={Decomposition and replacement: Spatial knowledge distillation for monocular depth estimation},
  month={May},
  year={2022},
  volume={85},
  pages={103523},
  issn={1047-3203},
  doi={10.1016/j.jvcir.2022.103523}}
