A Disparity Feature Alignment Module for Stereo Image Super-Resolution

Abstract

Stereo images have recently improved super-resolution performance, since the second view provides additional information. However, exchanging cross-view information is challenging because the disparities between the left and right images vary. To address this issue, we propose a disparity feature alignment module (DFAM) that exploits disparity information for feature alignment and fusion. Specifically, we design a modified atrous spatial pyramid pooling module to estimate disparities and warp stereo features, and then use spatial and channel attention for feature fusion. In addition, DFAM can be plugged into an arbitrary single-image super-resolution (SISR) network to super-resolve a stereo image pair. Extensive experiments demonstrate that DFAM incorporates stereo information with less inference time and memory cost. Moreover, RCAN equipped with DFAMs outperforms state-of-the-art methods. The code is available at https://github.com/JiawangDan/DFAM.
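
The warping step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes a rectified stereo pair, a disparity map that is already given (DFAM estimates it with its modified atrous spatial pyramid pooling module, which is not reproduced here), and the common convention that a left-view pixel at column x corresponds to the right-view pixel at column x - d. The function name `warp_right_to_left` is hypothetical and is not taken from the DFAM code.

```python
import numpy as np

def warp_right_to_left(right_feat, disparity):
    """Resample right-view features so they line up with the left view.

    right_feat: array of shape (C, H, W)
    disparity:  array of shape (H, W), horizontal shift in pixels
    Assumed convention (not from the source): left(x) matches right(x - d).
    """
    _, h, w = right_feat.shape
    # Source column in the right view for every left-view pixel.
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0, w - 1)            # clamp samples that leave the image
    x0 = np.floor(xs).astype(int)         # left neighbour column
    x1 = np.minimum(x0 + 1, w - 1)        # right neighbour column
    frac = xs - x0                        # linear interpolation weight
    rows = np.arange(h)[:, None]
    # Linear interpolation along the width axis only (rows stay fixed).
    return (1 - frac) * right_feat[:, rows, x0] + frac * right_feat[:, rows, x1]
```

Because the pair is rectified, the sketch only interpolates along the width axis. The warped features could then be fused with the left-view features, which DFAM does with spatial and channel attention.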

Requirements

  • Python 3
  • PyTorch, torchvision
  • NumPy, SciPy
  • importlib
  • MATLAB

Dataset

We use 800 images of the Flickr1024 dataset as training data and 112 images as validation data. For testing, we use 5 images from the Middlebury dataset, 20 images from the KITTI 2012 dataset and 20 images from the KITTI 2015 dataset.

  1. Download Flickr1024 and unzip it into the data directory as shown below:
data
└── train
    ├── Flickr1024
        ├── 0001_L.png
        ├── 0001_R.png
        ├── 0002_L.png
        ├── ...
    ├── Flickr1024_patches
        ├── patches_x2
        ├── patches_x3
        ├── patches_x4
    ├── generate_trainset.m
    ├── modcrop.m
└── valid
    ├── ...
└── test
    ├── middlebury
        ├── hr
            ├── cloth2
                ├── lr0.png
                ├── lr1.png
            ├── ...
        ├── lr_x2
            ├── ...
        ├── lr_x4
            ├── ...
    ├── KITTI2012
        ├── ...
    ├── KITTI2015
        ├── ...
  2. For training, all the training data is cropped into 30×90 patches with a stride of 20. Move 'generate_testset.m', 'generate_trainset.m' and 'modcrop.m' to the locations shown above.
$ cd data/train && python generate_trainset.py
$ cd data/test && python generate_testset.py
  3. The other benchmark datasets can be downloaded from Middlebury, KITTI2012 and KITTI2015. Please put all the datasets in the data directory.
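
As a rough illustration of the patch cropping described above, the following NumPy sketch extracts aligned patches from a stereo pair. It is not a port of the repository's scripts: the function names are hypothetical, `modcrop` is assumed to do what the usual super-resolution helper of that name does (trim an image so its size is divisible by the scale), and 30×90 is assumed to mean height 30 and width 90, which the text does not state.

```python
import numpy as np

def modcrop(img, scale):
    """Trim an (H, W, ...) image so H and W are multiples of `scale`."""
    h, w = img.shape[:2]
    return img[: h - h % scale, : w - w % scale]

def extract_pair_patches(left, right, patch_h=30, patch_w=90, stride=20):
    """Cut a stereo pair into patches at identical positions in both views."""
    if left.shape != right.shape:
        raise ValueError("left and right views must have the same shape")
    h, w = left.shape[:2]
    pairs = []
    for top in range(0, h - patch_h + 1, stride):
        for col in range(0, w - patch_w + 1, stride):
            pairs.append((
                left[top:top + patch_h, col:col + patch_w],
                right[top:top + patch_h, col:col + patch_w],
            ))
    return pairs
```

Cropping both views at the same coordinates keeps each patch pair on the same image rows, so the horizontal disparity between the two views is preserved.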

Test Pretrained Models

Pretrained models are available from ckpt (Baidu drive, password: 1234) or ckpt (Google drive). To test DFAM on a benchmark dataset:

$ python test.py --model VDSR_DFAM --scale 4 --dataset middlebury --upsample --rgb2y --checkpoint ckpt/VDSR_DFAM/VDSR_DFAM_x4.pth --device cuda
$ python test.py --model SRCNN_DFAM --scale 4 --dataset middlebury --upsample --rgb2y --checkpoint ckpt/SRCNN_DFAM/SRCNN_DFAM_x4.pth --device cuda
$ python test.py --model SRResNet_DFAM --scale 4 --dataset middlebury --checkpoint ckpt/SRResNet_DFAM/SRResNet_DFAM_x4.pth --device cuda
$ python test.py --model RCAN_DFAM --scale 4 --dataset middlebury --checkpoint ckpt/RCAN_DFAM/RCAN_DFAM_x4.pth --device cuda
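
The `--rgb2y` flag on the VDSR and SRCNN commands suggests that these variants work on the luminance channel, although the source does not describe the conversion. A minimal sketch, assuming the ITU-R BT.601 conversion used by MATLAB's `rgb2ycbcr` and a standard PSNR on a 0 to 255 range; both function names are hypothetical and may differ from what `test.py` actually does:

```python
import numpy as np

def rgb_to_y(img):
    """Convert an (H, W, 3) RGB image in [0, 1] to BT.601 luma in [16, 235]."""
    coeffs = np.array([65.481, 128.553, 24.966])
    return img @ coeffs + 16.0

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two arrays of equal shape."""
    mse = np.mean((np.asarray(a, dtype=np.float64) - b) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
```

With this convention a pure white image maps to Y = 235 and a pure black image to Y = 16.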

Training Models

To augment the data, random horizontal and vertical flipping are adopted.

$ python train.py --model VDSR_DFAM --scale 2 --batchSize 12 --upsample --rgb2y --pretrained ./ckpt/VDSR/pretrain_statedict.pth
$ python train.py --model SRCNN_DFAM --scale 2 --batchSize 16 --upsample --rgb2y --pretrained ./ckpt/SRCNN/pretrain_statedict.pth
$ python train.py --model SRResNet_DFAM --scale 2 --batchSize 32 --pretrained ./ckpt/SRResNet/pretrain_statedictx2.pth
$ python train.py --model RCAN_DFAM --scale 2 --batchSize 8 --pretrained ./ckpt/RCAN/pretrain_statedictx2.pt
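
The flip augmentation mentioned at the start of this section can be sketched for a stereo pair as follows. This is an illustration with hypothetical function names, not the code in `train.py`. It assumes that a horizontal flip also swaps the two views, which keeps the disparity direction consistent for a rectified pair; whether the DFAM training code does this is not stated in the source.

```python
import random
import numpy as np

def flip_horizontal(left, right):
    """Mirror both views left-to-right and swap them (assumed convention)."""
    return right[:, ::-1], left[:, ::-1]

def flip_vertical(left, right):
    """Mirror both views top-to-bottom; the views keep their roles."""
    return left[::-1], right[::-1]

def random_flip_pair(left, right, rng=random):
    """Apply each flip independently with probability 0.5."""
    if rng.random() < 0.5:
        left, right = flip_horizontal(left, right)
    if rng.random() < 0.5:
        left, right = flip_vertical(left, right)
    return left, right
```

The same flips would have to be applied to the low-resolution inputs and the high-resolution targets of a training sample so that they stay aligned.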

Results

[Figure: results]

Citation

@article{dan2021disparity,
  title={A Disparity Feature Alignment Module for Stereo Image Super-Resolution},
  author={Dan, Jiawang and Qu, Zhaowei and Wang, Xiaoru and Gu, Jiahang},
  journal={IEEE Signal Processing Letters},
  year={2021},
  publisher={IEEE}
}
