Less is More: Consistent Video Depth Estimation with Masked Frames Modeling (ACM MM 2022)

Yiran Wang¹, Zhiyu Pan¹, Xingyi Li¹, Zhiguo Cao¹, Ke Xian¹*, Jianming Zhang²

¹Huazhong University of Science and Technology, ²Adobe Research

The official project of the ACM MM 2022 paper "Less is More: Consistent Video Depth Estimation with Masked Frames Modeling".

Abstract

Temporal consistency is the key challenge of video depth estimation. Previous works rely on additional optical flow or camera poses, which are time-consuming to obtain. By contrast, we derive consistency with less information. Since videos inherently contain heavy temporal redundancy, a missing frame can be recovered from neighboring ones. Inspired by this, we propose the frame masking network (FMNet), a spatial-temporal transformer network that predicts the depth of masked frames based on their neighboring frames. By reconstructing masked temporal features, FMNet learns intrinsic inter-frame correlations, which leads to consistency. Experimental results demonstrate that, compared with prior art, our approach achieves comparable spatial accuracy and higher temporal consistency without any additional information. Our work provides a new perspective on consistent video depth estimation.
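
The redundancy argument can be illustrated with a toy example. The sketch below is not FMNet and is not taken from this repository: it masks some frames of a short sequence and fills each one with the average of its nearest visible neighbors, whereas FMNet learns this reconstruction with a transformer. The helper names, the mask ratio, and the use of one scalar value per frame are all assumptions made for illustration.

```python
import random


def choose_masked_frames(num_frames, mask_ratio, seed=None):
    """Pick frame indices to mask (hypothetical helper, not from the FMNet code)."""
    rng = random.Random(seed)
    num_masked = max(1, int(round(num_frames * mask_ratio)))
    # Always keep at least one visible frame to recover from.
    num_masked = min(num_masked, num_frames - 1)
    return sorted(rng.sample(range(num_frames), num_masked))


def fill_from_neighbors(features, masked):
    """Naive stand-in for reconstruction: average the nearest visible frames.

    `features` holds one scalar per frame (a toy substitute for a feature map).
    """
    masked = set(masked)
    visible = [i for i in range(len(features)) if i not in masked]
    recovered = list(features)
    for i in masked:
        left = max((j for j in visible if j < i), default=None)
        right = min((j for j in visible if j > i), default=None)
        neighbors = [features[j] for j in (left, right) if j is not None]
        recovered[i] = sum(neighbors) / len(neighbors)
    return recovered


if __name__ == "__main__":
    # A slowly varying 12-frame signal; the mask ratio of 0.5 is an assumption.
    frames = [0.1 * t for t in range(12)]
    masked = choose_masked_frames(len(frames), mask_ratio=0.5, seed=0)
    print("masked frames:", masked)
    print("recovered:", fill_from_neighbors(frames, masked))
```

For a smooth signal the interpolated values stay close to the originals, which is the intuition behind predicting masked frames from their neighbors.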

Installation

Our code is based on python=3.6.13 and pytorch==1.7.1.

You can refer to environment.yml or requirements.txt for installation. Some of the libraries listed in those files are not required to run the code.

conda create -n fmnet python=3.6
conda activate fmnet
conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=11.0 -c pytorch -c conda-forge
pip install numpy imageio opencv-python scipy tensorboard timm scikit-image tqdm h5py
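
After installing, a quick import check can confirm that the environment is complete. The script below is a minimal sketch and not part of this repository. The function name `find_missing` is hypothetical, and the mapping reflects the usual import names of these packages (for example, opencv-python is imported as `cv2` and scikit-image as `skimage`).

```python
import importlib.util
import sys

# pip/conda package name -> module name used in `import` statements.
REQUIRED_MODULES = {
    "pytorch": "torch",
    "torchvision": "torchvision",
    "numpy": "numpy",
    "imageio": "imageio",
    "opencv-python": "cv2",
    "scipy": "scipy",
    "tensorboard": "tensorboard",
    "timm": "timm",
    "scikit-image": "skimage",
    "tqdm": "tqdm",
    "h5py": "h5py",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the package names whose modules cannot be found in this environment."""
    return [
        package
        for package, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = find_missing()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

The check only looks the modules up without importing them, so it does not verify versions or CUDA availability.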

Demo

Download our checkpoint on the NYUDV2 dataset and put it in the checkpoint folder.

The RGB frames are placed in ./demo/rgb. The visualization results will be saved in ./demo/results folder.

python demo.py
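
Before running the demo, it can help to confirm that the inputs are where the script expects them. The following is a minimal sketch, not part of this repository. It assumes the checkpoint folder is named `checkpoint` at the repository root and that frames are stored as PNG or JPEG files; the function name `check_demo_inputs` and the list of suffixes are hypothetical.

```python
from pathlib import Path

# Assumption: demo frames are stored in one of these image formats.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def check_demo_inputs(root="."):
    """Return a list of human-readable problems; an empty list means the layout looks fine."""
    root = Path(root)
    problems = []

    # The checkpoint folder should exist and contain at least one file.
    ckpt_dir = root / "checkpoint"
    if not ckpt_dir.is_dir() or not any(ckpt_dir.iterdir()):
        problems.append(f"no checkpoint found in {ckpt_dir}")

    # The RGB frames should be placed in ./demo/rgb.
    rgb_dir = root / "demo" / "rgb"
    frames = []
    if rgb_dir.is_dir():
        frames = [p for p in rgb_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES]
    if not frames:
        problems.append(f"no RGB frames found in {rgb_dir}")

    return problems


if __name__ == "__main__":
    issues = check_demo_inputs(".")
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Demo inputs look fine.")
```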

Evaluation

Download the 654 testing sequences of the NYUDV2 dataset and put them in the ./data/testnyu_data/ folder.

Each sequence contains 12 consecutive RGB frames. The data also includes the ground truth of the 654 testing images for evaluation.

python testfmnet_nyu.py
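
This README does not list the metrics that testfmnet_nyu.py reports. As a rough reference, the sketch below computes three spatial accuracy metrics that are commonly reported for depth estimation: absolute relative error (AbsRel), root mean squared error (RMSE), and the fraction of pixels with max(pred/gt, gt/pred) < 1.25. It is an illustration rather than the repository's evaluation code; the function name `depth_metrics` and the `min_depth` threshold are assumptions, and temporal consistency metrics are not covered.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3):
    """Compute AbsRel, RMSE and delta < 1.25 over flat sequences of depth values.

    Pixels with ground truth at or below `min_depth` (assumed invalid) or with a
    non-positive prediction are skipped.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > min_depth and p > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


if __name__ == "__main__":
    # Toy values in meters; the last ground-truth pixel is invalid and is skipped.
    prediction = [1.0, 2.0, 4.0, 3.0]
    ground_truth = [1.0, 2.0, 2.0, 0.0]
    print(depth_metrics(prediction, ground_truth))
```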

Citation

If you find our work useful in your research, please consider citing our paper.

@inproceedings{Wang2022fmnet,
  title = {Less is More: Consistent Video Depth Estimation with Masked Frames Modeling},
  author = {Wang, Yiran and Pan, Zhiyu and Li, Xingyi and Cao, Zhiguo and Xian, Ke and Zhang, Jianming},
  booktitle = {Proceedings of the 30th ACM International Conference on Multimedia (MM '22)},
  year = {2022},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA}
}
