mukosame / zooming-slow-mo-cvpr-2020 Goto Github PK

Fast and Accurate One-Stage Space-Time Video Super-Resolution (accepted in CVPR 2020)

License: GNU General Public License v3.0

Python 66.60% MATLAB 0.64% Shell 0.18% C++ 6.49% Cuda 23.68% C 2.41%

pytorch video super-resolution video-frame-interpolation video-super-resolution spatio-temporal cvpr2020 cvpr

zooming-slow-mo-cvpr-2020's Introduction

Zooming-Slow-Mo (CVPR-2020)

By Xiaoyu Xiang^*, Yapeng Tian^*, Yulun Zhang, Yun Fu, Jan P. Allebach⁺, Chenliang Xu⁺ (^* equal contributions, ⁺ equal advising)

This is the official Pytorch implementation of Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution.

Paper | Journal Version | Demo (YouTube) | 1-min teaser (YouTube) | 1-min teaser (Bilibili)

Input	Output

Updates

2020.3.13 Add meta-info of datasets used in this paper
2020.3.11 Add new function: video converter
2020.3.10: Upload the complete code and pretrained models

Introduction
Prerequisites
Get Started
Citations
Contact
License
Acknowledgments

Introduction

The repository contains the entire project (including all the preprocessing) for one-stage space-time video super-resolution with Zooming Slow-Mo.

Zooming Slow-Mo is a recently proposed joint video frame interpolation (VFI) and video super-resolution (VSR) method, which directly synthesizes an HR slow-motion video from an LFR, LR video. It is going to be published in CVPR 2020. The most up-to-date paper with supplementary materials can be found at arXiv.

In Zooming Slow-Mo, we firstly temporally interpolate features of the missing LR frame by the proposed feature temporal interpolation network. Then, we propose a deformable ConvLSTM to align and aggregate temporal information simultaneously. Finally, a deep reconstruction network is adopted to predict HR slow-motion video frames. If our proposed architectures also help your research, please consider citing our paper.

Zooming Slow-Mo achieves state-of-the-art performance by PSNR and SSIM in Vid4, Vimeo test sets.

Prerequisites

Python 3 (Recommend to use Anaconda)
PyTorch >= 1.1
NVIDIA GPU + CUDA
Deformable Convolution v2, we adopt CharlesShang's implementation in the submodule.
Python packages: pip install numpy opencv-python lmdb pyyaml pickle5 matplotlib seaborn

Get Started

Installation

Install the required packages: pip install -r requirements.txt

First, make sure your machine has a GPU, which is required for the DCNv2 module.

Clone the Zooming Slow-Mo repository. We'll call the directory that you cloned Zooming Slow-Mo as ZOOMING_ROOT.

git clone --recursive https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020.git

Compile the DCNv2:

cd $ZOOMING_ROOT/codes/models/modules/DCNv2
bash make.sh         # build
python test.py    # run examples and gradient check

Please make sure the test script finishes successfully without any errors before running the following experiments.

Training

Part 1: Data Preparation

Download the original training + test set of Vimeo-septuplet (82 GB).

wget http://data.csail.mit.edu/tofu/dataset/vimeo_septuplet.zip
apt-get install unzip
unzip vimeo_septuplet.zip

Split the Vimeo-septuplet into a training set and a test set, make sure you change the dataset's path to your download path in script, also you need to run for the training set and test set separately:

cd $ZOOMING_ROOT/codes/data_scripts/sep_vimeo_list.py

This will create train and test folders in the directory of vimeo_septuplet/sequences. The folder structure is as follows:

vimeo_septuplet
├── sequences
    ├── 00001
        ├── 0266
            ├── im1.png
            ├── ...
            ├── im7.png
        ├── 0268...
    ├── 00002...
├── readme.txt
├──sep_trainlist.txt
├── sep_testlist.txt

Generate low resolution (LR) images. You can either do this via MATLAB or Python (remember to configure the input and output path):

# In Matlab Command Window
run $ZOOMING_ROOT/codes/data_scripts/generate_LR_Vimeo90K.m

python $ZOOMING_ROOT/codes/data_scripts/generate_mod_LR_bic.py

Create the LMDB files for faster I/O speed. Note that you need to configure your input and output path in the following script:

python $ZOOMING_ROOT/codes/data_scripts/create_lmdb_mp.py

The structure of generated lmdb folder is as follows:

Vimeo7_train.lmdb
├── data.mdb
├── lock.mdb
├── meta_info.txt

Part 2: Train

Note: In this part, we assume you are in the directory $ZOOMING_ROOT/codes/

Configure your training settings that can be found at options/train. Our training settings in the paper can be found at train_zsm.yml. We'll take this setting as an example to illustrate the following steps.
Train the Zooming Slow-Mo model.

python train.py -opt options/train/train_zsm.yml

After training, your model xxxx_G.pth and its training states, and a corresponding log file train_LunaTokis_scratch_b16p32f5b40n7l1_600k_Vimeo_xxxx.log are placed in the directory of $ZOOMING_ROOT/experiments/LunaTokis_scratch_b16p32f5b40n7l1_600k_Vimeo/.

Testing

We provide the test code for both standard test sets (Vid4, SPMC, etc.) and custom video frames.

Pretrained Models

Our pretrained model can be downloaded via GitHub or Google Drive.

From Video

If you have installed ffmpeg, you can convert any video to a high-resolution and high frame-rate video using video_to_zsm.py. The corresponding commands are:

cd $ZOOMING_ROOT/codes
python video_to_zsm.py --video PATH/TO/VIDEO.mp4 --model PATH/TO/PRETRAINED/MODEL.pth --output PATH/TO/OUTPUT.mp4

We also write the above commands to a Shell script, so you can directly run:

bash zsm_my_video.sh

From Extracted Frames

As a quick start, we also provide some example images in the test_example folder. You can test the model with the following commands:

cd $ZOOMING_ROOT/codes
python test.py

You can put your own test folders in the test_example too, or just change the input path, the number of frames, etc. in test.py.
Your custom test results will be saved to a folder here: $ZOOMING_ROOT/results/your_data_name/.

Evaluate on Standard Test Sets

The test.py script also provides modes for evaluation on the following test sets: Vid4, SPMC, etc. We evaluate PSNR and SSIM on the Y-channels in YCrCb color space. The commands are the same with the ones above. All you need to do is the change the data_mode and corresponding path of the standard test set.

Colab Notebook

PyTorch Colab notebook (provided by @HanClinto): HighResSlowMo.ipynb

Citations

If you find the code helpful in your resarch or work, please cite the following papers.

@misc{xiang2021zooming,
  title={Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution},
  author={Xiang, Xiaoyu and Tian, Yapeng and Zhang, Yulun and Fu, Yun and Allebach, Jan P and Xu, Chenliang},
  archivePrefix={arXiv},
  eprint={2104.07473},
  year={2021},
  primaryClass={cs.CV}
}

@InProceedings{xiang2020zooming,
  author = {Xiang, Xiaoyu and Tian, Yapeng and Zhang, Yulun and Fu, Yun and Allebach, Jan P. and Xu, Chenliang},
  title = {Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={3370--3379},
  month = {June},
  year = {2020}
}

@InProceedings{tian2018tdan,
  author={Yapeng Tian, Yulun Zhang, Yun Fu, and Chenliang Xu},
  title={TDAN: Temporally Deformable Alignment Network for Video Super-Resolution},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={3360--3369},
  month = {June},
  year = {2020}
}

@InProceedings{wang2019edvr,
  author    = {Wang, Xintao and Chan, Kelvin C.K. and Yu, Ke and Dong, Chao and Loy, Chen Change},
  title     = {EDVR: Video restoration with enhanced deformable convolutional networks},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  month     = {June},
  year      = {2019},
}

Contact

Xiaoyu Xiang and Yapeng Tian.

You can also leave your questions as issues in the repository. We will be glad to answer them.

License

This project is released under the GNU General Public License v3.0.

Acknowledgments

Our code is inspired by TDAN-VSR and EDVR.

zooming-slow-mo-cvpr-2020's People

Contributors

Stargazers

Watchers

Forkers

ml-lab avatarworld hityzy1122 shuyongzhang ywu40 bruinxiong yangsenwxy mengxiangyudlut yanmenglu qlawliet awoziji jmspiewak yangzhen0000 felixzhang7 zhwzhong atsea0107 tsingzao bonbert81 tamwaiban dream-winter mrku69 oj9040 ugly1 lsheiba s-p-z chisyliu peterzs haibao637 seth-cohen coolboy5298 scape1989 jensda wujinlonglovezhangmiao1314 infinitehj odiofan yangtong1989 chuckwoody zhouhuanxiang leng123ku styler00dollar cv-ip laeyoung xrosliang lianyi saket-m vlee-harmonicinc niu2ben delldu matatratata shaohuilin psi1104 h4k1m13or evgenytsizin ngchc sanzangdashizsx fanjunzhi mpriessner seominlee psnow c00renut cliff-bot 8secz-johndpope ruabbit lz118 aakgun orlgln ycqian yinbow bharadwajpro jiatianwu dendisuhubdy reedlove chuanli11 davidleeplaysmart ofsoundof mohamedgaballah1997 aromaticj nallein jxh-shu edmontdants claforte mirrorsysu yyqx-bit kritika1029 mtubpeng1 ishuangxin mmsx xl1990 liuguoyou ayankumarbhunia jaedukseo wuyangzz alanways ljingv xiandachen adinbied rkelln soumikbaithalu larrywonghl raja-kumar

zooming-slow-mo-cvpr-2020's Issues

about the training phase

When I was training the model, I got the following results.

20-12-14 05:05:07.043 - INFO: Model [VideoSRBaseModel] is created.
20-12-14 05:05:07.043 - INFO: Start training from epoch: 0, iter: 0
20-12-14 05:05:16.606 - INFO: Saving the final model.
20-12-14 05:05:16.793 - INFO: End of training.

It seems that the training ended without starting. Does anyone know why? Is it related to the data set? In order to test the code, I reduced the data set a lot.

and the output:
20-12-14 05:05:01.562 - INFO: Random seed: 0
20-12-14 05:05:01.570 - INFO: Temporal augmentation interval list: [1], with random reverse is True.
20-12-14 05:05:01.570 - INFO: Using cache keys: Vimeo7_train_keys.pkl
20-12-14 05:05:01.570 - INFO: Using cache keys - Vimeo7_train_keys.pkl.
20-12-14 05:05:01.571 - INFO: Dataset [Vimeo7Dataset - Vimeo7] is created.
20-12-14 05:05:01.571 - INFO: Number of train images: 3, iters: 1
20-12-14 05:05:01.572 - INFO: Total epochs needed: 600000 for iters 600,000
20-12-14 05:05:06.963 - INFO: Network G structure: DataParallel - LunaTokis, with parameters: 11,102,771

Is there anything wrong with my current output?

I would be grateful if anyone knows a solution.

video_to_zsm.py issue

Running the file using pretrain(weights) does nothing after the following output.

The required ffmpeg has been installed. In any case, I think the installation method may be wrong. What installation method did the author use? Or can you share the docker image?

Question about comparison to two-stage methods

Hi~ Thanks for your wonderful work! I have a question about comparison to two-stage methods.
In your paper, the two-stage method is to first increase the frame rate and then super-resolution. Why not SR first and then VFI? Have you compared the difference between the two process?

The lack of module

I can't find these in the .zip
from data import create_dataloader, create_dataset
from models import create_model

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`

Sorry for bothering. I have run the make.sh and it finished successfully. But when i run the test.py , something went wrong. Here is the output imformation:

Traceback (most recent call last):
File "test.py", line 255, in
example_dconv()
File "test.py", line 179, in example_dconv
error.backward()
File "/data/hzh/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 195, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/data/hzh/anaconda3/lib/python3.7/site-packages/torch/autograd/init.py", line 99, in backward
allow_unreachable=True) # allow_unreachable flag
File "/data/hzh/anaconda3/lib/python3.7/site-packages/torch/autograd/function.py", line 77, in apply
return self._forward_cls.backward(self, *args)
File "/data/hzh/anaconda3/lib/python3.7/site-packages/torch/autograd/function.py", line 189, in wrapper
outputs = fn(ctx, args)
File "/data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/dcn_v2.py", line 44, in backward
ctx.deformable_groups)
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle) (createCublasHandle at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/ATen/cuda/CublasHandlePool.cpp:8)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7fac36e16627 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x4173335 (0x7fac3ccb4335 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: at::cuda::getCurrentCUDABlasHandle() + 0x458 (0x7fac3ccb4c18 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: + 0x416b092 (0x7fac3ccac092 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #4: THCudaBlas_Sgemm + 0x7e (0x7fac3d0b9a3e in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #5: dcn_v2_cuda_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0xe94 (0x7fac1b20e141 in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #6: dcn_v2_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, int, int, int, int, int, int, int, int, int) + 0x9b (0x7fac1b1f987b in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #7: + 0x3f1f1 (0x7fac1b2071f1 in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #8: + 0x3f82e (0x7fac1b20782e in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #9: + 0x3af0e (0x7fac1b202f0e in /data/hzh/ZoomingSloMo/codes/models/modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so)

frame #22: torch::autograd::PyNode::apply(std::vector<at::Tensor, std::allocatorat::Tensor >&&) + 0x178 (0x7fac68f94468 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #23: + 0x3bd3fb6 (0x7fac3c714fb6 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #24: torch::autograd::Engine::evaluate_function(std::shared_ptrtorch::autograd::GraphTask&, torch::autograd::Node, torch::autograd::InputBuffer&) + 0x1373 (0x7fac3c711413 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #25: torch::autograd::Engine::thread_main(std::shared_ptrtorch::autograd::GraphTask const&, bool) + 0x4b2 (0x7fac3c712042 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #26: torch::autograd::Engine::thread_init(int) + 0x39 (0x7fac3c70b939 in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #27: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fac68f8afaa in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #28: + 0xc819d (0x7fac6887719d in /data/hzh/anaconda3/lib/python3.7/site-packages/torch/../../../libstdc++.so.6)
frame #29: + 0x76ba (0x7fac786076ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #30: clone + 0x6d (0x7fac7833d4dd in /lib/x86_64-linux-gnu/libc.so.6)

Segmentation fault (core dumped)

I don't know how to fix it. Could you please help and give me some ideas? Thank you!

Running in Google Colab

Initially I had some trouble running this code in Google Colab, but eventually got it to work and wanted to share my notebook in case others would find this useful:

https://gist.github.com/HanClinto/49219942f76d5f20990b6d048dbacbaf

Eventually it might be nice to clean this up a bit more and put it into the repo as an included "quick start" for users.

Note that you will need to run on a Colab VM with GPU support, and it helps if you tick the "High RAM" option as well. Note that I ran out of memory when attempting to process a video of resolution 480x848, but when I switched to a video that was only 320x564 then it was able to complete after I set the command-line option of --N_out 3

Thank you so very much for this code -- it's very well done!

Testing is killed

I has used the make.sh and test the test.py. Then I download the pretrained model, and use the model to do the Test by：python video_to_zsm.py --video ../input.mp4 --model ../experiments/pretrained_models/xiang2020zooming.pth --output ../huanhuan.mp4.
But there is one issue about the kill:
ffmpeg -i ../zoom_Huanhuan-Z_1436x806.mp4 -vsync 0 .delme/%06d.png
ffmpeg version 4.0 Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 7.2.0 (crosstool-NG fa8859cb)
configuration: --prefix=/root/miniconda3/envs/myconda --cc=/opt/conda/conda-bld/ffmpeg_1531088893642/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --enable-shared --enable-static --enable-zlib --enable-pic --enable-gpl --enable-version3 --disable-nonfree --enable-hardcoded-tables --enable-avresample --enable-libfreetype --disable-openssl --disable-gnutls --enable-libvpx --enable-pthreads --enable-libopus --enable-postproc --disable-libx264
libavutil 56. 14.100 / 56. 14.100
libavcodec 58. 18.100 / 58. 18.100
libavformat 58. 12.100 / 58. 12.100
libavdevice 58. 3.100 / 58. 3.100
libavfilter 7. 16.100 / 7. 16.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 1.100 / 5. 1.100
libswresample 3. 1.100 / 3. 1.100
libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '../zoom_Huanhuan-Z_1436x806.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: isommp42
creation_time : 2020-11-27T10:10:34.000000Z
Duration: 00:00:09.00, start: 0.000000, bitrate: 419 kb/s
Chapter #0:0: start 0.000000, end 9.000000
Metadata:
title : Sharing Started
Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 32000 Hz, mono, fltp, 126 kb/s (default)
Metadata:
creation_time : 2020-11-27T10:10:34.000000Z
handler_name : AAC audio
Stream #0:1(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1436x806, 290 kb/s, 25 fps, 25 tbr, 30k tbn, 60k tbc (default)
Metadata:
creation_time : 2020-11-27T10:10:34.000000Z
handler_name : H.264/AVC video
encoder : AVC Coding
Stream #0:2(und): Data: bin_data (text / 0x74786574), 0 kb/s
Metadata:
creation_time : 2020-11-27T10:10:34.000000Z
handler_name : Text
Stream mapping:
Stream #0:1 -> #0:0 (h264 (native) -> png (native))
Press [q] to stop, [?] for help
Output #0, image2, to '.delme/%06d.png':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: isommp42
encoder : Lavf58.12.100
Chapter #0:0: start 0.000000, end 9.000000
Metadata:
title : Sharing Started
Stream #0:0(und): Video: png, rgb24, 1436x806, q=2-31, 200 kb/s, 25 fps, 25 tbn, 25 tbc (default)
Metadata:
creation_time : 2020-11-27T10:10:34.000000Z
handler_name : H.264/AVC video
encoder : Lavc58.18.100 png
frame= 225 fps=112 q=-0.0 Lsize=N/A time=00:00:09.00 bitrate=N/A speed=4.48x
video:64690kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
Killed

pip requirements missing

Hi! I was wondering if it would be possible to provide pip requirements.txt file for the anaconda environment since all of the packages need to be compatible also with the DCNv2 package. I ran into some environment problems

PSNR and SSIM during evaulation

Hi,

When running your test script on the 4 clips of Vid4 test set, I am getting an average PSNR of 24.820187 dB, which is lower than reported in your paper. However, the PSNR-Y is slightly higher than reported PSNR in table 1 at 26.352743 dB. When I used the SSIM function in the included util.py file, I got an average of SSIM: 0.776325.

I just want to confirm if my evaluation method is correct, and, also if you reported PSNR or PSNR-Y in the paper.
(data_mode = 'Vid4'),
scale = 4
N_ot = 7
flip_test = True
padding = 'replicate'
I also tried N_ot = 3 with slightly worse results. Please kindly correct me if any of the settings is wrong.

Edit: I see from previous issues that your PSNR is from Y-channel, in which case N_ot = 3 matches the result in paper

Number of deformable groups used in Sakuya_arch.py

In Sakuya_arch.py file, to execute deformable convolutions, authors are using numbers of groups as 8. So is it number of groups that we use in normal convolutions? If it is so then is there any reason to keep it as 8? Is it the same concept that was used in AlexNet paper to increase the computational efficiency or there is some other usage too?

INF loss

Hi, guys!

I tried to rerpoduce your results and seems that I'm getting gradient explosion.
After about 100-200 iterations loss gets to INF.

I used config provided by you, is it correct?
How can I fix this issue?

error,help

python test.py
报错误
import utils.util as util
ModuleNotFoundError: No module named 'utils.util'

这个utils.utils依赖什么库？
谢谢

PSNR in the paper

Are the PSNR in the paper computed in the Y channel or RGB channels? Thanks!

How much loss should one expect at 200,000 iterations?

Hi team,
I am training the model as per given instructions. At the starting point loss in the range of 10 power 5. Even after training the model for more than 200,000 iterations, the loss has decreased 10 times only (its in the range of 10 power 4). I am uploading 2 log files, kindly look at them. While training, deformable convolutions' offset seem to go above certain limit, but after some iterations it became normal (You can observe this in 2nd log at around166,000th iteration
1st.log
2nd.log

Warning：Offset mean is XXX，larger than 100

Thank you for open source your code, when train the model on Vimeo 90k，this warning occurred.
It first appeared after the first decay of the learning rate, and then the offset mean will become larger and larger. Have you ever encountered this situation? Is this normal in your training?

Importerror: modules/DCNv2/_ext.cpython-37m-x86_64-linux-gnu.so:undefined symbol_ZN3c105ErrorCLENS_14SourceLocationERKPSs

I use python3.7 , pytorch 1.0.1, cuda10.0
The make.sh can be correctly finished. Hower, when I run test.py, I meet the above error.

ValueError: Unknown CUDA arch (8.0) or GPU not supported

File "setup.py", line 70, in <module>
   cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
   return distutils.core.setup(**attrs)
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/distutils/core.py", line 148, in setup
   dist.run_commands()
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/distutils/dist.py", line 966, in run_commands
   self.run_command(cmd)
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/distutils/dist.py", line 985, in run_command
   cmd_obj.run()
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/distutils/command/build.py", line 135, in run
   self.run_command(cmd_name)
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/distutils/cmd.py", line 313, in run_command
   self.distribution.run_command(command)
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/distutils/dist.py", line 985, in run_command
   cmd_obj.run()
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
   _build_ext.run(self)
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/distutils/command/build_ext.py", line 340, in run
   self.build_extensions()
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 372, in build_extensions
   build_ext.build_extensions(self)
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/distutils/command/build_ext.py", line 449, in build_extensions
   self._build_extensions_serial()
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/distutils/command/build_ext.py", line 474, in _build_extensions_serial
   self.build_extension(ext)
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
   _build_ext.build_extension(self, ext)
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/distutils/command/build_ext.py", line 534, in build_extension
   depends=ext.depends)
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/distutils/ccompiler.py", line 574, in compile
   self._compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 288, in unix_wrap_compile
   "'-fPIC'"] + cflags + _get_cuda_arch_flags(cflags)
 File "/home/dgxadmin/miniconda3/envs/ZoomSloMo/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1027, in _get_cuda_arch_flags
   raise ValueError("Unknown CUDA arch ({}) or GPU not supported".format(arch))
ValueError: Unknown CUDA arch (8.0) or GPU not supported

Error comes from missing arch for compute=80 in pytorch=1.4.0 version (got it from HighResSlowMo.ipynb )
see
/lib/python3.7/site-packages/torch/utils/cpp_extension.py -> def _get_cuda_arch_flags(cflags=None):

--> just give -gencode=arch=compute_XX,code=sm_XX with compile args in setup.py
--> in my case: "-gencode=arch=compute_75,code=sm_75",
-->in /codes/models/modules/DCNv2/setup.py

def get_extensions():
    this_dir = os.path.dirname(os.path.abspath(__file__))
    extensions_dir = os.path.join(this_dir, "src")

    main_file = glob.glob(os.path.join(extensions_dir, "*.cpp"))
    source_cpu = glob.glob(os.path.join(extensions_dir, "cpu", "*.cpp"))
    source_cuda = glob.glob(os.path.join(extensions_dir, "cuda", "*.cu"))

    sources = main_file + source_cpu
    extension = CppExtension
    extra_compile_args = {"cxx": []}
    define_macros = []

    if torch.cuda.is_available() and CUDA_HOME is not None:
        extension = CUDAExtension
        sources += source_cuda
        define_macros += [("WITH_CUDA", None)]
        extra_compile_args["nvcc"] = [
            "-DCUDA_HAS_FP16=1",
            "-gencode=arch=compute_75,code=sm_75",
            "-D__CUDA_NO_HALF_OPERATORS__",
            "-D__CUDA_NO_HALF_CONVERSIONS__",
            "-D__CUDA_NO_HALF2_OPERATORS__",
        ]
    else:
        raise NotImplementedError('Cuda is not availabel')

Its an torch-version-issue

Windows support

Hi. Do you can exe build for Windows OS?

RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

test using zsm_my_video.sh get error ！

ImportError: cannot import name 'DCN_sep'

Hello,

Thanks very much for sharing the code and your wonderful job.
When run python test.py under Zooming-Slow-Mo-CVPR-2020/codes, got the following error. And I checked the the code under DCNv2 module, there is no DCN_sep class or function. Did you forget to check in some code?

Traceback (most recent call last):
File "/root/Zooming-Slow-Mo-CVPR-2020/codes/models/modules/Sakuya_arch.py", line 9, in
from models.modules.DCNv2.dcn_v2 import DCN_sep
ImportError: cannot import name 'DCN_sep'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "test.py", line 16, in
import models.modules.Sakuya_arch as Sakuya_arch
File "/root/Zooming-Slow-Mo-CVPR-2020/codes/models/modules/Sakuya_arch.py", line 11, in
raise ImportError('Failed to import DCNv2 module.')
ImportError: Failed to import DCNv2 module.

Questions about training details

Pretty impressed by your work. I have some questions about the training phase.
When I trained the model on 2 Nvidia 1080Ti GPUs with batch size 8 (followed by training details in the paper), it seems that it requires more time to achieve your results. I am wondering which way (dataparallel or torch.distributed.launch) do you recommend to use in the experiment?

when I run bash make.sh, here is a question

/home/fxy/.conda/envs/torchPy/lib/python3.6/site-packages/torch/include/ATen/cuda/NumericLimits.cuh(86): warning: calling a constexpr host function("from_bits") from a host device function("upper_bound") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/fxy/.conda/envs/torchPy/lib/python3.6/site-packages/torch/include/ATen/cuda/NumericLimits.cuh(84): warning: calling a constexpr host function("from_bits") from a host device function("max") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/fxy/.conda/envs/torchPy/lib/python3.6/site-packages/torch/include/ATen/cuda/NumericLimits.cuh(85): warning: calling a constexpr host function("from_bits") from a host device function("lower_bound") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/home/fxy/.conda/envs/torchPy/lib/python3.6/site-packages/torch/include/c10/util/ArrayRef.h:277:55: warning: ‘deprecated’ attribute directive ignored [-Wattributes]
using IntList C10_DEPRECATED_USING = ArrayRef<int64_t>;
^
/home/fxy/Desktop/mountTry/fxy/Zooming-Slow-Mo-CVPR-2020-master/codes/models/modules/DCNv2/src/cuda/dcn_v2_psroi_pooling_cuda.cu: In lambda function:
/home/fxy/Desktop/mountTry/fxy/Zooming-Slow-Mo-CVPR-2020-master/codes/models/modules/DCNv2/src/cuda/dcn_v2_psroi_pooling_cuda.cu:317:120: warning: ‘c10::ScalarType detail::scalar_type(const at::DeprecatedTypeProperties&)’ is deprecated (declared at /home/fxy/.conda/envs/torchPy/lib/python3.6/site-packages/torch/include/ATen/Dispatch.h:47) [-Wdeprecated-declarations]
AT_DISPATCH_FLOATING_TYPES(input.type(), "dcn_v2_psroi_pooling_cuda_forward", [&] {
^
/home/fxy/Desktop/mountTry/fxy/Zooming-Slow-Mo-CVPR-2020-master/codes/models/modules/DCNv2/src/cuda/dcn_v2_psroi_pooling_cuda.cu: In lambda function:
/home/fxy/Desktop/mountTry/fxy/Zooming-Slow-Mo-CVPR-2020-master/codes/models/modules/DCNv2/src/cuda/dcn_v2_psroi_pooling_cuda.cu:391:126: warning: ‘c10::ScalarType detail::scalar_type(const at::DeprecatedTypeProperties&)’ is deprecated (declared at /home/fxy/.conda/envs/torchPy/lib/python3.6/site-packages/torch/include/ATen/Dispatch.h:47) [-Wdeprecated-declarations]
AT_DISPATCH_FLOATING_TYPES(out_grad.type(), "dcn_v2_psroi_pooling_cuda_backward", [&] {
^
creating build/lib.linux-x86_64-3.6
g++ -pthread -shared -B /home/fxy/.conda/envs/torchPy/compiler_compat -L/home/fxy/.conda/envs/torchPy/lib -Wl,-rpath=/home/fxy/.conda/envs/torchPy/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/home/fxy/Desktop/mountTry/fxy/Zooming-Slow-Mo-CVPR-2020-master/codes/models/modules/DCNv2/src/vision.o build/temp.linux-x86_64-3.6/home/fxy/Desktop/mountTry/fxy/Zooming-Slow-Mo-CVPR-2020-master/codes/models/modules/DCNv2/src/cpu/dcn_v2_cpu.o build/temp.linux-x86_64-3.6/home/fxy/Desktop/mountTry/fxy/Zooming-Slow-Mo-CVPR-2020-master/codes/models/modules/DCNv2/src/cuda/dcn_v2_cuda.o build/temp.linux-x86_64-3.6/home/fxy/Desktop/mountTry/fxy/Zooming-Slow-Mo-CVPR-2020-master/codes/models/modules/DCNv2/src/cuda/dcn_v2_im2col_cuda.o build/temp.linux-x86_64-3.6/home/fxy/Desktop/mountTry/fxy/Zooming-Slow-Mo-CVPR-2020-master/codes/models/modules/DCNv2/src/cuda/dcn_v2_psroi_pooling_cuda.o -L/usr/local/cuda/lib64 -lcudart -o build/lib.linux-x86_64-3.6/_ext.cpython-36m-x86_64-linux-gnu.so
running develop
running egg_info
creating DCNv2.egg-info
writing DCNv2.egg-info/PKG-INFO
writing dependency_links to DCNv2.egg-info/dependency_links.txt
writing top-level names to DCNv2.egg-info/top_level.txt
writing manifest file 'DCNv2.egg-info/SOURCES.txt'
reading manifest file 'DCNv2.egg-info/SOURCES.txt'
writing manifest file 'DCNv2.egg-info/SOURCES.txt'
running build_ext
copying build/lib.linux-x86_64-3.6/_ext.cpython-36m-x86_64-linux-gnu.so ->
error: [Errno 1] Operation not permitted

I don't know what's the problem and how to fix it??
my environment is cuda9.0, gcc4.9.2, torch1.1
thanks a lot !

Out Of Memory

Running zsm_my_video.sh. No matter what I do, I keep getting out of memory errors. I'm using an RTX 2060 with 16GB of system RAM and 6GB of dedicated GPU RAM. I'm running against a 1 second clip from a 480p video.

RuntimeError: CUDA out of memory. Tried to allocate 676.00 MiB (GPU 0; 6.00 GiB total capacity; 3.49 GiB already allocated; 662.13 MiB free; 3.66 GiB reserved in total by PyTorch) (malloc at ..\c10\cuda\CUDACachingAllocator.cpp:289)

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'

I'm trying to install, but receive the following error:

# bash make.sh
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "setup.py", line 69, in <module>
    ext_modules=get_extensions(),
  File "setup.py", line 42, in get_extensions
    raise NotImplementedError('Cuda is not availabel')
NotImplementedError: Cuda is not availabel

I added that in make.sh:

export CUDA_HOME=/usr/local/cuda-11.0
export CUDNN_INCLUDE_DIR=/usr/local/cuda-11.0/include
export CUDNN_LIB_DIR=/usr/local/cuda-11.0/lib64

But that just changed the error to:
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-11.0'

This is torch.cuda.is_available() that returns false.

Where can I download this paper？

Thanks ~~~~

why DAIN before EDVR

Hi Musosame, congrats on your achivement on this new task!

I have a question with regards to the order of the VFI and VSR method.
In your paper, DAIN with EDVR make a strong competitor for Zooming SlowMo. But why video frame interpolation before video super resolution?
Intuitively, enhanced spatial resoltion will improve optical flow estimation, and then frame interpolation too. But not opposite. have you conduct experiment considering this factor? the order of VFI and VSR?

Generating LR images

To generate LR images i need to run python $ZOOMING_ROOT/codes/data_scripts/generate_mod_LR_bic.py . Do I need to run this for every folder as it is looking for png files?

Generated video is two times longer

Is this because it inserted extra frames? Is there parameter to keep the length the same?

URL in paper is written in correct

The URL for this repo is written incorrectly in the paper
https://github.com/Mukosame/Zooming-SlowMo-CVPR-2020 ,
The correct one:
https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020 it is missing a hyphen.

Paper taken from CVPR website.
https://openaccess.thecvf.com/content_CVPR_2020/papers/Xiang_Zooming_Slow-Mo_Fast_and_Accurate_One-Stage_Space-Time_Video_Super-Resolution_CVPR_2020_paper.pdf

Training stuck at epoch 0,iter 0

Hi,

Problem:

I follow the training instructions,but the training process stuck at epoch 0, iter 0. It can not continue to train.

Specific implementations details:

1、compile the DCNv2.
It raises a problem of RuntimeError: Backward is not reentrant. According to CharlesShang 's GitHub, it may not be a serious problem and no one has give a solution. So, i do not fix this problem.

2、create_lmdb of train_GT and train_LR7

Split vimeo_setuplet into train and test set
Downsample: train set were downsamled to get train_LRx4 dataset.
create train_GT and train_LR7 lmdb files:

When creating lmdb files , it raise a issue of "FileNotFoundError: [Errno 2] No such file or directory: '/home/dataset/vimeo7_train_GT.lmdb/meta_info.pkl' "

So , I change the line 120 "Vimeo7_train_keys.pkl " with " meta_info.pkl".

Generated train_GT and train_LR7 files are as follows:

/home/dataset/vimeo7_train_GT.lmdb/
data.mdb 153790088 kb
lock.mdb 8kb
meta_info.pkl 1261kb

/home/dataset/vimeo7_train_LR7.lmdb/
data.mdb 10868344 kb
lock.mdb 8kb
meta_info.pkl 1261kb

Does anyone have ideas ?

Difference in the PSNR values from pre-trained model and paper on Vid4 data

Thanks @Mukosame for wonderful work.

I tried running Zoom-Slo-Mo on Vid4 data.
And bellow are the PSNR results obtained using pre-trained model given in the repo:

20-07-29 12:24:23.998 - INFO: ################ Tidy Outputs ################
20-07-29 12:24:23.998 - INFO: Folder calendar - Average PSNR: 15.863634 dB PSNR-Y: 17.315718 dB.
20-07-29 12:24:23.998 - INFO: Folder city - Average PSNR: 20.775272 dB PSNR-Y: 22.207506 dB.
20-07-29 12:24:23.998 - INFO: Folder foliage - Average PSNR: 19.139918 dB PSNR-Y: 20.540553 dB.
20-07-29 12:24:23.998 - INFO: Folder walk - Average PSNR: 20.339251 dB PSNR-Y: 21.705512 dB.
20-07-29 12:24:23.998 - INFO: ################ Final Results ################
20-07-29 12:24:23.998 - INFO: Data: Vid4 - /home/ubuntu/Basavaraj/Zooming-Slow-Mo-CVPR-2020/test_example/Vid4/LR/*
20-07-29 12:24:23.999 - INFO: Padding mode: replicate
20-07-29 12:24:23.999 - INFO: Model path: ../experiments/pretrained_models/xiang2020zooming.pth
20-07-29 12:24:23.999 - INFO: Save images: False
20-07-29 12:24:23.999 - INFO: Flip Test: True
20-07-29 12:24:23.999 - INFO: Total Average PSNR: 19.029519 dB PSNR-Y: 20.442322 dB for 4 clips.

Is it possible to train a 2x model?

Hi @Mukosame

Thank you for the great code! I wonder whether it is possible to train a 2x model. If it is, how to revise the model file? Thank you!

Best,
Yongcheng

Maintaining FPS while using Super Resolution

Thank you so much for the amazing work and open sourcing it, and actively sorting out queries for the community.
There are two queries,

I believe in order to run the model efficiently, rather than loading the images all at once into memory -- one could batch the paths to the images and read them on the fly -- I'll attempt to work on this, might be useful in case of upscaling higher resolution videos.
This might be a stupid question, but I already have a 50 FPS video, and I would like to upscale it using Zooming Slow Mo but maintain the number of frames and not double it, how do I skip the interpolation aspect of the algorithm?

Would appreciate any insights.
Thank you,
Sree Harsha

Questions about training phase

Thanks for your excellent work~, and I have some questions about the training phase, hoping for your response.

It seems that there is something wrong with the file Vimeo7_dataset.py, the data structure of self.paths_GT is dict, so I think you need to add one line to make it works.
I am very curious about your runtime during training phase, when I trained the model from the stratch on 4 Nvidia 2080TI GPUs, it spent nearly 2 minutes for every 100 iterations, which is longer than EDVR.

where is the place in source to change the output frame numbers ?

Hi;

I am wondering whether there is any output frame number control method.

Currently, it takes the output number of frames and divide it to define the input frames as well.

So, it works with the number of frames and interleave those to generate the output frames, but it generate only one in the middle of input frames.

Is their any way to generate more frames in the middle of input frames ?

For example, if the input frames is "3", the output frames is now "5" because it generates extra frames in the middle, then

What I want to is a control parameter to generate "5","7","9","11" or "13" with the input "3".

Then, it can actually generate 48Hz,72Hz, 96Hz, 120Hz, 144Hz from 24Hz instead of just doubling.

run test error

Traceback (most recent call last):
File "test.py", line 265, in
check_gradient_dconv()
File "test.py", line 97, in check_gradient_dconv
eps=1e-3, atol=1e-4, rtol=1e-2))
File "/home/yangjeff/anaconda3/envs/py37/lib/python3.7/site-packages/torch/autograd/gradcheck.py", line 289, in gradcheck
return fail_test('Backward is not reentrant, i.e., running backward with same '
File "/home/yangjeff/anaconda3/envs/py37/lib/python3.7/site-packages/torch/autograd/gradcheck.py", line 224, in fail_test
raise RuntimeError(msg)
RuntimeError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient

Is this issue on Deformable Convolution v2, how could I implement such function.

error: identifier "THCState_getCurrentStream" is undefined

Hi!
First of all thank you for sharing the code and you have done amazing work! I am following the instructions given by you. I am pasting the snippet of the error which is related to dcn_v2_cuda.cu file. I am trying to run it on my personal machine which does not have GPU. On my machine I can run make.sh from your repo and I can run it from CharlShangs repo too. Adding to that I can run its testcpu.py file on my personal machine. But as I will need GPU for further process so I am using my lab's GPU through vpn and ssh(due to current situation I am not in my campus). In that machine I am not able to run bash script. Following is the snippet of the error. Just for checking purpose I tried to run both your as well as CharlShangs bash script and error was the same. So I suppose there must be some issue with the remote server. Kindly let me know if you know about this issue.

...DCNv2/src/cuda/dcn_v2_cuda.cu(107): error: identifier "THCState_getCurrentStream" is undefined

...DCNv2/src/cuda/dcn_v2_cuda.cu(279): error: identifier "THCState_getCurrentStream" is undefined

My personal system and remote server has torch 1.4.0 and torchvision 0.5.0. Remote server has same versions of torch and torchvision and cuda 10.2.

Is there any way to solve cuda out of memory problem when input image is large?

Is there any way to solve cuda out of memory problem when input image is large?
My input is about 1240x650 large. How to get around with this problem?

Change number of interpolations

Is it possible to change the number of interpolated frames between the original frames?

Amazing work, btw. :)

Questions about training details

Thanks for your excellent work. I have some questions about the training phase.

When I trained the model on 2 Nvidia P100 GPUs with batch size 32, it spent nearly 6.5 minutes for every 100 iterations. It cost nearly the same time when I trained on either .png or lmdb format. Could you give any advices for accelerating the training speed?
Did you train the model with batch size 16 for 600000 iters as the .yml file to achieve the provided pretrained model, or with batch size 24 and fewer iters? When the batch size changed, did you modify the initial learning rate or other related setting?
Segmentation fault occured when I ran test.py in codes/models/modules/DCNv2, but I ignored the error and succeeded in running the training and testing phase with a close PSNR result. Would this be the reason for the slow training speed or lead to some errors?
Looking forward to your reply. Thank you.

Vimeo-septuplet directory structure

Hello, thank you for your wonderful work. However, as a newbie, I am very confused about how to generate LR images and lmdb files. Can you share your Vimeo-septuplet directory structure? This will help better understand the data processing flow.

error compiling DCN

Zooming-Slow-Mo-CVPR-2020/codes/models/modules/DCNv2/src/cuda/dcn_v2_cuda.cu(279): error: identifier "THCState_getCurrentStream" is undefined

Zooming-Slow-Mo-CVPR-2020/codes/models/modules/DCNv2/src/cuda/dcn_v2_cuda.cu(279): error: identifier "THCState_getCurrentStream" is undefined

2 errors detected in the compilation of "/tmp/tmpxft_00000988_00000000-6_dcn_v2_cuda.cpp1.ii".
error: command '/usr/bin/nvcc' failed with exit status 1

ubuntu 20.04, cuda 10.1, python 3.8

Questions about generating LR images

thanks for your excellent work. I met some problems in the stage of generating LR images.

I want to generate LR images via Python, so I run generate_mod_LR_bic.py, but I am told ''name 'imresize_np' is not defined''. I try to solve it, but don't find 'imresize_np' in util.py of 'data' folder.
Are the input and output path in generate_mod_LR_bic.py identical to these of sep_vimeo_list.py? I set these path in the two python files same, the HR, LR and Bic folder can be created, but no images are produced in these folder.

Could you help me to solve above issues?

Question about processing a whole video

Hi,

I have a question about what should you do with a whole video. Let's say we have a whole video, (L1, L2, ...L14) and your processing window is 7. What is your window stride now? What should you do now? Do you have overlapped problem? If so, how do you deal with it?

Many thanks

segmentation fault

hello, when i am running test.py ,i got an error: segmentation fault, could you help me? thank you!

When the scene changes suddenly, the effect of inserting frames will not work

RuntimeError: Jacobian mismatch for output 0 with respect to input 1,

When I run python test.py

torch.Size([2, 64, 128, 128])
torch.Size([20, 32, 7, 7])
torch.Size([20, 32, 7, 7])
torch.Size([20, 32, 7, 7])
0.971507, 1.943014
0.971507, 1.943014
Zero offset passed
/home/dkliang/miniconda3/envs/pytorch1.2/lib/python3.6/site-packages/torch/autograd/gradcheck.py:242: UserWarning: At least one of the inputs that requires gradient is not of double precision floating point. This check will likely fail if all the inputs are not of double precision floating point.
'At least one of the inputs that requires gradient '
check_gradient_dpooling: True
Traceback (most recent call last):
File "test.py", line 265, in
check_gradient_dconv()
File "test.py", line 97, in check_gradient_dconv
eps=1e-3, atol=1e-4, rtol=1e-2))
File "/home/dkliang/miniconda3/envs/pytorch1.2/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 289, in gradcheck
'numerical:%s\nanalytical:%s\n' % (i, j, n, a))
File "/home/dkliang/miniconda3/envs/pytorch1.2/lib/python3.6/site-packages/torch/autograd/gradcheck.py", line 227, in fail_test
raise RuntimeError(msg)
RuntimeError: Jacobian mismatch for output 0 with respect to input 1,
numerical:tensor([[-0.0003, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0013, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]])
analytical:tensor([[-0.0003, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0013, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
...,
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[ 0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]])

Continue training of existing model

Hello Mukosame,

Congratulations to this amazing paper. I just tried it and it is really good!
Is there a way where I can continue training on an existing pretrained model? I couldn't find this functionality in your code (but maybe I missed it).

Otherwise, is there a way to easily adapt the code that this is also possible?

Possible to run on Windows?

Running bash make.sh in DCNv2 under Windows fails because WSL can't access CUDA.

Is it possible to use a precompiled version under Windows, or is there another approach?

About the warning information of the DCNv2 module during training.

Hi, thank you for creating such an innovative and wonderful space-time video super-resolution method.

When I follow the training settings in train_zsm.yml, the terminal sometimes shows the information: "WARNING: Offset mean is XXX, larger than 100". Then I looked for a solution and found that resuming from the nearest checkpoint might help. However, when I resumed from the nearest checkpoint, the warning information still happens again.

At the same time, I found that although the warning message continued to be displayed, the performance(PSNR) continued to increase.

So I would like to know whether this happened during your training. If it happened, how did you deal with it to achieve the performance in the paper?

Here is my training config file:

name: zsm_official
use_tb_logger: false #true
model: VideoSR_base
distortion: sr
scale: 4
gpu_ids: [2, 3]

datasets:
  train:
    name: Vimeo7
    mode: Vimeo7
    interval_list: [1]
    random_reverse: true #false
    border_mode: false
    dataroot_GT: /home/lz/xg/vimeo7_train_GT.lmdb
    dataroot_LQ: /home/lz/xg/vimeo7_train_LR7.lmdb
    cache_keys: Vimeo7_train_keys.pkl 

    N_frames: 7
    use_shuffle: true
    n_workers: 12 # per GPU
    batch_size: 24
    GT_size: 128 
    LQ_size: 32
    use_flip: true
    use_rot: true
    color: RGB

network_G:
  which_model_G: LunaTokis
  nf: 64
  nframes: 7
  groups: 8
  front_RBs: 5
  mid_RBs: 0
  back_RBs: 40
  HR_in: false

path:
  pretrain_model_G: ~
  strict_load: true #true #
  resume_state:  ~

train:
  lr_G: !!float 4e-4
  lr_scheme: CosineAnnealingLR_Restart
  beta1: 0.9
  beta2: 0.99
  niter: 600000
  warmup_iter: -1 #4000  # -1: no warm up
  T_period: [150000, 150000, 150000, 150000]
  restarts: [150000, 300000, 450000]
  restart_weights: [1, 1, 1]
  eta_min: !!float 1e-7

  pixel_criterion: cb
  pixel_weight: 1.0
  val_freq: !!float 5e3

  manual_seed: 0

logger:
  print_freq: 1
  save_checkpoint_freq: !!float 5e3