a4bio / simvp

The official implementation of the CVPR'22 paper SimVP: Simpler Yet Better Video Prediction.

Python 99.43% Shell 0.57%

simvp's Introduction

SimVP: Simpler yet Better Video Prediction

In the example, the default number of epochs is 50. Please read our paper and train for 1000~2000 epochs to reproduce this work! I will not respond to such a lowly mistake.

The pre-trained models and benchmarks will be available in SimVPv2.

SimVPv2 is available at https://github.com/chengtan9907/SimVPv2. It performs better than SimVP (15.05 MSE on Moving MNIST) and is currently under review. If our work is helpful for your research, we hope you will give us a star and a citation. Thanks!

This repository contains the implementation code for the paper:

SimVP: Simpler yet Better Video Prediction
Zhangyang Gao, Cheng Tan, Lirong Wu, Stan Z. Li. In CVPR, 2022.

Introduction

From CNN, RNN, to ViT, we have witnessed remarkable advancements in video prediction, incorporating auxiliary inputs, elaborate neural architectures, and sophisticated training strategies. We admire these progresses but are confused about the necessity: is there a simple method that can perform comparably well? This paper proposes SimVP, a simple video prediction model that is completely built upon CNN and trained by MSE loss in an end-to-end fashion. Without introducing any additional tricks and complicated strategies, we can achieve state-of-the-art performance on five benchmark datasets. Through extended experiments, we demonstrate that SimVP has strong generalization and extensibility on real-world datasets. The significant reduction of training cost makes it easier to scale to complex scenarios. We believe SimVP can serve as a solid baseline to stimulate the further development of video prediction.

Dependencies

torch
scikit-image=0.16.2
numpy
argparse
tqdm

Overview

API/ contains dataloaders and metrics.
main.py is the executable python file with possible arguments.
model.py contains the SimVP model.
exp.py is the core file for training, validating, and testing pipelines.

Install

This project provides a conda environment file; users can reproduce the environment with the following commands:

  conda env create -f environment.yml
  conda activate SimVP

Moving MNIST dataset

  cd ./data/moving_mnist
  bash download_mmnist.sh

TaxiBJ dataset

We provide a Dropbox link to download the TaxiBJ dataset. Download the dataset and put it into ./data/taxibj.

KTH dataset

We provide a Dropbox link to download the KTH dataset.

Citation

If you are interested in our repository and our paper, please cite the following paper:

@InProceedings{Gao_2022_CVPR,
    author    = {Gao, Zhangyang and Tan, Cheng and Wu, Lirong and Li, Stan Z.},
    title     = {SimVP: Simpler Yet Better Video Prediction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {3170-3180}
}

Contact

If you have any questions, feel free to contact us through email ([email protected], [email protected]). Enjoy!

simvp's Issues

Predictive image blurring

I trained on the KITTI dataset with an input image size of 128 × 416. How can I fix the blurring in the generated images?
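One commonly cited reason for blur in models trained with an MSE loss, which may or may not be the whole story in this case, is that when several futures are plausible, the output that minimizes the expected MSE is their pixel-wise average. The sketch below illustrates this on a hypothetical four-pixel "frame" with two equally likely futures. The function names and data are invented for illustration and are not taken from the SimVP code.

```python
def mse(a, b):
    """Mean squared error between two equal-length pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def expected_mse(pred, futures):
    """Average MSE of one prediction against equally likely futures."""
    return sum(mse(pred, f) for f in futures) / len(futures)


# Hypothetical toy data: a bright pixel ends up either on the left or on the right.
future_left = [1.0, 0.0, 0.0, 0.0]
future_right = [0.0, 0.0, 0.0, 1.0]
futures = [future_left, future_right]

# Pixel-wise mean of the two futures: a "blurry" frame with two half-bright pixels.
blurry = [(l + r) / 2 for l, r in zip(future_left, future_right)]

loss_sharp = expected_mse(future_left, futures)  # commit to one sharp future
loss_blurry = expected_mse(blurry, futures)      # hedge with the average

print(loss_sharp, loss_blurry)
```

In this toy case the averaged frame gets the lower expected loss, so an MSE-trained model is rewarded for hedging. Larger frames with fast motion, as in driving scenes, tend to make this effect more visible.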

How to use n time steps to predict n' steps

Hi Gao, your model is great!
While using it I ran into a problem: the released source code seems to output exactly as many time steps as it takes as input. How can I use n time steps to predict n' time steps? I'd appreciate some advice.

flexible length prediction required

SimVP is set to produce a fixed-length prediction whose length is the same as the input sequence's.

Experimental setup

Hi author, could you share the specific experimental settings, such as batch size, lr, NS, NT, Hs, Ht, etc.? I can't seem to fully reproduce the results in the paper. I hope you can bear with me; I'd appreciate it.

KTH datasets problem

When I run dataloader_kth.py, I get the following error:

"Cannot open file. This file was likely created with Python 2 and an old hickle version."

Can you share a new hickle file (created with Python 3)?
Thanks for your interesting work.

How to set the forecasting sequence length T ′ in your code

In your master branch, the forecasting sequence length must be the same as the input sequence length T, but in your paper the two lengths can differ. Could you please add support for setting the forecasting sequence length independently?
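One generic workaround for a model that maps T input frames to T output frames is to truncate its output when fewer frames are needed and to feed its predictions back in when more are needed. The sketch below shows that idea only. `rollout`, `predict`, and `toy_model` are hypothetical names, `predict` stands in for a trained fixed-length model, and this is not presented as the authors' method.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def rollout(predict: Callable[[Sequence[Frame]], List[Frame]],
            inputs: Sequence[Frame],
            n_out: int) -> List[Frame]:
    """Predict n_out future frames with a model that maps T frames to T frames."""
    if n_out < 0:
        raise ValueError("n_out must be non-negative")
    window = list(inputs)
    if not window:
        raise ValueError("inputs must contain at least one frame")
    outputs: List[Frame] = []
    while len(outputs) < n_out:
        preds = list(predict(window))
        if len(preds) != len(window):
            raise ValueError("model must return as many frames as it receives")
        outputs.extend(preds)
        window = preds  # feed the predictions back in as the next input
    return outputs[:n_out]  # truncate when n_out is not a multiple of T


def toy_model(frames):
    """Hypothetical stand-in model: frames are integers, the future continues the count."""
    return [f + len(frames) for f in frames]


longer = rollout(toy_model, [0, 1, 2, 3], n_out=6)   # needs two model calls
shorter = rollout(toy_model, [0, 1, 2, 3], n_out=2)  # one call, then truncate
print(longer, shorter)
```

With a real model, prediction errors are fed back in at each extra pass, so quality usually degrades as the rollout gets longer.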

Parallelization

Hello,

Is there any way to parallelize this code across multiple GPUs?

Thanks,
Sam

KITTI-Caltech preprocessed data & training details

Thank you for sharing the code of this interesting paper. I noticed that you report an evaluation on the Caltech dataset, and I am wondering whether you could also share your preprocessed KITTI-Caltech dataset. I assume you use the KITTI training set directly from PredNet, whose resolution is 128 × 160; if that is the case, then the Caltech test set alone would be enough.

Since I am trying to replicate the experiment for SimVP on the KITTI-Caltech setting, some training details of your experiment would also be very helpful, e.g. learning rate, learning scheduler, and batch size.

Thank you so much for your time!

Human3.6 dataset

Hi, thank you for your work.
Is it possible to release the Human3.6 dataset or the code that you used in your experiments? Thanks.

pretrained model

I would like to know whether you have any plans to release pre-trained models for each dataset.

Custom dataset

Can you give me instructions on how to train the network on a custom dataset? It contains 512x512 RGB images with labels in the form of timestamps of when they were taken, in differences of minutes. I'm trying to predict what the next images in the sequence will look like in the next timestamps.
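A common first step for a custom frame sequence is to cut it into fixed-length clips and split each clip into input and target frames. The sketch below shows that step only, under the assumptions that the frames are already sorted by timestamp and roughly evenly spaced; irregular time gaps are not handled. `make_samples` and the file names are hypothetical, and the array layout expected by the repository's dataloaders is not covered here.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def make_samples(frames: Sequence[Frame], n_in: int, n_out: int,
                 stride: int = 1) -> List[Tuple[List[Frame], List[Frame]]]:
    """Split one ordered frame sequence into (input, target) clip pairs."""
    if n_in <= 0 or n_out <= 0 or stride <= 0:
        raise ValueError("n_in, n_out and stride must be positive")
    clip_len = n_in + n_out
    samples = []
    # Slide a window of clip_len frames over the sequence.
    for start in range(0, len(frames) - clip_len + 1, stride):
        clip = list(frames[start:start + clip_len])
        samples.append((clip[:n_in], clip[n_in:]))
    return samples


# Hypothetical usage: file names stand in for decoded 512x512 RGB frames.
frame_names = [f"frame_{i:03d}.jpg" for i in range(6)]
pairs = make_samples(frame_names, n_in=2, n_out=2, stride=2)
for inputs, targets in pairs:
    print(inputs, "->", targets)
```

A larger stride yields fewer, less overlapping clips; splitting into train and test sets before windowing avoids sharing frames between the two.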

Effect of the input size on the performance of SimVP

Hi, Gao
I noticed that the input sizes in the paper are almost all smaller than 128. So I trained on my own dataset with two configurations that are the same except for the input size (128 and 512), which led to different behavior:

for size 128, the performance improved significantly;
but for size 512, the performance kept oscillating during training.

What causes this situation? Can you help me with the questions or give me some suggestions?

Thanks

Question regarding Training Signal Sources

Hi, thank you very much for releasing code for this inspiring work. Regarding the prediction length, say we input 10 previous frames and output 10 future frames, if our final goal is to predict 1 future frame at evaluation time, do we also calculate and backpropagate the loss on the 2nd to the 9th future frames or just on the 1st future frame during training?

Thank you very much for your time and help!
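The two options in this question can be written down concretely. The sketch below only illustrates the difference between supervising every predicted frame and supervising the first one; it does not claim which option the repository uses. `sequence_loss` and the toy data are hypothetical, and frames are plain lists of pixel values.

```python
def mse(a, b):
    """Mean squared error between two equal-length pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def sequence_loss(pred_frames, true_frames, frame_indices=None):
    """Mean per-frame MSE over the selected output frames (all frames by default)."""
    if frame_indices is None:
        frame_indices = range(len(true_frames))
    losses = [mse(pred_frames[i], true_frames[i]) for i in frame_indices]
    return sum(losses) / len(losses)


# Hypothetical toy data: 3 predicted frames of 2 pixels each; the error grows over time.
pred = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
true = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

loss_all = sequence_loss(pred, true)                       # supervise every future frame
loss_first = sequence_loss(pred, true, frame_indices=[0])  # supervise only the first frame

print(loss_all, loss_first)
```

When all frames have the same size, the all-frames variant equals a single MSE over the whole output sequence. The first-frame variant gives later frames no gradient, so they are effectively unconstrained.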

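How to configure the dataset

Hello author, I'd like to ask how the video frames were converted into the npz format used in the dataset. The datasets on GitHub are all in npy format, and their structure is not very intuitive. I have a sequence of consecutive video frames that I would like to train on, but they are all in jpg format. In what layout should the frames be arranged when preparing the npy files? Do you have the code that prepares the train and test parts of a dataset, and could you share it? My email is [email protected]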

Custom dataset generation

TaxiBJ results

Hi,
May I know whether you can share how you did the pre-processing and post-processing of the TaxiBJ frames? The paper states: "Following [69], we transform the data into [0, 1] via max-min normalization. Since the original data is between -1 and 1, the reported MSE and MAE are 1/4 and 1/2 of the original ones, consistent with previous literature." It would be helpful if you could guide me.

Thank you!
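The 1/4 and 1/2 factors in the quoted passage follow from the rescaling itself: mapping [-1, 1] to [0, 1] with x' = (x + 1) / 2 halves every difference, so MAE is halved and MSE is quartered. The sketch below checks this numerically, assuming the data minimum and maximum are exactly -1 and 1. The values and function names are hypothetical, and this is not the authors' pre-processing code.

```python
def to_unit_range(x):
    """Min-max normalize a value from [-1, 1] to [0, 1]."""
    return (x + 1.0) / 2.0


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mae(a, b):
    """Mean absolute error between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical values in the original [-1, 1] range.
true_orig = [-1.0, 0.0, 0.5, 1.0]
pred_orig = [-0.5, 0.25, 0.5, 0.0]

# The same values after rescaling to [0, 1].
true_unit = [to_unit_range(v) for v in true_orig]
pred_unit = [to_unit_range(v) for v in pred_orig]

mse_ratio = mse(pred_unit, true_unit) / mse(pred_orig, true_orig)
mae_ratio = mae(pred_unit, true_unit) / mae(pred_orig, true_orig)

print(mse_ratio, mae_ratio)
```

To compare numbers computed on [-1, 1] data with numbers reported on [0, 1] data, divide the MSE by 4 and the MAE by 2.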