
eccv2022-rife's Introduction

Hi there 👋

  • I used to be a competitive programming contestant (NOI 🥈, ICPC regional 🏅️).

  • I worked at MEGVII Research from 2017 to 2023 and currently work at StepFun. I received my B.S. degree from Peking University in 2020.

Main Projects:

Cooperation Projects:

Google Scholar, Zhihu, algorithm blog, Email, CV

Service: CVPR22-24/ECCV22-24/ICCV23/AAAI23/NeurIPS23/ICLR24/ICML24/WACV24/TIP/TPAMI/TOMM

eccv2022-rife's People

Contributors

a1600012888, catscarlet, chappjo, christopher-kapic, dynmi, eonzenex, heylonnhp, hzwer, justin62628, kkwik, ko1n, lazylion22, mafiosnik777, mskycoder, sadig102010, sloganking, stonecypher, talosh, zzh-tech


eccv2022-rife's Issues

Add argument to keep sound?

Sorry if I am overstepping the bounds of the project, but I have found that the interpolated output has no sound. It would be awesome if you could add an argument (like --sounds) to keep the sound in the generated output, or at least when the output is the same length as the input.

EDIT: Temporary solution:
ffmpeg -i "$video_name" audio.mp3 -y
ffmpeg -i "video_4x.mp4" -i "audio.mp3" -map 0:0 -map 1:0 -c:v copy -c:a copy "video_4x_audio.mp4" -y

Google Colab

Would you be willing to set up a Google Colab notebook or a Docker setup?
I'm interested in seeing the results for 3DCG interpolation.

CUDA out of memory though there is supposed to be enough VRAM

n00mkrad/flowframes#2 (comment)

I got the message below when running RIFE integrated into Flowframes.

CUDA out of memory. Tried to allocate 60.00 MiB (GPU 0; 6.00 GiB total capacity; 348.98 MiB already allocated; 8.99 MiB free; 444.00 MiB reserved in total by PyTorch)

The full stack trace is below. No other VRAM-sensitive apps were running at the time.

12-2-2020 16:29:30: [E] Traceback (most recent call last):
12-2-2020 16:29:30: [E]   File "interp-parallel.py", line 138, in <module>
12-2-2020 16:29:30: [E]     inferences = make_inference(model, I0, I1, exp=args.times)
12-2-2020 16:29:30: [E]   File "interp-parallel.py", line 110, in make_inference
12-2-2020 16:29:30: [E]     middle = model.inference(I0, I1)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\rife-cuda\model\RIFE.py", line 207, in inference
12-2-2020 16:29:30: [E]     return self.predict(imgs, flow, training=False).detach()
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\rife-cuda\model\RIFE.py", line 191, in predict
12-2-2020 16:29:30: [E]     refine_output, warped_img0, warped_img1, warped_img0_gt, warped_img1_gt = self.fusionnet(
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
12-2-2020 16:29:30: [E]     result = self.forward(*input, **kwargs)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\rife-cuda\model\RIFE.py", line 118, in forward
12-2-2020 16:29:30: [E]     x = self.up3(torch.cat((x, s0), 1))
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
12-2-2020 16:29:30: [E]     result = self.forward(*input, **kwargs)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\container.py", line 117, in forward
12-2-2020 16:29:30: [E]     input = module(input)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
12-2-2020 16:29:30: [E]     result = self.forward(*input, **kwargs)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\conv.py", line 905, in forward
12-2-2020 16:29:30: [E]     return F.conv_transpose2d(
12-2-2020 16:29:30: [E] RuntimeError: CUDA out of memory. Tried to allocate 60.00 MiB (GPU 0; 6.00 GiB total capacity; 348.98 MiB already allocated; 8.99 MiB free; 444.00 MiB reserved in total by PyTorch)
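
For what it's worth, a minimal memory-saving sketch, assuming the calling script does not already disable autograd: wrapping inference in torch.no_grad() avoids keeping gradient buffers and usually lowers peak VRAM.

import torch

with torch.no_grad():  # inference only, so autograd buffers are unnecessary
    middle = model.inference(I0, I1)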

Cannot load model properly

Hello,
When I run
python inference_img.py --img demo/I0_0.png demo/I0_1.png --exp=4

I get this error

Traceback (most recent call last):
  File "inference_img.py", line 18, in <module>
    model.load_model('./train_log', -1)
  File "/workspace/interpolation/RIFE/model/RIFE_HD.py", line 179, in load_model
    convert(torch.load('{}/flownet.pkl'.format(path), map_location=device)))
  File "/opt/conda/envs/RIFE/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for IFNet:
Missing key(s) in state_dict: "block3.conv0.0.weight", "block3.conv0.1.weight", "block3.conv0.1.bias", "block3.conv0.1.running_mean", "block3.conv0.1.running_var", "block3.conv0.2.weight",
...

Could you please tell me how to fix this?
(I followed the instructions on the GitHub page and the YouTube video.)
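
A hedged diagnostic sketch, assuming (as in inference_img.py) that the checkpoint sits at ./train_log/flownet.pkl and the loaded Model exposes its IFNet as model.flownet: comparing the key sets shows whether the checkpoint was produced by a different version of the model code.

import torch

ckpt = torch.load('./train_log/flownet.pkl', map_location='cpu')
ckpt_keys = {k.replace('module.', '') for k in ckpt.keys()}  # strip a possible DataParallel prefix
model_keys = set(model.flownet.state_dict().keys())
print('expected by the model but missing from the checkpoint:', sorted(model_keys - ckpt_keys)[:5])
print('present in the checkpoint but unknown to the model:', sorted(ckpt_keys - model_keys)[:5])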

What environment variables need to be set? (torch.distributed)

Running train_WIP.py with default argument values fails with an error saying that some environment variables are not set.

Traceback (most recent call last):
  File "train_WIP.py", line 11, in <module>
    torch.distributed.init_process_group(backend="nccl", world_size=4)
  File "/home/aissy/Documents/ml/vision/rife/env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 423, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/home/aissy/Documents/ml/vision/rife/env/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 166, in _env_rendezvous_handler
    raise _env_error("MASTER_ADDR")

The env variable RANK is not set, so I set it arbitrarily with os.environ["RANK"] = "1"; it then complained that another variable, "MASTER_ADDR", is not set. I assume there is some pre-setup needed for torch.distributed that is missing on my system.
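
A hedged sketch of what the default env:// rendezvous expects. For a real multi-GPU run, the usual route is the launcher, python -m torch.distributed.launch --nproc_per_node=4 train_WIP.py, which sets these variables per process; for a quick single-process experiment (assuming the hard-coded world_size in train_WIP.py is also changed to match), something like this gets past the error:

import os
import torch.distributed as dist

# The default env:// init method reads these four variables.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
dist.init_process_group(backend="nccl", world_size=1, rank=0)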

What does --fps do?

In your Colab example, you upsample a 25 fps video by 2x, which seems like it ought to produce 50 fps, but then you encode it with --fps 60, and the output is in fact 60 fps.

Does that mean every fifth frame is being repeated? Or is it not actually 2x but rather some fraction above 2x?

About flow_gt and loss_dis

Hello author, I have a question.
In the loss code, judging from the weights, loss_cons should correspond to loss_dis in the paper, right?
for i in range(3):
    loss_cons += self.epe(flow_list[i], flow_gt[:, :2], 1)
    loss_cons += self.epe(-flow_list[i], flow_gt[:, 2:4], 1)
Given this definition, do flow_list[i] and -flow_list[i] represent the 0->1 and 1->0 flows?
Aren't they 0->t and t->1 in the paper?

Image sequence and input

Thanks for adding the PNG output function. Could you make the output names consistent with ffmpeg, i.e. 0000.png, 0001.png, ..., 7821.png? Then we can use ffmpeg to deal with the image sequence (an example command is sketched below).
Adding image sequence input would also be great.
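
For reference, a hedged example of assembling such a zero-padded sequence with ffmpeg (frame rate, filename pattern, and codec are placeholders):

ffmpeg -framerate 60 -i output/%04d.png -c:v libx264 -pix_fmt yuv420p output.mp4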

Memory leak

Running inference_video.py makes the Python interpreter eat memory until the system crashes. Six commits ago, it never used more than 1.4 GB.

ONNX export

Hi, as we talked about on Reddit, ONNX export does not work due to missing support for the grid_sampler operator in the ONNX spec.
I see that it might be possible to define a custom ONNX operator on export, and then possibly do the same when importing into e.g. TensorFlow. The missing operator would need a custom implementation in the framework that the model is imported to, but this seems to already exist (at least from what I can find using Google).
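
A hedged sketch of the export-side half, assuming the installed PyTorch allows overriding the aten namespace this way; the domain/op name custom::grid_sampler is made up, and the importing framework would still need to map it to its own grid-sampling implementation.

import torch.onnx
from torch.onnx import register_custom_op_symbolic

def grid_sampler_symbolic(g, input, grid, interpolation_mode, padding_mode, align_corners):
    # Emit a node in a custom domain instead of failing on the unsupported aten op;
    # the mode flags are simply forwarded as extra inputs to keep the sketch short.
    return g.op("custom::grid_sampler", input, grid, interpolation_mode, padding_mode, align_corners)

register_custom_op_symbolic("::grid_sampler", grid_sampler_symbolic, opset_version=11)
# A subsequent torch.onnx.export(model, (I0, I1), "rife.onnx", opset_version=11) would then use it.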

Better hosting for the models.

Would it be possible to provide better hosting for the models?
Currently, automating the downloads when a new version comes out is hard, even if technically possible.
I think you could use the Git LFS feature for that; it gives 1 GB of storage and 1 GB of bandwidth a month, although maybe there are more suitable options.

Problems in inference_img.py

I use

$ python3 inference_img.py --img img0.png img1.png --times=4

, and I get this error: interpolate() got an unexpected keyword argument 'recompute_scale_factor' at IFNet.py line 95. Could you help me figure out what happened? Thanks.
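
A hedged workaround sketch: recompute_scale_factor only exists in newer PyTorch releases, so an older install rejects the keyword. Assuming the call site in IFNet.py can be edited, one option is to drop the argument whenever it is unsupported:

import torch.nn.functional as F

def safe_interpolate(x, scale_factor, mode="bilinear", align_corners=False):
    try:
        return F.interpolate(x, scale_factor=scale_factor, mode=mode,
                             align_corners=align_corners, recompute_scale_factor=False)
    except TypeError:
        # Older PyTorch without the keyword: fall back to the original signature.
        return F.interpolate(x, scale_factor=scale_factor, mode=mode, align_corners=align_corners)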

Better version declarations.

It would be good to provide the version number in some better way, e.g. commit tags. Currently, if I wanted to make a script to package RIFE automatically, I'd have to parse the first line of README.md for the version, but of course there's a big chance that will break in the future. There may also be better options than tags; I don't use git often and only found them after a quick search.
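
For illustration, a hedged sketch of what a tag-based flow could look like (version numbers are placeholders):

git tag -a v1.5 -m "RIFE v1.5"    # maintainer: tag the release commit
git push origin v1.5
git describe --tags --abbrev=0    # packaging script: read the latest tag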

FPS Limit

Is there a limit to the maximum FPS that can be created, or is it a Flowframes app issue? It seems it can't go higher than 500 fps, saying "Invalid target frame rate".

Output videos are shorter than input videos

Videos created with commands such as python3 inference_video.py --exp=1 --video=video.mp4 are shorter than their source videos. How much shorter seems to vary significantly depending on the source video. I believe this is due to dropped frames, even though the --skip flag is not being used. There is also the possibility of output video frames being out of order, which may also affect timing (my mistake, that was part of the source video I used). I am working on providing examples for this.

This issue will cause desync between video and sound if it's added, as requested in #12

Training - Animation

I'd like to start training a model for animation if you haven't started doing so already.

Any tips for getting started?
e.g. How should I prep the input data to feed to the train script? Are there benefits to using a higher or lower learning rate? How much VRAM does a 720p image need vs. 1080p?

Support HD videos

I found that the current model performs much better on small images than on 1080p video. I plan to release a new model for large-resolution videos; at the same processing speed, the results on 1080p and 2K video can be significantly improved (preliminary verification).

About the trained-model file in README.md

The file RIFE_trained_model_v1.1.zip from pan.baidu.com described in README.md is broken, but the Google Drive file RIFE_trained_model_new.zip works fine.

Not first realtime

Hi,

In your paper, you write that “Our proposed RIFE is the first flow-based and real-time VFI algorithm that can process 720p videos at 30FPS.”

I believe this is incorrect; DIS (Dense Inverse Search), a VFI algorithm, was published in 2016 (https://github.com/tikroeger/OF_DIS), is flow-based, and on an RTX 2070 runs stably at 1080p at 60 fps. See https://nageru.sesse.net/ for my GPU implementation (scroll down to Futatabi).

training code

Thank you for your nice work!
Do you have any plans to release the training code?

Thanks in advance!

Error with recompute_scale_factor=True

I've been trying to interpolate a 24fps mp4 video and this error gets thrown whenever I try to run either inference_video.py or inference_video_parallel.py.

C:\Python38\lib\site-packages\torch\nn\functional.py:3103: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. warnings.warn("The default behavior for interpolate/upsample with float scale_factor changed "

When running the parallel version, I also see about a 20% decrease in interpolation speed. On my 1070 Ti with the latest drivers and CUDA 11.0, a 1080p video interpolates at about 3.6 fps; with the parallel script that drops to around 2.9-3 fps.

Counting frames takes forever on some videos

Before interpolation, RIFE seems to be running ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_read_frames -of default=nokey=1:noprint_wrappers=1 video.mp4 in order to get the number of frames in the input video. This command is slow on some types of files and can take tens of minutes to complete, delaying the start of interpolation significantly.

Using ffmpeg, e.g. ffmpeg -i input.mkv -map 0:v:0 -c copy -f null -, the counting of frames can be done within seconds instead of minutes, though the output may take some parsing to isolate the frame count.

A discussion involving both commands:
https://stackoverflow.com/questions/2017843/fetch-frame-count-with-ffmpeg#28376817
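
Another hedged option is to read the frame count stored in the container metadata instead of decoding, which is effectively instant, though some containers (many MKV files, for example) do not store it and return N/A:

ffprobe -v error -select_streams v:0 -show_entries stream=nb_frames -of default=nokey=1:noprint_wrappers=1 video.mp4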

I'm having trouble applying this.

I think I'm doing this incorrectly.

I:

  1. Upload my video.
    1. My video is 30fps, but the original was 23.97fps. I can't change this.
    2. I get a ton of Warning: Your video has 7556 static frames, it may change the duration of the generated video.
    3. Input video is 4 minutes 54 seconds.
  2. I run your command from your readme
    1. I use the 4x version
      1. !python3 inference_mp4_4x.py --video myvideoname.mp4 --fps=120
      2. I change the fps to 120, because I expect 4x30fps
    2. Output file is ... 41 seconds? I had expected 4:54, not 0:41
      1. Maybe ... this is about static frames?
    3. It's the same video, but seemingly at random, most of the frames are dropped
      1. A little over 6 in 7 are missing
      2. It is 120fps though

replicating benchmarks

Thank you for sharing your code! I was trying to replicate the numbers you stated in your paper using this implementation but have unfortunately been unsuccessful so far. Would you be able to share a script that can be used to replicate the Vimeo-90k metrics you quoted? Also, I think the following padding has some issues.

https://github.com/hzwer/arXiv2020-RIFE/blob/3194107170d6613b2ea924aa35bb57e5913fff44/inference_img.py#L26-L28

https://github.com/hzwer/arXiv2020-RIFE/blob/3194107170d6613b2ea924aa35bb57e5913fff44/inference_img.py#L45

The pw - w and [:h, :w] indicate that pw > w (and ph > h) is expected. However, pw = 340 // 32 * 32 = 320 for w = 340, which violates this condition. Thanks for looking into this, and thanks again for sharing your code!
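
For reference, a minimal round-up sketch under the convention the slicing assumes (pw >= w and ph >= h even when w is not a multiple of 32); this is the usual ceiling trick, not necessarily the authors' intended fix:

ph = ((h - 1) // 32 + 1) * 32  # e.g. h = 340 -> 352 instead of 320
pw = ((w - 1) // 32 + 1) * 32
padding = (0, pw - w, 0, ph - h)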

Data augmentation bug on v1.2~v1.3

We found a data augmentation bug, which was more serious in v1.3, so we can't confirm the performance improvement of v1.3 and have withdrawn that version update. This bug leads to poor quantitative performance of the current model on the benchmark and is expected to be fixed within 3 days.

We think this is also a major reason for the poor performance on 2D animation.

How should I estimate memory requirements?

First of all, thanks for the project.
The question: on a CPU-only machine (no GPU) with 32 GB of RAM, running 8X on a 1080p video exits with "Killed" in the log, so I suspect it ran out of memory. How much memory would be needed to run 8X on a 4K video?

Not the fastest for multi-frame interpolation

Hi,

Thanks for open sourcing the code and contributing to the video frame interpolation community.

In the paper, it is mentioned: "Coupled with the large complexity in the bi-directional flow estimation, none of these methods can achieve real-time speed"

I believe that might be inappropriate to say, as a recently published paper (https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123720103.pdf) targets efficient multi-frame interpolation.

It utilizes bi-directional flow estimation as well, but it generates 7 frames in 0.12 seconds, whereas your method requires 0.036 * 7 = 0.252 seconds.

And the model from that paper is compact, consisting of only ~2M parameters, whereas your fast model has ~10M parameters.

Non-Windows - Multiprocessing for ~2x processing speed

I profiled the code, and you can expect roughly another 2x processing speed increase if you create a multiprocessing script and split the inference apart from the image writing.

Unfortunately, I just found out the hard way that you cannot pipe CUDA tensors on Windows, but Linux systems should be able to do this.
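
A minimal sketch of that split, assuming the interpolated frames reach the writer as uint8 numpy arrays (so no CUDA tensors cross the process boundary); the writer function and queue size are illustrative:

import multiprocessing as mp
import cv2

def writer(queue):
    # Drain the queue and write PNGs so disk I/O no longer blocks the inference loop.
    while True:
        item = queue.get()
        if item is None:  # sentinel: inference finished
            break
        idx, frame = item
        cv2.imwrite('output/{:0>7d}.png'.format(idx), frame)

if __name__ == '__main__':
    q = mp.Queue(maxsize=64)
    p = mp.Process(target=writer, args=(q,))
    p.start()
    # ... inference loop: q.put((cnt, frame)) for each interpolated frame ...
    q.put(None)
    p.join()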

Assertion error:

I made it so that we can directly download and upscale videos from YouTube. It worked for 360p videos but is not working for 720p.
Any idea how to solve this?

Here is the error:

myvideo.mp4, 2664.0 frames in total, 30.0FPS to 120.0FPS
33% 889/2664.0 [03:21<8:35:12, 17.42s/it]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/ffmpeg.py", line 271, in _read_frame_data
    assert len(arr) == framesize
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "inference_video.py", line 80, in <module>
    for frame in videogen:
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/io.py", line 253, in vreader
    for frame in reader.nextFrame():
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/ffmpeg.py", line 297, in nextFrame
    yield self._readFrame()
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/ffmpeg.py", line 281, in _readFrame
    s = self._read_frame_data()
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/ffmpeg.py", line 275, in _read_frame_data
    raise RuntimeError("%s" % (err1,))
RuntimeError
33% 889/2664.0 [03:21<06:42, 4.41it/s]

Link to COLAB Notebook

Parallel processing for x2?

The parallel processing for the x4 option is great; I would love to see this added to the x2 version as well.

Image output

Great work! I wonder if you could add an image sequence output function, because I always want to use lossless encoding.

Don't know what's wrong! :/

So I followed all the defined steps.

  1. Git cloned the repo locally
  2. Downloaded all the requirements
  3. Created a folder inside the repo called train_log and moved all the extracted *.pkl files inside
  4. Tried to apply frame interpolation on a video, e.g. python3 inference_video.py --exp=1 --video=video.mp4

I get this error:

Traceback (most recent call last):
  File "inference_video.py", line 81, in <module>
    lastframe = next(videogen)
  File "/Users/kabhinavaditya/.pyenv/versions/3.8.5/lib/python3.8/site-packages/skvideo/io/io.py", line 240, in vreader
    assert _HAS_FFMPEG, "Cannot find installation of ffmpeg."
AssertionError: Cannot find installation of ffmpeg.

Could you please help me?
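
A hedged note on the likely cause: skvideo needs the ffmpeg binary itself, not just the Python packages. Installing it (e.g. brew install ffmpeg on macOS, or sudo apt install ffmpeg on Debian/Ubuntu) and making sure it is on PATH should clear the assertion; alternatively, skvideo can be pointed at a specific directory before its io module is imported:

import skvideo
skvideo.setFFmpegPath("/usr/local/bin")  # directory containing the ffmpeg executable (path is an example)
import skvideo.io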

Outputted videos are "very slightly" shorter than input videos

This issue is similar to #23 but I believe a different bug is causing this.

The output videos are usually shorter by only a few seconds. If I rip the audio from the source video and give it to the new video, the beginning of the video is synced up with the audio, but the video slowly gets more de-synced with the audio towards its end. This issue is more apparent with longer videos. It is caused by frames being dropped very occasionally.

TypeError when trying to output PNG frames

I'm getting this error if I add the --png argument:

Exception ignored in thread started by: <function clear_buffer at 0x00000158A426F310>
Traceback (most recent call last):
  File "inference_video_parallel.py", line 83, in clear_buffer
    cv2.imwrite('output/{:0>7d}.png'.format(cnt), i)
TypeError: unsupported format string passed to numpy.ndarray.__format__
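
A hedged guess at the fix: the format string fails because cnt arrives as a numpy value rather than a Python int, so casting it first should satisfy {:0>7d}:

cv2.imwrite('output/{:0>7d}.png'.format(int(cnt)), i)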
