
eccv2022-rife's Introduction

Hi there 👋

  • I used to be a competitive programming contestant (NOI 🥈, ICPC regional 🏅️).

  • I worked at MEGVII Research from 2017 to 2023 and currently work at StepFun. I received my B.S. degree from Peking University in 2020.

Main Projects:

Cooperation Projects:

Google Scholar, Zhihu, algorithm blog, Email, CV

Service: CVPR22-24/ECCV22-24/ICCV23/AAAI23/NeurIPS23/ICLR24/ICML24/WACV24/TIP/TPAMI/TOMM

eccv2022-rife's People

Contributors

a1600012888, catscarlet, chappjo, christopher-kapic, dynmi, eonzenex, heylonnhp, hzwer, justin62628, kkwik, ko1n, lazylion22, mafiosnik777, mskycoder, sadig102010, sloganking, stonecypher, talosh, zzh-tech


eccv2022-rife's Issues

Add argument to keep sound?

Sorry if I am overstepping the bounds of the project, but I have found that the interpolated output has no sound. It would be awesome if you could add an argument (like --sounds) to keep the sound in the generated output, or at least when the output is the same length as the input.

EDIT: Temporary solution:
ffmpeg -i "$video_name" audio.mp3 -y
ffmpeg -i "video_4x.mp4" -i "audio.mp3" -map 0:0 -map 1:0 -c:v copy -c:a copy "video_4x_audio.mp4" -y

Google Colab

Would you be willing to set up a Google Colab notebook or a Docker setup?
I'm interested in seeing the results for 3DCG interpolation.

CUDA out of memory though there is supposed to be enough VRAM

n00mkrad/flowframes#2 (comment)

I got the message below when running RIFE integrated into Flowframes.

CUDA out of memory. Tried to allocate 60.00 MiB (GPU 0; 6.00 GiB total capacity; 348.98 MiB already allocated; 8.99 MiB free; 444.00 MiB reserved in total by PyTorch)

The full stack trace is below. No other VRAM-sensitive apps were running at the time.

12-2-2020 16:29:30: [E] Traceback (most recent call last):
12-2-2020 16:29:30: [E]   File "interp-parallel.py", line 138, in <module>
12-2-2020 16:29:30: [E]     inferences = make_inference(model, I0, I1, exp=args.times)
12-2-2020 16:29:30: [E]   File "interp-parallel.py", line 110, in make_inference
12-2-2020 16:29:30: [E]     middle = model.inference(I0, I1)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\rife-cuda\model\RIFE.py", line 207, in inference
12-2-2020 16:29:30: [E]     return self.predict(imgs, flow, training=False).detach()
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\rife-cuda\model\RIFE.py", line 191, in predict
12-2-2020 16:29:30: [E]     refine_output, warped_img0, warped_img1, warped_img0_gt, warped_img1_gt = self.fusionnet(
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
12-2-2020 16:29:30: [E]     result = self.forward(*input, **kwargs)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\rife-cuda\model\RIFE.py", line 118, in forward
12-2-2020 16:29:30: [E]     x = self.up3(torch.cat((x, s0), 1))
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
12-2-2020 16:29:30: [E]     result = self.forward(*input, **kwargs)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\container.py", line 117, in forward
12-2-2020 16:29:30: [E]     input = module(input)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
12-2-2020 16:29:30: [E]     result = self.forward(*input, **kwargs)
12-2-2020 16:29:30: [E]   File "D:\Apps\Flowframes\FlowframesData\pkgs\py\Lib\site-packages\torch\nn\modules\conv.py", line 905, in forward
12-2-2020 16:29:30: [E]     return F.conv_transpose2d(
12-2-2020 16:29:30: [E] RuntimeError: CUDA out of memory. Tried to allocate 60.00 MiB (GPU 0; 6.00 GiB total capacity; 348.98 MiB already allocated; 8.99 MiB free; 444.00 MiB reserved in total by PyTorch)
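
For what it's worth, a minimal memory-saving sketch, assuming the calling script does not already disable autograd: wrapping inference in torch.no_grad() avoids keeping gradient buffers and usually lowers peak VRAM.

import torch

with torch.no_grad():  # inference only, so autograd buffers are unnecessary
    middle = model.inference(I0, I1)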

Cannot load model properly

Hello,
When I run
python inference_img.py --img demo/I0_0.png demo/I0_1.png --exp=4

I get this error

Traceback (most recent call last):
  File "inference_img.py", line 18, in <module>
    model.load_model('./train_log', -1)
  File "/workspace/interpolation/RIFE/model/RIFE_HD.py", line 179, in load_model
    convert(torch.load('{}/flownet.pkl'.format(path), map_location=device)))
  File "/opt/conda/envs/RIFE/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for IFNet:
Missing key(s) in state_dict: "block3.conv0.0.weight", "block3.conv0.1.weight", "block3.conv0.1.bias", "block3.conv0.1.running_mean", "block3.conv0.1.running_var", "block3.conv0.2.weight",
...

Could you please tell me how to fix this?
(I followed the instructions on the GitHub page and the YouTube video.)
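
A hedged diagnostic sketch, assuming (as in inference_img.py) that the checkpoint sits at ./train_log/flownet.pkl and the loaded Model exposes its IFNet as model.flownet: comparing the key sets shows whether the checkpoint was produced by a different version of the model code.

import torch

ckpt = torch.load('./train_log/flownet.pkl', map_location='cpu')
ckpt_keys = {k.replace('module.', '') for k in ckpt.keys()}  # strip a possible DataParallel prefix
model_keys = set(model.flownet.state_dict().keys())
print('expected by the model but missing from the checkpoint:', sorted(model_keys - ckpt_keys)[:5])
print('present in the checkpoint but unknown to the model:', sorted(ckpt_keys - model_keys)[:5])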

What environment variables need to be set? (torch.distributed)

Running train_WIP.py with default argument values fails with an error saying that some environment variables are not set.

Traceback (most recent call last):
  File "train_WIP.py", line 11, in <module>
    torch.distributed.init_process_group(backend="nccl", world_size=4)
  File "/home/aissy/Documents/ml/vision/rife/env/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 423, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/home/aissy/Documents/ml/vision/rife/env/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 166, in _env_rendezvous_handler
    raise _env_error("MASTER_ADDR")

The env variable RANK is not set, so I set it arbitrarily with os.environ["RANK"] = "1"; it then complained that another variable, "MASTER_ADDR", is not set. I assume there is some pre-setup needed for torch.distributed that is missing on my system.
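
A hedged sketch of what the default env:// rendezvous expects. For a real multi-GPU run, the usual route is the launcher, python -m torch.distributed.launch --nproc_per_node=4 train_WIP.py, which sets these variables per process; for a quick single-process experiment (assuming the hard-coded world_size in train_WIP.py is also changed to match), something like this gets past the error:

import os
import torch.distributed as dist

# The default env:// init method reads these four variables.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")
dist.init_process_group(backend="nccl", world_size=1, rank=0)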

What does --fps do?

In your Colab example, you upsample a 25 fps video by 2x, which seems like it ought to produce 50 fps, but then you encode it with --fps 60, and the output is in fact 60 fps.

Does that mean every fifth frame is being repeated? Or is it not actually 2x but rather some fraction above 2x?

About flow_gt and loss_dis

Hello author, I have a question.
In the loss code, judging from the weights, loss_cons should correspond to loss_dis in the paper, right?
for i in range(3):
    loss_cons += self.epe(flow_list[i], flow_gt[:, :2], 1)
    loss_cons += self.epe(-flow_list[i], flow_gt[:, 2:4], 1)
Given this definition, do flow_list[i] and -flow_list[i] represent the 0->1 and 1->0 flows?
Aren't they 0->t and t->1 in the paper?

Image sequence and input

Thanks for adding the PNG output function. Could you make the output names consistent with ffmpeg, i.e. 0000.png, 0001.png, ..., 7821.png? Then we can use ffmpeg to deal with the image sequence (an example command is sketched below).
Adding image sequence input would also be great.
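
For reference, a hedged example of assembling such a zero-padded sequence with ffmpeg (frame rate, filename pattern, and codec are placeholders):

ffmpeg -framerate 60 -i output/%04d.png -c:v libx264 -pix_fmt yuv420p output.mp4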

Memory leak

Running inference_video.py makes the Python interpreter eat memory until the system crashes. Six commits ago, it never used more than 1.4 GB.

ONNX export

Hi, as we talked about on Reddit, ONNX export does not work due to missing support for the grid_sampler operator in the ONNX spec.
I see that it might be possible to define a custom ONNX operator on export, and then possibly do the same when importing into e.g. TensorFlow. The missing operator would need a custom implementation in the framework that the model is imported to, but this seems to already exist (at least from what I can find using Google).
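
A hedged sketch of the export-side half, assuming the installed PyTorch allows overriding the aten namespace this way; the domain/op name custom::grid_sampler is made up, and the importing framework would still need to map it to its own grid-sampling implementation.

import torch.onnx
from torch.onnx import register_custom_op_symbolic

def grid_sampler_symbolic(g, input, grid, interpolation_mode, padding_mode, align_corners):
    # Emit a node in a custom domain instead of failing on the unsupported aten op;
    # the mode flags are simply forwarded as extra inputs to keep the sketch short.
    return g.op("custom::grid_sampler", input, grid, interpolation_mode, padding_mode, align_corners)

register_custom_op_symbolic("::grid_sampler", grid_sampler_symbolic, opset_version=11)
# A subsequent torch.onnx.export(model, (I0, I1), "rife.onnx", opset_version=11) would then use it.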

Better hosting for the models.

Would it be possible to provide better hosting for the models?
Currently, automating the downloads when a new version comes out is hard, even if technically possible.
I think you could use the Git LFS feature for that; it gives 1 GB of storage and 1 GB of bandwidth a month, although maybe there are more suitable options.

Problems in inference_img.py

I use

$ python3 inference_img.py --img img0.png img1.png --times=4

, and I get this error: interpolate() got an unexpected keyword argument 'recompute_scale_factor' at IFNet.py line 95. Could you help me figure out what happened? Thanks.
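
A hedged workaround sketch: recompute_scale_factor only exists in newer PyTorch releases, so an older install rejects the keyword. Assuming the call site in IFNet.py can be edited, one option is to drop the argument whenever it is unsupported:

import torch.nn.functional as F

def safe_interpolate(x, scale_factor, mode="bilinear", align_corners=False):
    try:
        return F.interpolate(x, scale_factor=scale_factor, mode=mode,
                             align_corners=align_corners, recompute_scale_factor=False)
    except TypeError:
        # Older PyTorch without the keyword: fall back to the original signature.
        return F.interpolate(x, scale_factor=scale_factor, mode=mode, align_corners=align_corners)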

Better version declarations.

It would be good to provide the version number in some better way, e.g. commit tags. Currently, if I wanted to make a script to package RIFE automatically, I'd have to parse the first line of README.md for the version, but of course there's a big chance that will break in the future. There may also be better options than tags; I don't use git often and only found them after a quick search.
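
For illustration, a hedged sketch of what a tag-based flow could look like (version numbers are placeholders):

git tag -a v1.5 -m "RIFE v1.5"    # maintainer: tag the release commit
git push origin v1.5
git describe --tags --abbrev=0    # packaging script: read the latest tag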

FPS Limit

Is there a limit to the maximum FPS that can be created, or is it a Flowframes app issue? It seems it can't go higher than 500 fps, saying "Invalid target frame rate".

Output videos are shorter than input videos

Videos created with commands such as python3 inference_video.py --exp=1 --video=video.mp4 are shorter than their source videos. How much shorter seems to vary significantly depending on the source video. I believe this is due to dropped frames, even though the --skip flag is not being used. There is also the possibility of output video frames being out of order, which may also affect timing (my mistake, that was part of the source video I used). I am working on providing examples for this.

This issue will cause desync between video and sound if it's added, as requested in #12

Training - Animation

I'd like to start training a model for animation if you haven't started doing so already.

Any tips for getting started?
e.g. How should I prep the input data to feed to the train script? Are there benefits to using a higher or lower learning rate? How much VRAM does a 720p image need vs. 1080p?

Support HD videos

I found that the current model performs much better on small images than on 1080p video. I plan to release a new model for large-resolution videos; at the same processing speed, the results on 1080p and 2K video can be significantly improved (preliminary verification).

About the trained-model file in README.md

The file RIFE_trained_model_v1.1.zip from pan.baidu.com described in README.md is broken, but the Google Drive file RIFE_trained_model_new.zip works fine.

Not first realtime

Hi,

In your paper, you write that “Our proposed RIFE is the first flow-based and real-time VFI algorithm that can process 720p videos at 30FPS.”

I believe this is incorrect; DIS (Dense Inverse Search), a VFI algorithm, was published in 2016 (https://github.com/tikroeger/OF_DIS), is flow-based, and on an RTX 2070 runs stably at 1080p at 60 fps. See https://nageru.sesse.net/ for my GPU implementation (scroll down to Futatabi).

training code

Thank you for your nice work!
Do you have any plans to release the training code?

Thanks in advance!

Error with recompute_scale_factor=True

I've been trying to interpolate a 24fps mp4 video and this error gets thrown whenever I try to run either inference_video.py or inference_video_parallel.py.

C:\Python38\lib\site-packages\torch\nn\functional.py:3103: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. warnings.warn("The default behavior for interpolate/upsample with float scale_factor changed "

When running the parallel version, I also see about a 20% decrease in interpolation speed. On my 1070 Ti with the latest drivers and CUDA 11.0, a 1080p video interpolates at about 3.6 fps; with the parallel script that drops to around 2.9-3 fps.

Counting frames takes forever on some videos

Before interpolation, RIFE seems to be running ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_read_frames -of default=nokey=1:noprint_wrappers=1 video.mp4 in order to get the number of frames in the input video. This command is slow on some types of files and can take tens of minutes to complete, delaying the start of interpolation significantly.

Using ffmpeg, e.g. ffmpeg -i input.mkv -map 0:v:0 -c copy -f null -, the counting of frames can be done within seconds instead of minutes, though the output may take some parsing to isolate the frame count.

A discussion involving both commands:
https://stackoverflow.com/questions/2017843/fetch-frame-count-with-ffmpeg#28376817
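
Another hedged option is to read the frame count stored in the container metadata instead of decoding, which is effectively instant, though some containers (many MKV files, for example) do not store it and return N/A:

ffprobe -v error -select_streams v:0 -show_entries stream=nb_frames -of default=nokey=1:noprint_wrappers=1 video.mp4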

I'm having trouble applying this.

I think I'm doing this incorrectly.

I:

  1. Upload my video.
    1. My video is 30fps, but the original was 23.97fps. I can't change this.
    2. I get a ton of Warning: Your video has 7556 static frames, it may change the duration of the generated video.
    3. Input video is 4 minutes 54 seconds.
  2. I run your command from your readme
    1. I use the 4x version
      1. !python3 inference_mp4_4x.py --video myvideoname.mp4 --fps=120
      2. I change the fps to 120, because I expect 4x30fps
    2. Output file is ... 41 seconds? I had expected 4:54, not 0:41
      1. Maybe ... this is about static frames?
    3. It's the same video, but seemingly at random, most of the frames are dropped
      1. A little over 6 in 7 are missing
      2. It is 120fps though

replicating benchmarks

Thank you for sharing your code! I was trying to replicate the numbers you stated in your paper using this implementation but have unfortunately been unsuccessful so far. Would you be able to share a script that can be used to replicate the Vimeo-90k metrics you quoted? Also, I think the following padding has some issues.

https://github.com/hzwer/arXiv2020-RIFE/blob/3194107170d6613b2ea924aa35bb57e5913fff44/inference_img.py#L26-L28

https://github.com/hzwer/arXiv2020-RIFE/blob/3194107170d6613b2ea924aa35bb57e5913fff44/inference_img.py#L45

The pw - w and [:h, :w] indicate that pw > w (and ph > h) is expected. However, pw = 340 // 32 * 32 = 320 for w = 340, which violates this condition. Thanks for looking into this, and thanks again for sharing your code!
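
For reference, a minimal round-up sketch under the convention the slicing assumes (pw >= w and ph >= h even when w is not a multiple of 32); this is the usual ceiling trick, not necessarily the authors' intended fix:

ph = ((h - 1) // 32 + 1) * 32  # e.g. h = 340 -> 352 instead of 320
pw = ((w - 1) // 32 + 1) * 32
padding = (0, pw - w, 0, ph - h)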

Data augmentation bug on v1.2~v1.3

We found a data augmentation bug, which was more serious in v1.3, so we can't confirm the performance improvement of v1.3 and have withdrawn that version update. This bug leads to poor quantitative performance of the current model on the benchmark and is expected to be fixed within 3 days.

We think this is also a major reason for the poor performance on 2D animation.

How should I estimate memory requirements?

First of all, thanks for the project.
The question: on a CPU-only machine (no GPU) with 32 GB of RAM, running 8X on a 1080p video exits with "Killed" in the log, so I suspect it ran out of memory. How much memory would be needed to run 8X on a 4K video?

Not the fastest for multi-frame interpolation

Hi,

Thanks for open sourcing the code and contributing to the video frame interpolation community.

In the paper, it is mentioned: "Coupled with the large complexity in the bi-directional flow estimation, none of these methods can achieve real-time speed"

I believe that might be inappropriate to say, as a recently published paper (https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123720103.pdf) targets efficient multi-frame interpolation.

It utilizes bi-directional flow estimation as well, but it generates 7 frames in 0.12 seconds, whereas your method requires 0.036 * 7 = 0.252 seconds.

And the model from that paper is compact, consisting of only ~2M parameters, whereas your fast model has ~10M parameters.

Non-Windows - Multiprocessing for ~2x processing speed

I profiled the code, and you can expect roughly another 2x processing speed increase if you create a multiprocessing script and split the inference apart from the image writing.

Unfortunately, I just found out the hard way that you cannot pipe CUDA tensors on Windows, but Linux systems should be able to do this.
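
A minimal sketch of that split, assuming the interpolated frames reach the writer as uint8 numpy arrays (so no CUDA tensors cross the process boundary); the writer function and queue size are illustrative:

import multiprocessing as mp
import cv2

def writer(queue):
    # Drain the queue and write PNGs so disk I/O no longer blocks the inference loop.
    while True:
        item = queue.get()
        if item is None:  # sentinel: inference finished
            break
        idx, frame = item
        cv2.imwrite('output/{:0>7d}.png'.format(idx), frame)

if __name__ == '__main__':
    q = mp.Queue(maxsize=64)
    p = mp.Process(target=writer, args=(q,))
    p.start()
    # ... inference loop: q.put((cnt, frame)) for each interpolated frame ...
    q.put(None)
    p.join()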

Assertion error:

I made it so that we can directly download and upscale videos from YouTube. It worked for 360p videos but is not working for 720p.
Any idea how to solve this?

Here is the error:

myvideo.mp4, 2664.0 frames in total, 30.0FPS to 120.0FPS
33% 889/2664.0 [03:21<8:35:12, 17.42s/it]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/ffmpeg.py", line 271, in _read_frame_data
    assert len(arr) == framesize
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "inference_video.py", line 80, in <module>
    for frame in videogen:
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/io.py", line 253, in vreader
    for frame in reader.nextFrame():
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/ffmpeg.py", line 297, in nextFrame
    yield self._readFrame()
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/ffmpeg.py", line 281, in _readFrame
    s = self._read_frame_data()
  File "/usr/local/lib/python3.6/dist-packages/skvideo/io/ffmpeg.py", line 275, in _read_frame_data
    raise RuntimeError("%s" % (err1,))
RuntimeError
33% 889/2664.0 [03:21<06:42, 4.41it/s]

Link to COLAB Notebook

Parallel processing for x2?

The parallel processing for the x4 option is great; I would love to see this added to the x2 version as well.

Image output

Great work! I wonder if you could add an image sequence output function, because I always want to use lossless encoding.

Don't know what's wrong! :/

So I followed all the defined steps.

  1. Git cloned the repo locally
  2. Downloaded all the requirements
  3. Created a folder inside the repo called train_log and moved all the extracted *.pkl files inside
  4. Tried to apply frame interpolation on a video, e.g. python3 inference_video.py --exp=1 --video=video.mp4

I get this error:

Traceback (most recent call last):
  File "inference_video.py", line 81, in <module>
    lastframe = next(videogen)
  File "/Users/kabhinavaditya/.pyenv/versions/3.8.5/lib/python3.8/site-packages/skvideo/io/io.py", line 240, in vreader
    assert _HAS_FFMPEG, "Cannot find installation of ffmpeg."
AssertionError: Cannot find installation of ffmpeg.

Could you please help me?
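
A hedged note on the likely cause: skvideo needs the ffmpeg binary itself, not just the Python packages. Installing it (e.g. brew install ffmpeg on macOS, or sudo apt install ffmpeg on Debian/Ubuntu) and making sure it is on PATH should clear the assertion; alternatively, skvideo can be pointed at a specific directory before its io module is imported:

import skvideo
skvideo.setFFmpegPath("/usr/local/bin")  # directory containing the ffmpeg executable (path is an example)
import skvideo.io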

Outputted videos are "very slightly" shorter than input videos

This issue is similar to #23 but I believe a different bug is causing this.

The output videos are usually shorter by only a few seconds. If I rip the audio from the source video and give it to the new video, the beginning of the video is synced up with the audio, but the video slowly gets more de-synced with the audio towards its end. This issue is more apparent with longer videos. It is caused by frames being dropped very occasionally.

TypeError when trying to output PNG frames

I'm getting this error if I add the --png argument:

Exception ignored in thread started by: <function clear_buffer at 0x00000158A426F310>
Traceback (most recent call last):
  File "inference_video_parallel.py", line 83, in clear_buffer
    cv2.imwrite('output/{:0>7d}.png'.format(cnt), i)
TypeError: unsupported format string passed to numpy.ndarray.__format__
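
A hedged guess at the fix: the format string fails because cnt arrives as a numpy value rather than a Python int, so casting it first should satisfy {:0>7d}:

cv2.imwrite('output/{:0>7d}.png'.format(int(cnt)), i)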
