Comments (7)
I should note that I have successfully processed a video in batches of 30 frames; however, this introduces inconsistencies that grow with the number of batches.
from tokenflow.
Below is an example output of a video cut into 3 batches of 24 frames.
Note the very noticeable inconsistencies, or jumps, between batches.
Running the full 450 frames of this video results in a CUDA out-of-memory error.
tokenflow_PnP_fps_30_1.mp4
Can you show the results after preprocessing?
It's worth mentioning that I also tried processing the video on Google Colab with an A100 40GB and still ran out of memory.
I'm wondering if I am missing something when it comes to processing longer videos.
Hi there!
Just to make sure -- the inconsistencies in your result come from treating the video as 3 different videos when running our method. You shouldn't see such inconsistencies if you ran our method on the full video.
In terms of memory, the main bottleneck is the computation of extended attention on the keyframes, which is a massive matrix multiplication.
I think it can be lightened (at the expense of run time) by replacing the batch matrix multiplication in this computation with more for loops: https://github.com/omerbt/TokenFlow/blob/06f51a0d0c19bef88f0b9b521146b5b849fbfb76/tokenflow_utils.py#L168C13-L168C16
It's currently written such that, above 96 frames, it loops over the frames to compute the cross-frame attention of each keyframe, and it also loops over the different attention heads. This was tuned for our resources; you can add a further loop over the sequence-length dimension of the attention.
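To make the suggestion concrete, the idea of looping over the sequence-length dimension can be sketched as follows. This is a generic, hypothetical NumPy illustration of chunked attention (computing softmax(QKᵀ/√d)V one query chunk at a time so that only a chunk-by-sequence score matrix is ever materialized), not TokenFlow's actual code; the function name and chunk size are made up for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def chunked_attention(q, k, v, chunk_size=64):
    """Compute softmax(q @ k.T / sqrt(d)) @ v in chunks over the query
    sequence length. Peak memory for the score matrix drops from
    O(seq_len^2) to O(chunk_size * seq_len), at the cost of a Python loop."""
    scale = q.shape[-1] ** -0.5
    out = []
    for i in range(0, q.shape[0], chunk_size):
        scores = (q[i:i + chunk_size] @ k.T) * scale  # (chunk, seq_len)
        out.append(softmax(scores) @ v)               # (chunk, head_dim)
    return np.concatenate(out, axis=0)
```

The result is identical to the unchunked computation; only the peak memory changes, which is why this kind of loop trades speed for the ability to fit longer videos.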
For reference, I was able to run the method on 200+ frames using 48GB of GPU memory.
Hope this helps!
Thank you for such a detailed reply.
I am somewhat of a beginner, but what you have mentioned makes sense.
Your clarification is completely correct. This is not an issue with the consistency of your method at all, but rather inconsistencies introduced by my workaround to keep VRAM under control.
Unfortunately, I do not have the skillset or know-how to implement the additional for loops as you have suggested.
Just curious, how are you able to have a card with 48GB of memory?
I tried provisioning an A100 80GB from Google but was denied as I'm not a business, ahaha.
Big fan of your method and would love to put it into practice on higher-resolution, longer videos.
Closing this as it has largely been answered.