Comments (7)

DBarker774 avatar DBarker774 commented on May 25, 2024

I should note that I have successfully processed a video in batches of 30 frames; however, this introduces inconsistencies that grow with the number of batches.

from tokenflow.

DBarker774 avatar DBarker774 commented on May 25, 2024

Below is an example output of a video cut into 3 batches of 24 frames.
Note the inconsistencies or jumps between batches which are very noticeable.

Running the full 450 frames of this video results in CUDA out of memory.

tokenflow_PnP_fps_30_1.mp4

rakesh-reddy95 avatar rakesh-reddy95 commented on May 25, 2024

Can you show the results after preprocessing?

DBarker774 avatar DBarker774 commented on May 25, 2024

It's worth mentioning that I also tried processing the video on Google Colab with an A100 40GB and still ran out of memory.

I'm wondering if I am missing something when it comes to processing longer videos.

MichalGeyer avatar MichalGeyer commented on May 25, 2024

Hi there!
Just to make sure: the inconsistencies in your result come from treating the video as 3 different videos when running our method. You shouldn't see such inconsistencies if you ran our method on the full video.

In terms of memory, the main bottleneck is the computation of extended attention on the keyframes, which is a massive matrix multiplication.
I think it can be lightened (at the expense of run time) by replacing the batched matrix multiplication in this computation with more for loops: https://github.com/omerbt/TokenFlow/blob/06f51a0d0c19bef88f0b9b521146b5b849fbfb76/tokenflow_utils.py#L168C13-L168C16
It is currently written so that, above 96 frames, it loops over the frames to compute the cross-frame attention of each keyframe. It also loops over the different attention heads. This was designed for our resources; you can add another loop over the sequence_length dimension of the attention.
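
For readers unsure what such a loop could look like, here is a minimal sketch in PyTorch. It is not the code from tokenflow_utils.py: the function names, tensor shapes, and `chunk_size` parameter are assumptions made for illustration. It only shows the general idea of looping over the query sequence_length dimension so that the full similarity matrix is never held in memory at once. Because the softmax is taken over the key dimension, splitting the queries into chunks should give the same result as the single batched multiplication.

```python
import torch


def attention_full(q, k, v, scale):
    """Reference attention: builds the whole (heads, n_q, n_k) similarity matrix."""
    sim = torch.bmm(q, k.transpose(-1, -2)) * scale
    return torch.bmm(sim.softmax(dim=-1), v)


def attention_chunked(q, k, v, scale, chunk_size=1024):
    """Same computation, looping over slices of the query sequence dimension.

    q: (heads, n_q, d); k and v: (heads, n_k, d).
    Only a (heads, chunk_size, n_k) similarity block exists at any time.
    chunk_size is a hypothetical tuning knob, not a TokenFlow option.
    """
    heads, n_q, _ = q.shape
    out = torch.empty(heads, n_q, v.shape[-1], dtype=q.dtype, device=q.device)
    k_t = k.transpose(-1, -2)
    for start in range(0, n_q, chunk_size):
        end = min(start + chunk_size, n_q)
        # Similarity of this slice of queries against all keys.
        sim = torch.bmm(q[:, start:end], k_t) * scale
        # Softmax runs over keys, so each query row is independent of the others.
        out[:, start:end] = torch.bmm(sim.softmax(dim=-1), v)
    return out


if __name__ == "__main__":
    torch.manual_seed(0)
    q = torch.randn(8, 4096, 64)
    k = torch.randn(8, 4096, 64)
    v = torch.randn(8, 4096, 64)
    scale = 64 ** -0.5
    full = attention_full(q, k, v, scale)
    chunked = attention_chunked(q, k, v, scale, chunk_size=512)
    print(torch.allclose(full, chunked, atol=1e-5))
```

A smaller `chunk_size` lowers peak memory and adds loop iterations, which matches the run-time trade-off described above. Fitting this into the repository's attention code would still require matching its actual tensor layout.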

For reference, I was able to run the method on 200+ frames using 48 GB of GPU memory.

Hope this helps!

DBarker774 avatar DBarker774 commented on May 25, 2024

Thank you for such a detailed reply.
I am somewhat of a beginner but what you have mentioned makes sense.

Your clarification is completely correct. This is not an issue with the consistency of your method at all, but rather inconsistencies introduced by my workaround to keep VRAM under control.

Unfortunately, I do not have the skillset or know-how to write the additional for loops you have suggested.

Just curious, how are you able to have a card with 48 GB of memory?

I tried provisioning an A100 80GB from Google but was denied as I'm not a business, ahaha.

Big fan of your method and would love to put this into practice on higher-resolution, longer videos.

DBarker774 avatar DBarker774 commented on May 25, 2024

Closing this as it has largely been answered.
