Comments (2)
Infini-attention uses segmentation, which is mainly aimed at reducing memory usage and computational cost to O(N), so if you use a very long sequence such as seq len = 1M, you have to wait ~5 min per batch on 1x H100 GPU.
In this implementation I used a for loop over segments to make it work, so inference or training takes exactly N times longer, where N is the number of segments.
e.g.,
segment 2048 & block 2048 ==> 1 segment, just 1 attention computation
segment 2048 & block 32K ==> 16 segments, 16 times slower than block 2048
Model performance was checked in the paper with PPL, but we may need more tests based on very long sequence inputs.
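A rough sketch of the segment loop described above (a minimal illustration; `attn`, its state interface, and the names here are assumptions, not this repo's actual API):

```python
import torch

def segmented_forward(attn, x, segment_len=2048):
    """Run attention segment by segment, carrying the compressive-memory
    state across segments. Wall-clock time grows linearly with the
    number of segments, i.e. block_len / segment_len."""
    memory, norm_term = None, None  # compressive memory state
    outputs = []
    # e.g. block 32K with segment 2048 -> 16 iterations of this loop
    for segment in x.split(segment_len, dim=1):  # (batch, seg_len, dim)
        out, memory, norm_term = attn(segment, memory, norm_term)
        outputs.append(out)
    return torch.cat(outputs, dim=1)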
I agree with you that more tests on very long sequence inputs are needed. Using a for loop over segments to break down a long input is not unique, e.g. RingAttention does something similar. I'm just curious about the impact of the compressive memory: how does it influence the attention computation when dealing with long inputs?
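For reference, in the paper the compressive memory enters the attention computation as a linear-attention readout that is blended with the local dot-product attention through a learned gate. A minimal per-segment sketch of those update rules (shapes and names are illustrative, not this repo's implementation):

```python
import torch
import torch.nn.functional as F

def infini_attention_segment(q, k, v, memory, norm_term, beta):
    # q, k, v: (batch, heads, seg_len, head_dim)
    # memory: (batch, heads, head_dim, head_dim); norm_term: (batch, heads, head_dim, 1)
    # Local causal dot-product attention over the current segment only.
    a_local = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    # Read from the compressive memory: sigma(q) @ M / (sigma(q) @ z), sigma = ELU + 1.
    sigma_q = F.elu(q) + 1.0
    a_mem = (sigma_q @ memory) / (sigma_q @ norm_term).clamp(min=1e-6)
    # A learned scalar gate blends the memory readout with local attention.
    gate = torch.sigmoid(beta)
    out = gate * a_mem + (1.0 - gate) * a_local
    # Fold this segment's keys/values into the memory for later segments.
    sigma_k = F.elu(k) + 1.0
    memory = memory + sigma_k.transpose(-2, -1) @ v
    norm_term = norm_term + sigma_k.sum(dim=-2, keepdim=True).transpose(-2, -1)
    return out, memory, norm_term
```

So past segments influence the output only through the fixed-size memory matrix, which is what keeps the cost linear in sequence length but, as some of the related issues below note, carries no positional information.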
Related Issues (20)
- Discord server for this?
- Code not running on GPU HOT 6
- config no attn_implementation = "eager" HOT 4
- question about norm_term_broadcastable HOT 5
- load model failed HOT 4
- Suggest to use the constant memory gradient computation in Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- Model generating random sequence HOT 8
- Memory should be per layer
- Memory does not use PE
- Inference code (with Segments)
- Are there any trained InfinityTransformer weights available?
- Segment and block size error HOT 1
- mem and norm_term is nan! HOT 15
- What is the min GPU memory required to fine-tune the model?
- About memory missing location information HOT 5
- BitLinear
- Model loses information very quickly HOT 2
- Issue while runing test_train.small.gemma.infini.py HOT 2
- Support Zero-3? HOT 1