Comments (2)
Infini-attention uses segmentation, which is mainly aimed at reducing memory usage and computational cost to O(N), so if you use a very long sequence such as seq len = 1M, you have to wait ~5 min per batch on 1x H100 GPU.
In this implementation I used a for loop over segments to make it work, so inference or training takes exactly N times longer, where N is the number of segments.
e.g.,
segment 2048 & block 2048 ==> 1 segment, just 1 attention computation
segment 2048 & block 32K ==> 16 segments, 16 times slower than block 2048
Model performance was checked in the paper with PPL, but we may need more tests based on very long sequence inputs.
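A rough sketch of the segment loop described above (a minimal illustration; `attn`, its state interface, and the names here are assumptions, not this repo's actual API):

```python
import torch

def segmented_forward(attn, x, segment_len=2048):
    """Run attention segment by segment, carrying the compressive-memory
    state across segments. Wall-clock time grows linearly with the
    number of segments, i.e. block_len / segment_len."""
    memory, norm_term = None, None  # compressive memory state
    outputs = []
    # e.g. block 32K with segment 2048 -> 16 iterations of this loop
    for segment in x.split(segment_len, dim=1):  # (batch, seg_len, dim)
        out, memory, norm_term = attn(segment, memory, norm_term)
        outputs.append(out)
    return torch.cat(outputs, dim=1)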
I agree with you that more tests on very long sequence inputs are needed. Using a for loop over segments to break down a long input is not unique, e.g. RingAttention does something similar. I'm just curious about the impact of the compressive memory: how does it influence the attention computation when dealing with long inputs?
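For reference, in the paper the compressive memory enters the attention computation as a linear-attention readout that is blended with the local dot-product attention through a learned gate. A minimal per-segment sketch of those update rules (shapes and names are illustrative, not this repo's implementation):

```python
import torch
import torch.nn.functional as F

def infini_attention_segment(q, k, v, memory, norm_term, beta):
    # q, k, v: (batch, heads, seg_len, head_dim)
    # memory: (batch, heads, head_dim, head_dim); norm_term: (batch, heads, head_dim, 1)
    # Local causal dot-product attention over the current segment only.
    a_local = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    # Read from the compressive memory: sigma(q) @ M / (sigma(q) @ z), sigma = ELU + 1.
    sigma_q = F.elu(q) + 1.0
    a_mem = (sigma_q @ memory) / (sigma_q @ norm_term).clamp(min=1e-6)
    # A learned scalar gate blends the memory readout with local attention.
    gate = torch.sigmoid(beta)
    out = gate * a_mem + (1.0 - gate) * a_local
    # Fold this segment's keys/values into the memory for later segments.
    sigma_k = F.elu(k) + 1.0
    memory = memory + sigma_k.transpose(-2, -1) @ v
    norm_term = norm_term + sigma_k.sum(dim=-2, keepdim=True).transpose(-2, -1)
    return out, memory, norm_term
```

So past segments influence the output only through the fixed-size memory matrix, which is what keeps the cost linear in sequence length but, as some of the related issues below note, carries no positional information.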
Related Issues (20)
- Discord server for this?
- Code not running on GPU HOT 6
- config no attn_implementation = "eager" HOT 4
- question about norm_term_broadcastable HOT 5
- load model failed HOT 4
- Suggest to use the constant memory gradient computation in Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- Model generating random sequence HOT 8
- Memory should be per layer
- Memory does not use PE
- Inference code (with Segments)
- Are there any trained InfinityTransformer weights available?
- Segment and block size error HOT 1
- mem and norm_term is nan! HOT 15
- What is the min GPU memory required to fine-tune the model?
- About memory missing location information HOT 5
- BitLinear
- Model loses information very quickly HOT 2
- Issue while runing test_train.small.gemma.infini.py HOT 2
- Support Zero-3? HOT 1