Comments (3)
Splitting the KV cache into a separate key cache and value cache is also important (https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/gemma/gemma_attention.py#L166).
from keras-nlp.
@lingzhi98 thanks! We are planning some generation improvements, so we will definitely check this out. Agreed that we can let performance be our guide, probably JAX compiled performance in particular.
Were you thinking of a specific backend/compiled with XLA/not compiled? What's motivating the suggestion?
I use JAX as the Keras backend. I have seen the concatenation become the main overhead as the batch size increases. Because the KV cache is kept as one tensor, we need to slice it to get the corresponding key/value cache, compute the attention output, and then update the cache. Dynamic update slice fusion is blocked by this slice op (https://github.com/openxla/xla/blob/main/xla/service/gpu/ir_emission_utils.cc#L472), which hurts performance again.
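To make the two layouts concrete, here is a minimal JAX sketch. It is not the keras-nlp implementation: the function names, the shapes, and the combined layout `(batch, 2, max_len, heads, head_dim)` are assumptions for illustration. It only shows where the extra slice and concatenation appear and makes no performance claim by itself.

```python
import jax.numpy as jnp
from jax import lax


def _start(index):
    # Build start indices with one consistent dtype for dynamic_update_slice.
    zero = jnp.zeros((), dtype=jnp.int32)
    return (zero, jnp.asarray(index, dtype=jnp.int32), zero, zero)


def update_combined(cache, key_new, value_new, index):
    """Hypothetical single-tensor cache: (batch, 2, max_len, heads, head_dim)."""
    # Slicing the combined tensor is the op that precedes each update.
    key_cache = cache[:, 0]
    value_cache = cache[:, 1]
    key_cache = lax.dynamic_update_slice(key_cache, key_new, _start(index))
    value_cache = lax.dynamic_update_slice(value_cache, value_new, _start(index))
    # Re-stacking the halves is the concatenation step.
    new_cache = jnp.stack((key_cache, value_cache), axis=1)
    return key_cache, value_cache, new_cache


def update_split(key_cache, value_cache, key_new, value_new, index):
    """Hypothetical split caches: each (batch, max_len, heads, head_dim)."""
    # No slice before the update and no stack afterwards.
    key_cache = lax.dynamic_update_slice(key_cache, key_new, _start(index))
    value_cache = lax.dynamic_update_slice(value_cache, value_new, _start(index))
    return key_cache, value_cache
```

Both functions write the same values into the key and value caches; the split version simply skips the slice before the update and the stack after it. Wrapping each one in `jax.jit` and comparing the compiled HLO or timings would be one way to check the effect on a given backend.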