Comments (3)
Hello, as of my latest tests, logits do not match exactly, but generations do.
I invite you to replace the nn.Conv1D
with the non-native one, which will produce the same logits.
I also invite you to compare generations rather than logits: Llama 3 logits do not match either, due to distribution shifts from the RoPE embedding.
from transformers.
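The point about comparing generations rather than raw logits can be sketched with a toy example (shapes and values here are illustrative, not taken from RecurrentGemma): small numerical drift pushes logits past a 1e-4 tolerance, while greedy decoding still selects identical tokens.

```python
import torch

# Toy logits with one clearly dominant token per position (top-1 margin = 1.0).
seq_len, vocab = 4, 8
ref_logits = torch.zeros(1, seq_len, vocab)
top_ids = torch.tensor([3, 1, 5, 0])
ref_logits[0, torch.arange(seq_len), top_ids] = 1.0

# Simulate numerical drift between two implementations (e.g. a custom vs. a
# native convolution): per-element offsets of a few 1e-4, well under the margin.
drift = torch.linspace(2e-4, 9e-4, vocab)
new_logits = ref_logits + drift

logits_close = torch.allclose(ref_logits, new_logits, atol=1e-4)          # False
greedy_match = torch.equal(ref_logits.argmax(-1), new_logits.argmax(-1))  # True
```

With `do_sample=False`, greedy decoding reduces to exactly this per-step argmax comparison, which is why generations can match while logits do not.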
TL;DR: while we aim for 1e-4 logit equivalence, RecurrentGemma was quite slow in torch, and removing the custom convolution in favor of the native one improved generation speed.
Inviting you to benchmark this as well 🤗
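A minimal way to benchmark such a swap (an illustrative stand-in, not the actual RecurrentGemma code): time a native depthwise causal `nn.Conv1d` against a hand-rolled shift-and-add convolution that reuses the same weights, so both paths compute identical outputs.

```python
import time
import torch

def bench(fn, x, iters=50):
    """Average wall-clock seconds per call (CPU sketch; no CUDA sync)."""
    with torch.no_grad():
        fn(x)  # warmup
        t0 = time.perf_counter()
        for _ in range(iters):
            fn(x)
        return (time.perf_counter() - t0) / iters

channels, width, seq = 64, 4, 128
x = torch.randn(1, channels, seq)

# Native depthwise causal conv: symmetric padding, then trim the right overhang.
native = torch.nn.Conv1d(channels, channels, width, groups=channels, padding=width - 1)
def native_conv(x):
    return native(x)[..., : x.shape[-1]]

# "Custom" conv: explicit shift-and-add over the kernel taps.
weight, bias = native.weight, native.bias  # (channels, 1, width), (channels,)
def custom_conv(x):
    pad = torch.nn.functional.pad(x, (width - 1, 0))  # left-pad for causality
    out = sum(weight[:, 0, k, None] * pad[..., k : k + x.shape[-1]] for k in range(width))
    return out + bias[:, None]
```

Compare `bench(native_conv, x)` against `bench(custom_conv, x)`; which path wins depends on hardware and shapes, hence the invitation to benchmark rather than assume.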
This test suite was run against the original codebase:
Related Issues (20)
- [DOCS] - Model outputs of RecurrentGemmaCausalLM doesn't align with the documentation HOT 1
- [Batched Whisper] ValueError on input mel features HOT 3
- use_reentrant=False can't be set properly HOT 6
- Bug: InformerModel, decoder_input torch.cat size of tensor mismatch error otherwise HOT 7
- BitsNBytes 4 bit quantization error message typo and logical errors in error message handling HOT 3
- train_new_from_iterator does not properly modify the tokenizer's postprocessor's ids when using a Sequence postprocessor HOT 1
- recent version of Transformers seems to mess with forward/__call__. Breaks patching loss function HOT 8
- TypeError: 'list' object is not callable || Resume from checkpoint HOT 3
- Failed to import transformers.models.vit.feature_extraction_vit because of the following error (look up to see its traceback): No module named 'ml_dtypes._custom_floats' HOT 1
- TokenClassificationPipeline support is_split_into_words tokeniser parameter HOT 2
- Implement kv cache sparsity like H2O with attention score HOT 2
- BART generate with min_new_tokens exceeds maximum length HOT 4
- Convert Helsinki-NLP model to huggingface
- Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained HOT 3
- Grounding DINO missing custom kernels HOT 2
- For multiple GPUs: torch.cuda.empty_cache() stuck forever
- Issues occurring during parallel evaluation (using Trainer.evaluate)
- ValueError: You should supply an encoding or a list of encodings to this method that includes input_ids, but you provided [] HOT 4
- Can the BNB quantization process be on GPU? HOT 2
- no_speech_probablity HOT 5