Comments (10)
The problem you're running into is that the tokenizer for the base model is incorrect: it contains the <end_of_utterance> token (probably it's exactly the same tokenizer as the chat model's), but the base model's embedding layer doesn't have a row for it. So if you reuse dataset/collator code written for fine-tuning the chat model and call the processor.apply_chat_template function, it will emit a token id that doesn't exist in the embedding, and the model's embedding layer will freak out.
import torch
from transformers import AutoProcessor, Idefics2ForConditionalGeneration

# Load processors and models for both the base and the chat checkpoints.
processor_base = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b-base")
processor_chat = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
base = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b-base", torch_dtype=torch.float16
)
chat = Idefics2ForConditionalGeneration.from_pretrained(
    "HuggingFaceM4/idefics2-8b", torch_dtype=torch.float16
)

# Compare each tokenizer's highest token id with each model's embedding size.
print("Tokenizer chat max token:", max(processor_chat.tokenizer.get_vocab().values()))
print("Tokenizer base max token:", max(processor_base.tokenizer.get_vocab().values()))
print("chat embedding:", chat.base_model.get_submodule('text_model').get_submodule('embed_tokens'))
print("base embedding:", base.base_model.get_submodule('text_model').get_submodule('embed_tokens'))
print("last token:", processor_chat.tokenizer.convert_ids_to_tokens(max(processor_chat.tokenizer.get_vocab().values())))
Tokenizer chat max token: 32002
Tokenizer base max token: 32002
chat embedding: Embedding(32003, 4096, padding_idx=0)
base embedding: Embedding(32002, 4096, padding_idx=0)
last token: <end_of_utterance>
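The mismatch is easy to reproduce in isolation: handing an embedding an id equal to its size (32002 here, standing in for the missing <end_of_utterance> row) raises an index error, which is exactly the failure described above. A minimal torch-only sketch:

```python
import torch

# An Embedding with 32002 rows accepts ids 0..32001 only; id 32002
# (the chat model's <end_of_utterance> slot) is out of range here.
emb = torch.nn.Embedding(32002, 8)

try:
    emb(torch.tensor([32002]))
    failed = False
except IndexError as exc:
    failed = True
    print("out of range:", exc)
```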
from transformers.
@jjkjkj That's a good find. So for now I just removed the token, and it seems to be working:
text = processor.apply_chat_template(messages, add_generation_prompt=False)
if "base" in args.model_name: # hack to remove the end of utterance token
text = text.replace("<end_of_utterance>", "")
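An alternative to stripping the token, if you actually want the base model to learn <end_of_utterance>, is to grow its embedding so id 32002 becomes valid, e.g. with model.resize_token_embeddings(len(processor.tokenizer)) (the new row starts randomly initialized, so it only helps if you train it). A torch-only sketch of what that resize does, using a hypothetical helper:

```python
import torch

def resize_embedding(emb: torch.nn.Embedding, new_size: int) -> torch.nn.Embedding:
    # Mimics what transformers' resize_token_embeddings does for the input
    # embedding: copy the existing rows, leave the extra rows randomly initialized.
    new_emb = torch.nn.Embedding(new_size, emb.embedding_dim)
    new_emb.weight.data[: emb.num_embeddings] = emb.weight.data
    return new_emb

emb = resize_embedding(torch.nn.Embedding(32002, 8), 32003)
print(emb(torch.tensor([32002])).shape)  # id 32002 now resolves to a row
```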
@VictorSanh No need to dig! The issue was found and explained by @jjkjkj here. It had to do with the presence of the <end_of_utterance> token for the base model.
In fact, we can now close this issue :)
I have the same issue.
Hi @rabiulcste @BiliBraker thanks for reporting!
cc @VictorSanh In case you have an immediate idea why this is happening?
I wanted to mention another issue in the same script. When lora is set to True, I get this error:
Traceback (most recent call last):
File "/apps/arch/distro/python/3.8/lib/python3.8/runpy.py", line 193, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/apps/arch/distro/python/3.8/lib/python3.8/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "synth-diffuse/evals/idefics2_fine_tuning.py", line 299, in <module>
main(args)
File "synth-diffuse/evals/idefics2_fine_tuning.py", line 103, in main
model.add_adapter(lora_config)
File "/lib/python3.8/site-packages/transformers/integrations/peft.py", line 264, in add_adapter
inject_adapter_in_model(adapter_config, self, adapter_name)
File "/lib/python3.8/site-packages/peft/mapping.py", line 166, in inject_adapter_in_model
peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name)
File "/lib/python3.8/site-packages/peft/tuners/lora/model.py", line 136, in __init__
super().__init__(model, config, adapter_name)
File "/lib/python3.8/site-packages/peft/tuners/tuners_utils.py", line 148, in __init__
self.inject_adapter(self.model, adapter_name)
File "lib/python3.8/site-packages/peft/tuners/tuners_utils.py", line 325, in inject_adapter
self._create_and_replace(peft_config, adapter_name, target, target_name, parent, current_key=key)
File "/lib/python3.8/site-packages/peft/tuners/lora/model.py", line 220, in _create_and_replace
new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
File "/lib/python3.8/site-packages/peft/tuners/lora/model.py", line 295, in _create_new_module
new_module = dispatcher(target, adapter_name, lora_config=lora_config, **kwargs)
File "/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 1056, in dispatch_default
new_module = Linear(target, adapter_name, **kwargs)
File "/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 356, in __init__
self.update_layer(
File "/s/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 126, in update_layer
self.dora_init(adapter_name)
File "/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 191, in dora_init
lora_weight = lora_B.weight @ lora_A.weight
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
It doesn't occur when QLora is set to True.
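For context, the failing line (lora_B.weight @ lora_A.weight in peft's dora_init) runs on CPU in float16 because the model was loaded with torch_dtype=torch.float16, and half-precision addmm has no CPU kernel in some torch builds. Loading the model in float32, or moving it to GPU before add_adapter, avoids it; upcasting before the matmul is the same idea. A torch-only sketch with hypothetical LoRA shapes:

```python
import torch

# Hypothetical LoRA factors, fp16 as when loading with torch_dtype=torch.float16.
lora_A = torch.nn.Linear(4096, 8, bias=False).half()
lora_B = torch.nn.Linear(8, 4096, bias=False).half()

# This matmul is what DoRA init performs; upcasting to float32 makes it
# safe on CPUs whose kernels lack half-precision addmm.
lora_weight = lora_B.weight.float() @ lora_A.weight.float()
print(lora_weight.shape, lora_weight.dtype)
```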
@rabiulcste Can you open a new issue with this info? This helps us keep better track of what has and hasn't been resolved as well as finding similar issues
cc @VictorSanh In case you have an immediate idea why this is happening?
Does not ring a bell unfortunately :/ need to focus on idefics2 2nd release wave but will for sure allocate time to dig in this week if it's not solved by then
@rabiulcste Can you open a new issue with this info? This helps us keep better track of what has and hasn't been resolved as well as finding similar issues
Sure, I'll create a new issue then. I have a couple more issues though :) Is it suggested to create a separate issue for each?
@rabiulcste Yes please, as long as they're independent.