Comments (6)
If you fine-tune the model with full parameters it usually does not cause problems. But if you fine-tune the model with a PEFT method such as LoRA, it may cause problems.
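Roughly, the reason is that LoRA adapters alone never update the newly added embedding rows. A minimal sketch of the usual workaround, assuming peft's LoraConfig (a later comment in this thread uses the same idea):

```python
from peft import LoraConfig

# Sketch: after setup_chat_format adds new tokens, also train (and save) full
# copies of the resized embedding and output layers alongside the LoRA adapters;
# otherwise the new tokens keep their freshly initialized embeddings.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    modules_to_save=["embed_tokens", "lm_head"],  # full fine-tune of these layers
)
```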
@AIR-hl I see. I can see why that is the case, as full-parameter tuning updates the embeddings of the new tokens in the input embedding layer. I have one closely related question: how is training a model with a chat template enabled different from using formatting_func and data_collator? Conceptually I feel they aim to achieve the same goal, and the latter is easily found in a lot of tutorials/code online. However, I feel the official Hugging Face documentation does not address their distinction explicitly. Is there something special that only using a chat template can achieve?
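For illustration, a rough sketch of the two approaches as I understand them (the exact SFTTrainer arguments are an assumption and differ between trl versions):

```python
from trl import SFTTrainer

# Approach 1 (assumed behavior): pass a conversational dataset with a "messages"
# column and let SFTTrainer apply tokenizer.chat_template when building the text.
trainer = SFTTrainer(model=model, tokenizer=tokenizer, train_dataset=chat_dataset)

# Approach 2: build the training text yourself with formatting_func; here the chat
# template is applied explicitly, but any prompt format could be assembled instead.
def formatting_func(example):
    return tokenizer.apply_chat_template(example["messages"], tokenize=False)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=chat_dataset,
    formatting_func=formatting_func,
)
```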
Update: actually there might still be a problem. If setup_chat_format only added additional special tokens for the beginning/end of a turn in a dialogue, that would be fine. But the current implementation also replaces the original bos and eos tokens regardless of what model is used. I think this would render pre-training useless.
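To make the concern concrete, a small sketch of what setup_chat_format does to a tokenizer, as far as I can tell (the model name is just an example; the tokens shown are the ChatML defaults in current trl):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import setup_chat_format

name = "meta-llama/Llama-2-7b-hf"  # example model
model = AutoModelForCausalLM.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

print(len(tokenizer), tokenizer.bos_token, tokenizer.eos_token)  # 32000 <s> </s>

# Installs a ChatML chat_template, adds <|im_start|>/<|im_end|> as new special
# tokens, sets them as bos/eos, and resizes the model's embeddings to match.
model, tokenizer = setup_chat_format(model, tokenizer)

print(len(tokenizer), tokenizer.bos_token, tokenizer.eos_token)  # 32002 <|im_start|> <|im_end|>
```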
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([32002, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([32002, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
any idea?
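One guess, sketched below: if the adapter was trained after setup_chat_format (vocab 32000 -> 32002), the base model has to be resized to the checkpoint's vocabulary before the adapter is attached (paths and model names here are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

adapter_dir = "path/to/your/adapter"  # placeholder: directory with the saved LoRA checkpoint

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # example base model
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)  # tokenizer saved with the two added tokens

# Grow embed_tokens / lm_head to 32002 rows so the adapter's saved weights fit.
base.resize_token_embeddings(len(tokenizer))

model = PeftModel.from_pretrained(base, adapter_dir)
```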
Were you using LoRA to fine-tune your model?
@zyzhang1130 In fact, setup_chat_format just provides a convenient way to format chat data stored as JSON; you can also customize the chat template based on the model's existing bos and eos tokens. The above is just my understanding.
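For example, a custom template that keeps the model's original special tokens and leaves the vocabulary untouched could look like this (the template itself is just an illustration, not one shipped with trl):

```python
# A minimal Jinja chat template that only uses the existing bos/eos tokens.
tokenizer.chat_template = (
    "{{ bos_token }}"
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}{{ eos_token }}\n"
    "{% endfor %}"
)

text = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello"}], tokenize=False
)
print(text)
```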
@zyzhang1130 yes
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.05,
    r=16,
    bias="none",
    task_type="CAUSAL_LM",
    base_model_name_or_path=model_id,
    modules_to_save=["lm_head", "embed_tokens"],
)
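If so, a likely explanation for the error above: modules_to_save=["lm_head", "embed_tokens"] stores full copies of those layers, already resized to 32002 rows by setup_chat_format, inside the adapter checkpoint, so the base model needs to be resized to the same vocabulary size (or setup_chat_format re-applied) before PeftModel.from_pretrained can load it.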
Related Issues (20)
- Possible bug in SFTTrainer HOT 5
- Overflow with padding left warning.
- TR-DPO : Why is the loss not changing at all, and reward/accuracies and reward/margins always = 0? HOT 1
- scripts/dpo.py : Unable to train custom gpt2 model
- Support packing for pretokenized datasets
- DPO Evalution with WandB triggers a `cannot pickle '_thread.lock' object` failure HOT 11
- Will long text be truncated and split into different examples when using packing?
- Some questions about PPO trainer
- Processing issue in Anthropic HH dataset HOT 1
- prompts are not used in `WinRateCallback` HOT 1
- Use `WinRateCallback` without `ref_model` HOT 3
- Bugs in examples/scripts/chat.py
- CUDA error in PPO Trainer
- RPO Loss Inconsistency
- Incorrect reference responses when using PEFT with PPOTrainer HOT 1
- Online DPO scheduler step before optimizer step HOT 1
- push_to_hub from local model HOT 1
- ImportError: cannot import name 'DDPOConfig' from 'trl' (unknown location)
- Why are instructions not masked when performing VSFT for LLaVa?
- SFTTrainer for non-packed dataset HOT 1