
Comments (6)

AIR-hl commented on September 18, 2024

If you fine-tune the model with full parameters, it usually does not cause problems. But if you fine-tune the model with a PEFT method such as LoRA, it may cause problems.
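
A minimal sketch of why this matters, assuming new chat tokens were added with trl's setup_chat_format (which resizes the embeddings) and a hypothetical base model name: under LoRA the base embedding matrix is frozen, so the newly added rows keep their random initialization unless you explicitly train and save them via modules_to_save.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import setup_chat_format
    from peft import LoraConfig, get_peft_model

    # Hypothetical base model, for illustration only.
    model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

    # Adds <|im_start|>/<|im_end|>, sets a ChatML template, resizes embeddings.
    model, tokenizer = setup_chat_format(model, tokenizer)

    peft_config = LoraConfig(
        r=16,
        lora_alpha=16,
        task_type="CAUSAL_LM",
        # Without this, the new token rows stay frozen at random init under LoRA.
        modules_to_save=["embed_tokens", "lm_head"],
    )
    model = get_peft_model(model, peft_config)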


zyzhang1130 commented on September 18, 2024

@AIR-hl I see. I can see why that is the case, as full-parameter tuning updates the embeddings of the new tokens in the input embedding layer. I have one closely related question: how is training a model with a chat template enabled different from using formatting_func and data_collator? Conceptually they seem to aim at the same goal, and the latter is found easily in a lot of tutorials/code online. However, the official Hugging Face documentation does not address their distinction explicitly. Is there something special that only a chat template can achieve?

Update: actually there might still be a problem. If setup_chat_format only added additional special tokens for the beginning/end of a turn in a dialogue, that would be fine. But the current implementation also replaces the original bos and eos tokens regardless of which model is used. I think this would render the pre-training useless.
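
For reference, a quick way to see this behavior (a sketch assuming trl's current setup_chat_format, which installs a ChatML template, and a hypothetical base model):

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import setup_chat_format

    model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

    print(tokenizer.bos_token, tokenizer.eos_token)  # original tokens, e.g. <s> </s>
    model, tokenizer = setup_chat_format(model, tokenizer)
    print(tokenizer.bos_token, tokenizer.eos_token)  # now <|im_start|> / <|im_end|>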


deema-A commented on September 18, 2024
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
	size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([32002, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
	size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([32002, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).

any idea?


zyzhang1130 commented on September 18, 2024
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
	size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([32002, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
	size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([32002, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).

any idea?

Were you using LoRA to fine-tune your model?


AIR-hl commented on September 18, 2024

@AIR-hl I see. I can see why that is the case, as full-parameter tuning updates the embeddings of the new tokens in the input embedding layer. I have one closely related question: how is training a model with a chat template enabled different from using formatting_func and data_collator? Conceptually they seem to aim at the same goal, and the latter is found easily in a lot of tutorials/code online. However, the official Hugging Face documentation does not address their distinction explicitly. Is there something special that only a chat template can achieve?

Update: actually there might still be a problem. If setup_chat_format only added additional special tokens for the beginning/end of a turn in a dialogue, that would be fine. But the current implementation also replaces the original bos and eos tokens regardless of which model is used. I think this would render the pre-training useless.

@zyzhang1130 In fact, setup_chat_format just provides a convenient way to format chat data stored as JSON; you can also customize the chat template based on the model's existing bos and eos tokens. The above is just my understanding.
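
A minimal sketch of that alternative, assuming a hand-written Jinja template that reuses the tokenizer's existing special tokens so no new embeddings are needed (the template shown is illustrative, not trl's):

    from transformers import AutoTokenizer

    # Hypothetical base model, for illustration only.
    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

    # Reuse the existing bos/eos tokens instead of adding new special tokens.
    tokenizer.chat_template = (
        "{{ bos_token }}"
        "{% for message in messages %}"
        "{{ message['role'] }}: {{ message['content'] }}{{ eos_token }}\n"
        "{% endfor %}"
    )

    messages = [
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi there."},
    ]
    print(tokenizer.apply_chat_template(messages, tokenize=False))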


deema-A commented on September 18, 2024
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
	size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([32002, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
	size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([32002, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).

any idea?

Were you using LoRA to fine-tune your model?

@zyzhang1130 yes

from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.05,
    r=16,
    bias="none",
    task_type="CAUSAL_LM",
    base_model_name_or_path=model_id,
    # Train and save the resized embedding and output layers with the adapter.
    modules_to_save=["lm_head", "embed_tokens"],
)
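
Regarding the size-mismatch error above: a likely cause is that the adapter checkpoint was saved after the vocabulary grew to 32002 rows (two added chat tokens), while the freshly loaded base model still has 32000. A sketch of the usual workaround, assuming the tokenizer saved with the checkpoint carries the added tokens and using a hypothetical checkpoint path: resize the base model's embeddings before attaching the adapter.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_model = AutoModelForCausalLM.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained("path/to/adapter-checkpoint")

    # Grow the embedding matrix to match the checkpoint (e.g. 32002 rows)
    # before loading the adapter weights.
    base_model.resize_token_embeddings(len(tokenizer))

    model = PeftModel.from_pretrained(base_model, "path/to/adapter-checkpoint")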


