bigcode-project / starcoder2

Home of StarCoder2!

License: Apache License 2.0

Python 100.00%

starcoder2's Issues

Can starcoder2 be trained with a different language like TCL or lisp?

Hello @loubnabnl, is it possible to get starcoder2 to learn TCL?

It was not part of the 30 languages, so I was curious whether it is worth pursuing with SFT.

Also, is there a FIM script that you used for this version of starcoder2?

Unlawful use of my code

The readme of this repo reads the following:

StarCoder2 is a family of code generation models (3B, 7B, and 15B), trained on 600+ programming languages from The Stack v2 [...]

The dataset linked contains my code, without following its license (or lack thereof).

Consent is not opt-out. You trained an LLM on code you are not allowed to use.

format for inference in code completion

starcoder's format for inference in code completion is PSM (prefix-suffix-middle): <fim_prefix> + prefix + <fim_suffix> + suffix + <fim_middle>
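As a minimal sketch of the PSM layout quoted above, the prompt can be assembled by plain string concatenation. The helper name `build_psm_prompt` and the example prefix and suffix are hypothetical, and whether starcoder2 expects the same sentinel strings is exactly what this post asks, so treat them as an assumption.

```python
# Sentinel strings copied from the format quoted in the post.
# Assumption: they are shown here only for the original starcoder format.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_psm_prompt(prefix: str, suffix: str) -> str:
    """Assemble a prefix-suffix-middle (PSM) prompt.

    The model is expected to generate the missing middle after <fim_middle>.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Hypothetical example: ask the model to fill in the body of `add`.
prompt = build_psm_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))\n",
)
print(prompt)
```

The resulting string would be passed to the tokenizer like any other prompt; the text generated after `<fim_middle>` is the infilled code.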

What is the format for starcoder2?

from the paper, we could only see that

What does "unique tokens" mean (in the paper)?

For example, on page 16 it said "This leads to a dataset of 622B+ unique tokens. For the 7B, we include OpenWebMath, Wikipedia, and Arxiv, leading to a slightly larger dataset of 658B+ unique tokens. For the 15B, we include the-stack-v2-train-full dataset and all extra data sources listed in §2, resulting in a dataset with 913B+ unique tokens. The size of this dataset is 4× the size of the training dataset for StarCoderBase."
The question is: does "unique tokens" mean the total number of tokens in the dataset after deduplication, or the number of distinct vocabulary entries you would get by running the starcoder2 tokenizer over the whole dataset?
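The two readings in the question can be told apart on a toy example. This is only an illustration: the three-line corpus and the whitespace split are hypothetical stand-ins, not the real dataset or the starcoder2 tokenizer, and the sketch does not settle which reading the paper intends.

```python
from collections import Counter

# Hypothetical toy corpus, assumed to be already deduplicated.
corpus = [
    "def f ( x ) :",
    "return x + x",
    "def g ( x ) :",
]

# Whitespace split stands in for a real subword tokenizer (assumption).
tokens = [token for line in corpus for token in line.split()]

# Reading 1: how many tokens the corpus contains in total.
total_tokens = len(tokens)

# Reading 2: how many distinct token types (vocabulary entries) appear.
distinct_token_types = len(Counter(tokens))

print(f"total tokens: {total_tokens}")
print(f"distinct token types: {distinct_token_types}")
```

On this corpus the first count is 16 and the second is 9; the first grows with the size of the data, while the second is bounded by the size of the tokenizer's vocabulary.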

Official Support for GGUF Quantization in BigCode Starcoder2 to Enhance Accessibility and Efficiency

Dear BigCode team, what a wonderful project!

I am writing this feature request for official implementation of GGUF quantization for Starcoder2 to enhance its adoption with coding platforms and APIs such as Ollama and LMStudio.

Despite the model's advanced capabilities with its versions, its integration and usability in the OpenAI-API style coding ecosystem, including extensions like "Continue" for VSCode, could be significantly improved. The current lack of support for GGUF quantization limits its potential reach and utility.

An official implementation by your team would ensure optimal performance and compatibility, eliminating the need for community-driven workarounds. I urge you to consider this proposal as a step towards making BigCode Starcoder2 a more versatile and inclusive tool for the developer community. Official GGUF quantization could significantly impact its adoption and effectiveness across diverse development environments.

Thank you for your time and consideration of this important enhancement. I look forward to your positive response and the future success of BigCode Starcoder2.

Better inference based on starcoder2-3b model

I am new to StarCoder.

When I run the following demo:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

checkpoint = "./starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)

inputs = tokenizer.encode("def is_prime(n):", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

it returns:

def is_prime():
    """
    This function checks if a number is prime or not.
    """

It doesn't finish, so I set max_length=120, and then it returns:

def is_prime():
    """
    This function checks if a number is prime or not.
    """
    num = int(input("Enter a number: "))
    if num > 1:
        for i in range(2, num):
            if (num % i) == 0:
                print(num, "is not a prime number")
                break
        else:
            print(num, "is a prime number")
    else:
        print(num, "is not a prime number")


is_prime()
<file_sep>/README.md
# Python-

The part

is_prime()
<file_sep>/README.md
# Python-

is redundant. My current solution is:

generated_code = tokenizer.decode(outputs[0])
if "<file_sep>" in generated_code:
    generated_code = generated_code.split("<file_sep>")[0]
print(generated_code)

But I don't think this is a good idea. I want the model to return the result in one go, without generating the redundant parts. How can I do that? Could you give me some advice?
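One possible direction, sketched under assumptions: generalize the string split above into a reusable helper, and additionally ask `generate` to stop as soon as it emits `<file_sep>`. The helper names are hypothetical, and the second function assumes that `<file_sep>` is a single token in the tokenizer's vocabulary, which the post does not confirm.

```python
def truncate_at_stop_sequences(text, stop_sequences=("<file_sep>",)):
    """Cut `text` at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stop_sequences:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


def generate_until_file_sep(model, tokenizer, prompt, max_new_tokens=128):
    """Hypothetical helper: stop generation on <file_sep>, then trim the text.

    `model` and `tokenizer` are the objects loaded in the demo above.
    """
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Assumption: "<file_sep>" maps to one token id in this tokenizer.
    file_sep_id = tokenizer.convert_tokens_to_ids("<file_sep>")
    stop_ids = [
        token_id
        for token_id in (tokenizer.eos_token_id, file_sep_id)
        if token_id is not None
    ]
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,  # budget for new tokens only
        eos_token_id=stop_ids,          # generation ends on any of these ids
    )
    # The stop token itself may still be decoded, so trim it afterwards.
    return truncate_at_stop_sequences(tokenizer.decode(outputs[0]))


# The string helper can be checked without loading a model.
sample = "is_prime()\n<file_sep>/README.md\n# Python-"
print(truncate_at_stop_sequences(sample))
```

Stopping at the token level avoids spending the generation budget on text that is discarded anyway; the string-level trim remains as a fallback in case the stop sequence is split across several tokens.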

What is the SFT template?

When I try to use this model, I don't know what the SFT template is.
Please help me, thank you.

Some weights of the model checkpoint at finetune_starcoder2/final_checkpoint were not used when initializing Starcoder2ForCausalLM

I get the following error after finetuning this model on the R dataset following the example in the README.

Some weights of the model checkpoint at finetune_starcoder2/final_checkpoint were not used when initializing Starcoder2ForCausalLM: ['model.layers.0.self_attn.k_proj.base_layer.bias', 'model.layers.0.self_attn.k_proj.base_layer.weight', 'model.layers.0.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.k_proj.lora_A.default.weight', 'model.layers.0.self_attn.k_proj.lora_B.default.weight', 'model.layers.0.self_attn.o_proj.base_layer.bias', 'model.layers.0.self_attn.o_proj.base_layer.weight', 'model.layers.0.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.o_proj.lora_A.default.weight', 'model.layers.0.self_attn.o_proj.lora_B.default.weight', 'model.layers.0.self_attn.q_proj.base_layer.bias', 'model.layers.0.self_attn.q_proj.base_layer.weight', 'model.layers.0.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.q_proj.lora_A.default.weight', 'model.layers.0.self_attn.q_proj.lora_B.default.weight', 'model.layers.0.self_attn.v_proj.base_layer.bias', 'model.layers.0.self_attn.v_proj.base_layer.weight', 'model.layers.0.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.0.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.0.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.0.self_attn.v_proj.lora_A.default.weight', 'model.layers.0.self_attn.v_proj.lora_B.default.weight', 'model.layers.1.self_attn.k_proj.base_layer.bias', 'model.layers.1.self_attn.k_proj.base_layer.weight', 'model.layers.1.self_attn.k_proj.base_layer.weight.absmax', 
'model.layers.1.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.k_proj.lora_A.default.weight', 'model.layers.1.self_attn.k_proj.lora_B.default.weight', 'model.layers.1.self_attn.o_proj.base_layer.bias', 'model.layers.1.self_attn.o_proj.base_layer.weight', 'model.layers.1.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.o_proj.lora_A.default.weight', 'model.layers.1.self_attn.o_proj.lora_B.default.weight', 'model.layers.1.self_attn.q_proj.base_layer.bias', 'model.layers.1.self_attn.q_proj.base_layer.weight', 'model.layers.1.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.q_proj.lora_A.default.weight', 'model.layers.1.self_attn.q_proj.lora_B.default.weight', 'model.layers.1.self_attn.v_proj.base_layer.bias', 'model.layers.1.self_attn.v_proj.base_layer.weight', 'model.layers.1.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.1.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.1.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.1.self_attn.v_proj.lora_A.default.weight', 'model.layers.1.self_attn.v_proj.lora_B.default.weight', 'model.layers.10.self_attn.k_proj.base_layer.bias', 'model.layers.10.self_attn.k_proj.base_layer.weight', 'model.layers.10.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.k_proj.lora_A.default.weight', 'model.layers.10.self_attn.k_proj.lora_B.default.weight', 
'model.layers.10.self_attn.o_proj.base_layer.bias', 'model.layers.10.self_attn.o_proj.base_layer.weight', 'model.layers.10.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.o_proj.lora_A.default.weight', 'model.layers.10.self_attn.o_proj.lora_B.default.weight', 'model.layers.10.self_attn.q_proj.base_layer.bias', 'model.layers.10.self_attn.q_proj.base_layer.weight', 'model.layers.10.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.q_proj.lora_A.default.weight', 'model.layers.10.self_attn.q_proj.lora_B.default.weight', 'model.layers.10.self_attn.v_proj.base_layer.bias', 'model.layers.10.self_attn.v_proj.base_layer.weight', 'model.layers.10.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.10.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.10.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.10.self_attn.v_proj.lora_A.default.weight', 'model.layers.10.self_attn.v_proj.lora_B.default.weight', 'model.layers.11.self_attn.k_proj.base_layer.bias', 'model.layers.11.self_attn.k_proj.base_layer.weight', 'model.layers.11.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.k_proj.lora_A.default.weight', 'model.layers.11.self_attn.k_proj.lora_B.default.weight', 'model.layers.11.self_attn.o_proj.base_layer.bias', 'model.layers.11.self_attn.o_proj.base_layer.weight', 'model.layers.11.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.o_proj.base_layer.weight.quant_map', 
'model.layers.11.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.o_proj.lora_A.default.weight', 'model.layers.11.self_attn.o_proj.lora_B.default.weight', 'model.layers.11.self_attn.q_proj.base_layer.bias', 'model.layers.11.self_attn.q_proj.base_layer.weight', 'model.layers.11.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.q_proj.lora_A.default.weight', 'model.layers.11.self_attn.q_proj.lora_B.default.weight', 'model.layers.11.self_attn.v_proj.base_layer.bias', 'model.layers.11.self_attn.v_proj.base_layer.weight', 'model.layers.11.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.11.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.11.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.11.self_attn.v_proj.lora_A.default.weight', 'model.layers.11.self_attn.v_proj.lora_B.default.weight', 'model.layers.12.self_attn.k_proj.base_layer.bias', 'model.layers.12.self_attn.k_proj.base_layer.weight', 'model.layers.12.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.k_proj.lora_A.default.weight', 'model.layers.12.self_attn.k_proj.lora_B.default.weight', 'model.layers.12.self_attn.o_proj.base_layer.bias', 'model.layers.12.self_attn.o_proj.base_layer.weight', 'model.layers.12.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.o_proj.lora_A.default.weight', 'model.layers.12.self_attn.o_proj.lora_B.default.weight', 'model.layers.12.self_attn.q_proj.base_layer.bias', 
'model.layers.12.self_attn.q_proj.base_layer.weight', 'model.layers.12.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.q_proj.lora_A.default.weight', 'model.layers.12.self_attn.q_proj.lora_B.default.weight', 'model.layers.12.self_attn.v_proj.base_layer.bias', 'model.layers.12.self_attn.v_proj.base_layer.weight', 'model.layers.12.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.12.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.12.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.12.self_attn.v_proj.lora_A.default.weight', 'model.layers.12.self_attn.v_proj.lora_B.default.weight', 'model.layers.13.self_attn.k_proj.base_layer.bias', 'model.layers.13.self_attn.k_proj.base_layer.weight', 'model.layers.13.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.k_proj.lora_A.default.weight', 'model.layers.13.self_attn.k_proj.lora_B.default.weight', 'model.layers.13.self_attn.o_proj.base_layer.bias', 'model.layers.13.self_attn.o_proj.base_layer.weight', 'model.layers.13.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.o_proj.lora_A.default.weight', 'model.layers.13.self_attn.o_proj.lora_B.default.weight', 'model.layers.13.self_attn.q_proj.base_layer.bias', 'model.layers.13.self_attn.q_proj.base_layer.weight', 'model.layers.13.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 
'model.layers.13.self_attn.q_proj.lora_A.default.weight', 'model.layers.13.self_attn.q_proj.lora_B.default.weight', 'model.layers.13.self_attn.v_proj.base_layer.bias', 'model.layers.13.self_attn.v_proj.base_layer.weight', 'model.layers.13.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.13.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.13.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.13.self_attn.v_proj.lora_A.default.weight', 'model.layers.13.self_attn.v_proj.lora_B.default.weight', 'model.layers.14.self_attn.k_proj.base_layer.bias', 'model.layers.14.self_attn.k_proj.base_layer.weight', 'model.layers.14.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.k_proj.lora_A.default.weight', 'model.layers.14.self_attn.k_proj.lora_B.default.weight', 'model.layers.14.self_attn.o_proj.base_layer.bias', 'model.layers.14.self_attn.o_proj.base_layer.weight', 'model.layers.14.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.o_proj.lora_A.default.weight', 'model.layers.14.self_attn.o_proj.lora_B.default.weight', 'model.layers.14.self_attn.q_proj.base_layer.bias', 'model.layers.14.self_attn.q_proj.base_layer.weight', 'model.layers.14.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.q_proj.lora_A.default.weight', 'model.layers.14.self_attn.q_proj.lora_B.default.weight', 'model.layers.14.self_attn.v_proj.base_layer.bias', 'model.layers.14.self_attn.v_proj.base_layer.weight', 
'model.layers.14.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.14.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.14.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.14.self_attn.v_proj.lora_A.default.weight', 'model.layers.14.self_attn.v_proj.lora_B.default.weight', 'model.layers.15.self_attn.k_proj.base_layer.bias', 'model.layers.15.self_attn.k_proj.base_layer.weight', 'model.layers.15.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.k_proj.lora_A.default.weight', 'model.layers.15.self_attn.k_proj.lora_B.default.weight', 'model.layers.15.self_attn.o_proj.base_layer.bias', 'model.layers.15.self_attn.o_proj.base_layer.weight', 'model.layers.15.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.o_proj.lora_A.default.weight', 'model.layers.15.self_attn.o_proj.lora_B.default.weight', 'model.layers.15.self_attn.q_proj.base_layer.bias', 'model.layers.15.self_attn.q_proj.base_layer.weight', 'model.layers.15.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.q_proj.lora_A.default.weight', 'model.layers.15.self_attn.q_proj.lora_B.default.weight', 'model.layers.15.self_attn.v_proj.base_layer.bias', 'model.layers.15.self_attn.v_proj.base_layer.weight', 'model.layers.15.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.15.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.15.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.15.self_attn.v_proj.lora_A.default.weight', 
'model.layers.15.self_attn.v_proj.lora_B.default.weight', 'model.layers.16.self_attn.k_proj.base_layer.bias', 'model.layers.16.self_attn.k_proj.base_layer.weight', 'model.layers.16.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.k_proj.lora_A.default.weight', 'model.layers.16.self_attn.k_proj.lora_B.default.weight', 'model.layers.16.self_attn.o_proj.base_layer.bias', 'model.layers.16.self_attn.o_proj.base_layer.weight', 'model.layers.16.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.o_proj.lora_A.default.weight', 'model.layers.16.self_attn.o_proj.lora_B.default.weight', 'model.layers.16.self_attn.q_proj.base_layer.bias', 'model.layers.16.self_attn.q_proj.base_layer.weight', 'model.layers.16.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.q_proj.lora_A.default.weight', 'model.layers.16.self_attn.q_proj.lora_B.default.weight', 'model.layers.16.self_attn.v_proj.base_layer.bias', 'model.layers.16.self_attn.v_proj.base_layer.weight', 'model.layers.16.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.16.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.16.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.16.self_attn.v_proj.lora_A.default.weight', 'model.layers.16.self_attn.v_proj.lora_B.default.weight', 'model.layers.17.self_attn.k_proj.base_layer.bias', 'model.layers.17.self_attn.k_proj.base_layer.weight', 'model.layers.17.self_attn.k_proj.base_layer.weight.absmax', 
'model.layers.17.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.k_proj.lora_A.default.weight', 'model.layers.17.self_attn.k_proj.lora_B.default.weight', 'model.layers.17.self_attn.o_proj.base_layer.bias', 'model.layers.17.self_attn.o_proj.base_layer.weight', 'model.layers.17.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.o_proj.lora_A.default.weight', 'model.layers.17.self_attn.o_proj.lora_B.default.weight', 'model.layers.17.self_attn.q_proj.base_layer.bias', 'model.layers.17.self_attn.q_proj.base_layer.weight', 'model.layers.17.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.q_proj.lora_A.default.weight', 'model.layers.17.self_attn.q_proj.lora_B.default.weight', 'model.layers.17.self_attn.v_proj.base_layer.bias', 'model.layers.17.self_attn.v_proj.base_layer.weight', 'model.layers.17.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.17.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.17.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.17.self_attn.v_proj.lora_A.default.weight', 'model.layers.17.self_attn.v_proj.lora_B.default.weight', 'model.layers.18.self_attn.k_proj.base_layer.bias', 'model.layers.18.self_attn.k_proj.base_layer.weight', 'model.layers.18.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.k_proj.lora_A.default.weight', 'model.layers.18.self_attn.k_proj.lora_B.default.weight', 
'model.layers.18.self_attn.o_proj.base_layer.bias', 'model.layers.18.self_attn.o_proj.base_layer.weight', 'model.layers.18.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.o_proj.lora_A.default.weight', 'model.layers.18.self_attn.o_proj.lora_B.default.weight', 'model.layers.18.self_attn.q_proj.base_layer.bias', 'model.layers.18.self_attn.q_proj.base_layer.weight', 'model.layers.18.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.q_proj.lora_A.default.weight', 'model.layers.18.self_attn.q_proj.lora_B.default.weight', 'model.layers.18.self_attn.v_proj.base_layer.bias', 'model.layers.18.self_attn.v_proj.base_layer.weight', 'model.layers.18.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.18.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.18.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.18.self_attn.v_proj.lora_A.default.weight', 'model.layers.18.self_attn.v_proj.lora_B.default.weight', 'model.layers.19.self_attn.k_proj.base_layer.bias', 'model.layers.19.self_attn.k_proj.base_layer.weight', 'model.layers.19.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.k_proj.lora_A.default.weight', 'model.layers.19.self_attn.k_proj.lora_B.default.weight', 'model.layers.19.self_attn.o_proj.base_layer.bias', 'model.layers.19.self_attn.o_proj.base_layer.weight', 'model.layers.19.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.o_proj.base_layer.weight.quant_map', 
'model.layers.19.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.o_proj.lora_A.default.weight', 'model.layers.19.self_attn.o_proj.lora_B.default.weight', 'model.layers.19.self_attn.q_proj.base_layer.bias', 'model.layers.19.self_attn.q_proj.base_layer.weight', 'model.layers.19.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.q_proj.lora_A.default.weight', 'model.layers.19.self_attn.q_proj.lora_B.default.weight', 'model.layers.19.self_attn.v_proj.base_layer.bias', 'model.layers.19.self_attn.v_proj.base_layer.weight', 'model.layers.19.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.19.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.19.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.19.self_attn.v_proj.lora_A.default.weight', 'model.layers.19.self_attn.v_proj.lora_B.default.weight', 'model.layers.2.self_attn.k_proj.base_layer.bias', 'model.layers.2.self_attn.k_proj.base_layer.weight', 'model.layers.2.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.k_proj.lora_A.default.weight', 'model.layers.2.self_attn.k_proj.lora_B.default.weight', 'model.layers.2.self_attn.o_proj.base_layer.bias', 'model.layers.2.self_attn.o_proj.base_layer.weight', 'model.layers.2.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.o_proj.lora_A.default.weight', 'model.layers.2.self_attn.o_proj.lora_B.default.weight', 'model.layers.2.self_attn.q_proj.base_layer.bias', 
'model.layers.2.self_attn.q_proj.base_layer.weight', 'model.layers.2.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.q_proj.lora_A.default.weight', 'model.layers.2.self_attn.q_proj.lora_B.default.weight', 'model.layers.2.self_attn.v_proj.base_layer.bias', 'model.layers.2.self_attn.v_proj.base_layer.weight', 'model.layers.2.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.2.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.2.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.2.self_attn.v_proj.lora_A.default.weight', 'model.layers.2.self_attn.v_proj.lora_B.default.weight', 'model.layers.20.self_attn.k_proj.base_layer.bias', 'model.layers.20.self_attn.k_proj.base_layer.weight', 'model.layers.20.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.k_proj.lora_A.default.weight', 'model.layers.20.self_attn.k_proj.lora_B.default.weight', 'model.layers.20.self_attn.o_proj.base_layer.bias', 'model.layers.20.self_attn.o_proj.base_layer.weight', 'model.layers.20.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.o_proj.lora_A.default.weight', 'model.layers.20.self_attn.o_proj.lora_B.default.weight', 'model.layers.20.self_attn.q_proj.base_layer.bias', 'model.layers.20.self_attn.q_proj.base_layer.weight', 'model.layers.20.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 
'model.layers.20.self_attn.q_proj.lora_A.default.weight', 'model.layers.20.self_attn.q_proj.lora_B.default.weight', 'model.layers.20.self_attn.v_proj.base_layer.bias', 'model.layers.20.self_attn.v_proj.base_layer.weight', 'model.layers.20.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.20.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.20.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.20.self_attn.v_proj.lora_A.default.weight', 'model.layers.20.self_attn.v_proj.lora_B.default.weight', 'model.layers.21.self_attn.k_proj.base_layer.bias', 'model.layers.21.self_attn.k_proj.base_layer.weight', 'model.layers.21.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.k_proj.lora_A.default.weight', 'model.layers.21.self_attn.k_proj.lora_B.default.weight', 'model.layers.21.self_attn.o_proj.base_layer.bias', 'model.layers.21.self_attn.o_proj.base_layer.weight', 'model.layers.21.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.o_proj.lora_A.default.weight', 'model.layers.21.self_attn.o_proj.lora_B.default.weight', 'model.layers.21.self_attn.q_proj.base_layer.bias', 'model.layers.21.self_attn.q_proj.base_layer.weight', 'model.layers.21.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.q_proj.lora_A.default.weight', 'model.layers.21.self_attn.q_proj.lora_B.default.weight', 'model.layers.21.self_attn.v_proj.base_layer.bias', 'model.layers.21.self_attn.v_proj.base_layer.weight', 
'model.layers.21.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.21.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.21.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.21.self_attn.v_proj.lora_A.default.weight', 'model.layers.21.self_attn.v_proj.lora_B.default.weight', 'model.layers.22.self_attn.k_proj.base_layer.bias', 'model.layers.22.self_attn.k_proj.base_layer.weight', 'model.layers.22.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.k_proj.lora_A.default.weight', 'model.layers.22.self_attn.k_proj.lora_B.default.weight', 'model.layers.22.self_attn.o_proj.base_layer.bias', 'model.layers.22.self_attn.o_proj.base_layer.weight', 'model.layers.22.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.o_proj.lora_A.default.weight', 'model.layers.22.self_attn.o_proj.lora_B.default.weight', 'model.layers.22.self_attn.q_proj.base_layer.bias', 'model.layers.22.self_attn.q_proj.base_layer.weight', 'model.layers.22.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.q_proj.lora_A.default.weight', 'model.layers.22.self_attn.q_proj.lora_B.default.weight', 'model.layers.22.self_attn.v_proj.base_layer.bias', 'model.layers.22.self_attn.v_proj.base_layer.weight', 'model.layers.22.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.22.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.22.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.22.self_attn.v_proj.lora_A.default.weight', 
'model.layers.22.self_attn.v_proj.lora_B.default.weight', 'model.layers.23.self_attn.k_proj.base_layer.bias', 'model.layers.23.self_attn.k_proj.base_layer.weight', 'model.layers.23.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.k_proj.lora_A.default.weight', 'model.layers.23.self_attn.k_proj.lora_B.default.weight', 'model.layers.23.self_attn.o_proj.base_layer.bias', 'model.layers.23.self_attn.o_proj.base_layer.weight', 'model.layers.23.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.o_proj.lora_A.default.weight', 'model.layers.23.self_attn.o_proj.lora_B.default.weight', 'model.layers.23.self_attn.q_proj.base_layer.bias', 'model.layers.23.self_attn.q_proj.base_layer.weight', 'model.layers.23.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.q_proj.lora_A.default.weight', 'model.layers.23.self_attn.q_proj.lora_B.default.weight', 'model.layers.23.self_attn.v_proj.base_layer.bias', 'model.layers.23.self_attn.v_proj.base_layer.weight', 'model.layers.23.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.23.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.23.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.23.self_attn.v_proj.lora_A.default.weight', 'model.layers.23.self_attn.v_proj.lora_B.default.weight', 'model.layers.24.self_attn.k_proj.base_layer.bias', 'model.layers.24.self_attn.k_proj.base_layer.weight', 'model.layers.24.self_attn.k_proj.base_layer.weight.absmax', 
'model.layers.24.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.k_proj.lora_A.default.weight', 'model.layers.24.self_attn.k_proj.lora_B.default.weight', 'model.layers.24.self_attn.o_proj.base_layer.bias', 'model.layers.24.self_attn.o_proj.base_layer.weight', 'model.layers.24.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.o_proj.lora_A.default.weight', 'model.layers.24.self_attn.o_proj.lora_B.default.weight', 'model.layers.24.self_attn.q_proj.base_layer.bias', 'model.layers.24.self_attn.q_proj.base_layer.weight', 'model.layers.24.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.q_proj.lora_A.default.weight', 'model.layers.24.self_attn.q_proj.lora_B.default.weight', 'model.layers.24.self_attn.v_proj.base_layer.bias', 'model.layers.24.self_attn.v_proj.base_layer.weight', 'model.layers.24.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.24.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.24.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.24.self_attn.v_proj.lora_A.default.weight', 'model.layers.24.self_attn.v_proj.lora_B.default.weight', 'model.layers.25.self_attn.k_proj.base_layer.bias', 'model.layers.25.self_attn.k_proj.base_layer.weight', 'model.layers.25.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.k_proj.lora_A.default.weight', 'model.layers.25.self_attn.k_proj.lora_B.default.weight', 
'model.layers.25.self_attn.o_proj.base_layer.bias', 'model.layers.25.self_attn.o_proj.base_layer.weight', 'model.layers.25.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.o_proj.lora_A.default.weight', 'model.layers.25.self_attn.o_proj.lora_B.default.weight', 'model.layers.25.self_attn.q_proj.base_layer.bias', 'model.layers.25.self_attn.q_proj.base_layer.weight', 'model.layers.25.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.q_proj.lora_A.default.weight', 'model.layers.25.self_attn.q_proj.lora_B.default.weight', 'model.layers.25.self_attn.v_proj.base_layer.bias', 'model.layers.25.self_attn.v_proj.base_layer.weight', 'model.layers.25.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.25.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.25.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.25.self_attn.v_proj.lora_A.default.weight', 'model.layers.25.self_attn.v_proj.lora_B.default.weight', 'model.layers.26.self_attn.k_proj.base_layer.bias', 'model.layers.26.self_attn.k_proj.base_layer.weight', 'model.layers.26.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.k_proj.lora_A.default.weight', 'model.layers.26.self_attn.k_proj.lora_B.default.weight', 'model.layers.26.self_attn.o_proj.base_layer.bias', 'model.layers.26.self_attn.o_proj.base_layer.weight', 'model.layers.26.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.o_proj.base_layer.weight.quant_map', 
'model.layers.26.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.o_proj.lora_A.default.weight', 'model.layers.26.self_attn.o_proj.lora_B.default.weight', 'model.layers.26.self_attn.q_proj.base_layer.bias', 'model.layers.26.self_attn.q_proj.base_layer.weight', 'model.layers.26.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.q_proj.lora_A.default.weight', 'model.layers.26.self_attn.q_proj.lora_B.default.weight', 'model.layers.26.self_attn.v_proj.base_layer.bias', 'model.layers.26.self_attn.v_proj.base_layer.weight', 'model.layers.26.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.26.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.26.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.26.self_attn.v_proj.lora_A.default.weight', 'model.layers.26.self_attn.v_proj.lora_B.default.weight', 'model.layers.27.self_attn.k_proj.base_layer.bias', 'model.layers.27.self_attn.k_proj.base_layer.weight', 'model.layers.27.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.k_proj.lora_A.default.weight', 'model.layers.27.self_attn.k_proj.lora_B.default.weight', 'model.layers.27.self_attn.o_proj.base_layer.bias', 'model.layers.27.self_attn.o_proj.base_layer.weight', 'model.layers.27.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.o_proj.lora_A.default.weight', 'model.layers.27.self_attn.o_proj.lora_B.default.weight', 'model.layers.27.self_attn.q_proj.base_layer.bias', 
'model.layers.27.self_attn.q_proj.base_layer.weight', 'model.layers.27.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.q_proj.lora_A.default.weight', 'model.layers.27.self_attn.q_proj.lora_B.default.weight', 'model.layers.27.self_attn.v_proj.base_layer.bias', 'model.layers.27.self_attn.v_proj.base_layer.weight', 'model.layers.27.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.27.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.27.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.27.self_attn.v_proj.lora_A.default.weight', 'model.layers.27.self_attn.v_proj.lora_B.default.weight', 'model.layers.28.self_attn.k_proj.base_layer.bias', 'model.layers.28.self_attn.k_proj.base_layer.weight', 'model.layers.28.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.k_proj.lora_A.default.weight', 'model.layers.28.self_attn.k_proj.lora_B.default.weight', 'model.layers.28.self_attn.o_proj.base_layer.bias', 'model.layers.28.self_attn.o_proj.base_layer.weight', 'model.layers.28.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.o_proj.lora_A.default.weight', 'model.layers.28.self_attn.o_proj.lora_B.default.weight', 'model.layers.28.self_attn.q_proj.base_layer.bias', 'model.layers.28.self_attn.q_proj.base_layer.weight', 'model.layers.28.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 
'model.layers.28.self_attn.q_proj.lora_A.default.weight', 'model.layers.28.self_attn.q_proj.lora_B.default.weight', 'model.layers.28.self_attn.v_proj.base_layer.bias', 'model.layers.28.self_attn.v_proj.base_layer.weight', 'model.layers.28.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.28.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.28.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.28.self_attn.v_proj.lora_A.default.weight', 'model.layers.28.self_attn.v_proj.lora_B.default.weight', 'model.layers.29.self_attn.k_proj.base_layer.bias', 'model.layers.29.self_attn.k_proj.base_layer.weight', 'model.layers.29.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.k_proj.lora_A.default.weight', 'model.layers.29.self_attn.k_proj.lora_B.default.weight', 'model.layers.29.self_attn.o_proj.base_layer.bias', 'model.layers.29.self_attn.o_proj.base_layer.weight', 'model.layers.29.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.o_proj.lora_A.default.weight', 'model.layers.29.self_attn.o_proj.lora_B.default.weight', 'model.layers.29.self_attn.q_proj.base_layer.bias', 'model.layers.29.self_attn.q_proj.base_layer.weight', 'model.layers.29.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.q_proj.lora_A.default.weight', 'model.layers.29.self_attn.q_proj.lora_B.default.weight', 'model.layers.29.self_attn.v_proj.base_layer.bias', 'model.layers.29.self_attn.v_proj.base_layer.weight', 
'model.layers.29.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.29.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.29.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.29.self_attn.v_proj.lora_A.default.weight', 'model.layers.29.self_attn.v_proj.lora_B.default.weight', 'model.layers.3.self_attn.k_proj.base_layer.bias', 'model.layers.3.self_attn.k_proj.base_layer.weight', 'model.layers.3.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.k_proj.lora_A.default.weight', 'model.layers.3.self_attn.k_proj.lora_B.default.weight', 'model.layers.3.self_attn.o_proj.base_layer.bias', 'model.layers.3.self_attn.o_proj.base_layer.weight', 'model.layers.3.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.o_proj.lora_A.default.weight', 'model.layers.3.self_attn.o_proj.lora_B.default.weight', 'model.layers.3.self_attn.q_proj.base_layer.bias', 'model.layers.3.self_attn.q_proj.base_layer.weight', 'model.layers.3.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.q_proj.lora_A.default.weight', 'model.layers.3.self_attn.q_proj.lora_B.default.weight', 'model.layers.3.self_attn.v_proj.base_layer.bias', 'model.layers.3.self_attn.v_proj.base_layer.weight', 'model.layers.3.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.3.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.3.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.3.self_attn.v_proj.lora_A.default.weight', 
'model.layers.3.self_attn.v_proj.lora_B.default.weight', 'model.layers.4.self_attn.k_proj.base_layer.bias', 'model.layers.4.self_attn.k_proj.base_layer.weight', 'model.layers.4.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.k_proj.lora_A.default.weight', 'model.layers.4.self_attn.k_proj.lora_B.default.weight', 'model.layers.4.self_attn.o_proj.base_layer.bias', 'model.layers.4.self_attn.o_proj.base_layer.weight', 'model.layers.4.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.o_proj.lora_A.default.weight', 'model.layers.4.self_attn.o_proj.lora_B.default.weight', 'model.layers.4.self_attn.q_proj.base_layer.bias', 'model.layers.4.self_attn.q_proj.base_layer.weight', 'model.layers.4.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.q_proj.lora_A.default.weight', 'model.layers.4.self_attn.q_proj.lora_B.default.weight', 'model.layers.4.self_attn.v_proj.base_layer.bias', 'model.layers.4.self_attn.v_proj.base_layer.weight', 'model.layers.4.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.4.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.4.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.4.self_attn.v_proj.lora_A.default.weight', 'model.layers.4.self_attn.v_proj.lora_B.default.weight', 'model.layers.5.self_attn.k_proj.base_layer.bias', 'model.layers.5.self_attn.k_proj.base_layer.weight', 'model.layers.5.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.k_proj.base_layer.weight.quant_map', 
'model.layers.5.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.k_proj.lora_A.default.weight', 'model.layers.5.self_attn.k_proj.lora_B.default.weight', 'model.layers.5.self_attn.o_proj.base_layer.bias', 'model.layers.5.self_attn.o_proj.base_layer.weight', 'model.layers.5.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.o_proj.lora_A.default.weight', 'model.layers.5.self_attn.o_proj.lora_B.default.weight', 'model.layers.5.self_attn.q_proj.base_layer.bias', 'model.layers.5.self_attn.q_proj.base_layer.weight', 'model.layers.5.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.q_proj.lora_A.default.weight', 'model.layers.5.self_attn.q_proj.lora_B.default.weight', 'model.layers.5.self_attn.v_proj.base_layer.bias', 'model.layers.5.self_attn.v_proj.base_layer.weight', 'model.layers.5.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.5.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.5.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.5.self_attn.v_proj.lora_A.default.weight', 'model.layers.5.self_attn.v_proj.lora_B.default.weight', 'model.layers.6.self_attn.k_proj.base_layer.bias', 'model.layers.6.self_attn.k_proj.base_layer.weight', 'model.layers.6.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.k_proj.lora_A.default.weight', 'model.layers.6.self_attn.k_proj.lora_B.default.weight', 'model.layers.6.self_attn.o_proj.base_layer.bias', 'model.layers.6.self_attn.o_proj.base_layer.weight', 
'model.layers.6.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.o_proj.lora_A.default.weight', 'model.layers.6.self_attn.o_proj.lora_B.default.weight', 'model.layers.6.self_attn.q_proj.base_layer.bias', 'model.layers.6.self_attn.q_proj.base_layer.weight', 'model.layers.6.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.q_proj.lora_A.default.weight', 'model.layers.6.self_attn.q_proj.lora_B.default.weight', 'model.layers.6.self_attn.v_proj.base_layer.bias', 'model.layers.6.self_attn.v_proj.base_layer.weight', 'model.layers.6.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.6.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.6.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.6.self_attn.v_proj.lora_A.default.weight', 'model.layers.6.self_attn.v_proj.lora_B.default.weight', 'model.layers.7.self_attn.k_proj.base_layer.bias', 'model.layers.7.self_attn.k_proj.base_layer.weight', 'model.layers.7.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.k_proj.lora_A.default.weight', 'model.layers.7.self_attn.k_proj.lora_B.default.weight', 'model.layers.7.self_attn.o_proj.base_layer.bias', 'model.layers.7.self_attn.o_proj.base_layer.weight', 'model.layers.7.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.o_proj.lora_A.default.weight', 
'model.layers.7.self_attn.o_proj.lora_B.default.weight', 'model.layers.7.self_attn.q_proj.base_layer.bias', 'model.layers.7.self_attn.q_proj.base_layer.weight', 'model.layers.7.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.q_proj.lora_A.default.weight', 'model.layers.7.self_attn.q_proj.lora_B.default.weight', 'model.layers.7.self_attn.v_proj.base_layer.bias', 'model.layers.7.self_attn.v_proj.base_layer.weight', 'model.layers.7.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.7.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.7.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.7.self_attn.v_proj.lora_A.default.weight', 'model.layers.7.self_attn.v_proj.lora_B.default.weight', 'model.layers.8.self_attn.k_proj.base_layer.bias', 'model.layers.8.self_attn.k_proj.base_layer.weight', 'model.layers.8.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.k_proj.lora_A.default.weight', 'model.layers.8.self_attn.k_proj.lora_B.default.weight', 'model.layers.8.self_attn.o_proj.base_layer.bias', 'model.layers.8.self_attn.o_proj.base_layer.weight', 'model.layers.8.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.o_proj.lora_A.default.weight', 'model.layers.8.self_attn.o_proj.lora_B.default.weight', 'model.layers.8.self_attn.q_proj.base_layer.bias', 'model.layers.8.self_attn.q_proj.base_layer.weight', 'model.layers.8.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.q_proj.base_layer.weight.quant_map', 
'model.layers.8.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.q_proj.lora_A.default.weight', 'model.layers.8.self_attn.q_proj.lora_B.default.weight', 'model.layers.8.self_attn.v_proj.base_layer.bias', 'model.layers.8.self_attn.v_proj.base_layer.weight', 'model.layers.8.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.8.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.8.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.8.self_attn.v_proj.lora_A.default.weight', 'model.layers.8.self_attn.v_proj.lora_B.default.weight', 'model.layers.9.self_attn.k_proj.base_layer.bias', 'model.layers.9.self_attn.k_proj.base_layer.weight', 'model.layers.9.self_attn.k_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.k_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.k_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.k_proj.lora_A.default.weight', 'model.layers.9.self_attn.k_proj.lora_B.default.weight', 'model.layers.9.self_attn.o_proj.base_layer.bias', 'model.layers.9.self_attn.o_proj.base_layer.weight', 'model.layers.9.self_attn.o_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.o_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.o_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.o_proj.lora_A.default.weight', 'model.layers.9.self_attn.o_proj.lora_B.default.weight', 'model.layers.9.self_attn.q_proj.base_layer.bias', 'model.layers.9.self_attn.q_proj.base_layer.weight', 'model.layers.9.self_attn.q_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.q_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.q_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.q_proj.lora_A.default.weight', 'model.layers.9.self_attn.q_proj.lora_B.default.weight', 'model.layers.9.self_attn.v_proj.base_layer.bias', 'model.layers.9.self_attn.v_proj.base_layer.weight', 
'model.layers.9.self_attn.v_proj.base_layer.weight.absmax', 'model.layers.9.self_attn.v_proj.base_layer.weight.quant_map', 'model.layers.9.self_attn.v_proj.base_layer.weight.quant_state.bitsandbytes__nf4', 'model.layers.9.self_attn.v_proj.lora_A.default.weight', 'model.layers.9.self_attn.v_proj.lora_B.default.weight']
- This IS expected if you are initializing Starcoder2ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Starcoder2ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Starcoder2ForCausalLM were not initialized from the model checkpoint at finetune_starcoder2/final_checkpoint and are newly initialized: ['model.layers.0.self_attn.k_proj.bias', 'model.layers.0.self_attn.k_proj.weight', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.0.self_attn.o_proj.weight', 'model.layers.0.self_attn.q_proj.bias', 'model.layers.0.self_attn.q_proj.weight', 'model.layers.0.self_attn.v_proj.bias', 'model.layers.0.self_attn.v_proj.weight', 'model.layers.1.self_attn.k_proj.bias', 'model.layers.1.self_attn.k_proj.weight', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.1.self_attn.o_proj.weight', 'model.layers.1.self_attn.q_proj.bias', 'model.layers.1.self_attn.q_proj.weight', 'model.layers.1.self_attn.v_proj.bias', 'model.layers.1.self_attn.v_proj.weight', 'model.layers.10.self_attn.k_proj.bias', 'model.layers.10.self_attn.k_proj.weight', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.10.self_attn.o_proj.weight', 'model.layers.10.self_attn.q_proj.bias', 'model.layers.10.self_attn.q_proj.weight', 'model.layers.10.self_attn.v_proj.bias', 'model.layers.10.self_attn.v_proj.weight', 'model.layers.11.self_attn.k_proj.bias', 'model.layers.11.self_attn.k_proj.weight', 'model.layers.11.self_attn.o_proj.bias', 'model.layers.11.self_attn.o_proj.weight', 'model.layers.11.self_attn.q_proj.bias', 'model.layers.11.self_attn.q_proj.weight', 'model.layers.11.self_attn.v_proj.bias', 'model.layers.11.self_attn.v_proj.weight', 'model.layers.12.self_attn.k_proj.bias', 'model.layers.12.self_attn.k_proj.weight', 'model.layers.12.self_attn.o_proj.bias', 'model.layers.12.self_attn.o_proj.weight', 'model.layers.12.self_attn.q_proj.bias', 'model.layers.12.self_attn.q_proj.weight', 'model.layers.12.self_attn.v_proj.bias', 'model.layers.12.self_attn.v_proj.weight', 'model.layers.13.self_attn.k_proj.bias', 'model.layers.13.self_attn.k_proj.weight', 'model.layers.13.self_attn.o_proj.bias', 'model.layers.13.self_attn.o_proj.weight', 
'model.layers.13.self_attn.q_proj.bias', 'model.layers.13.self_attn.q_proj.weight', 'model.layers.13.self_attn.v_proj.bias', 'model.layers.13.self_attn.v_proj.weight', 'model.layers.14.self_attn.k_proj.bias', 'model.layers.14.self_attn.k_proj.weight', 'model.layers.14.self_attn.o_proj.bias', 'model.layers.14.self_attn.o_proj.weight', 'model.layers.14.self_attn.q_proj.bias', 'model.layers.14.self_attn.q_proj.weight', 'model.layers.14.self_attn.v_proj.bias', 'model.layers.14.self_attn.v_proj.weight', 'model.layers.15.self_attn.k_proj.bias', 'model.layers.15.self_attn.k_proj.weight', 'model.layers.15.self_attn.o_proj.bias', 'model.layers.15.self_attn.o_proj.weight', 'model.layers.15.self_attn.q_proj.bias', 'model.layers.15.self_attn.q_proj.weight', 'model.layers.15.self_attn.v_proj.bias', 'model.layers.15.self_attn.v_proj.weight', 'model.layers.16.self_attn.k_proj.bias', 'model.layers.16.self_attn.k_proj.weight', 'model.layers.16.self_attn.o_proj.bias', 'model.layers.16.self_attn.o_proj.weight', 'model.layers.16.self_attn.q_proj.bias', 'model.layers.16.self_attn.q_proj.weight', 'model.layers.16.self_attn.v_proj.bias', 'model.layers.16.self_attn.v_proj.weight', 'model.layers.17.self_attn.k_proj.bias', 'model.layers.17.self_attn.k_proj.weight', 'model.layers.17.self_attn.o_proj.bias', 'model.layers.17.self_attn.o_proj.weight', 'model.layers.17.self_attn.q_proj.bias', 'model.layers.17.self_attn.q_proj.weight', 'model.layers.17.self_attn.v_proj.bias', 'model.layers.17.self_attn.v_proj.weight', 'model.layers.18.self_attn.k_proj.bias', 'model.layers.18.self_attn.k_proj.weight', 'model.layers.18.self_attn.o_proj.bias', 'model.layers.18.self_attn.o_proj.weight', 'model.layers.18.self_attn.q_proj.bias', 'model.layers.18.self_attn.q_proj.weight', 'model.layers.18.self_attn.v_proj.bias', 'model.layers.18.self_attn.v_proj.weight', 'model.layers.19.self_attn.k_proj.bias', 'model.layers.19.self_attn.k_proj.weight', 'model.layers.19.self_attn.o_proj.bias', 
'model.layers.19.self_attn.o_proj.weight', 'model.layers.19.self_attn.q_proj.bias', 'model.layers.19.self_attn.q_proj.weight', 'model.layers.19.self_attn.v_proj.bias', 'model.layers.19.self_attn.v_proj.weight', 'model.layers.2.self_attn.k_proj.bias', 'model.layers.2.self_attn.k_proj.weight', 'model.layers.2.self_attn.o_proj.bias', 'model.layers.2.self_attn.o_proj.weight', 'model.layers.2.self_attn.q_proj.bias', 'model.layers.2.self_attn.q_proj.weight', 'model.layers.2.self_attn.v_proj.bias', 'model.layers.2.self_attn.v_proj.weight', 'model.layers.20.self_attn.k_proj.bias', 'model.layers.20.self_attn.k_proj.weight', 'model.layers.20.self_attn.o_proj.bias', 'model.layers.20.self_attn.o_proj.weight', 'model.layers.20.self_attn.q_proj.bias', 'model.layers.20.self_attn.q_proj.weight', 'model.layers.20.self_attn.v_proj.bias', 'model.layers.20.self_attn.v_proj.weight', 'model.layers.21.self_attn.k_proj.bias', 'model.layers.21.self_attn.k_proj.weight', 'model.layers.21.self_attn.o_proj.bias', 'model.layers.21.self_attn.o_proj.weight', 'model.layers.21.self_attn.q_proj.bias', 'model.layers.21.self_attn.q_proj.weight', 'model.layers.21.self_attn.v_proj.bias', 'model.layers.21.self_attn.v_proj.weight', 'model.layers.22.self_attn.k_proj.bias', 'model.layers.22.self_attn.k_proj.weight', 'model.layers.22.self_attn.o_proj.bias', 'model.layers.22.self_attn.o_proj.weight', 'model.layers.22.self_attn.q_proj.bias', 'model.layers.22.self_attn.q_proj.weight', 'model.layers.22.self_attn.v_proj.bias', 'model.layers.22.self_attn.v_proj.weight', 'model.layers.23.self_attn.k_proj.bias', 'model.layers.23.self_attn.k_proj.weight', 'model.layers.23.self_attn.o_proj.bias', 'model.layers.23.self_attn.o_proj.weight', 'model.layers.23.self_attn.q_proj.bias', 'model.layers.23.self_attn.q_proj.weight', 'model.layers.23.self_attn.v_proj.bias', 'model.layers.23.self_attn.v_proj.weight', 'model.layers.24.self_attn.k_proj.bias', 'model.layers.24.self_attn.k_proj.weight', 
'model.layers.24.self_attn.o_proj.bias', 'model.layers.24.self_attn.o_proj.weight', 'model.layers.24.self_attn.q_proj.bias', 'model.layers.24.self_attn.q_proj.weight', 'model.layers.24.self_attn.v_proj.bias', 'model.layers.24.self_attn.v_proj.weight', 'model.layers.25.self_attn.k_proj.bias', 'model.layers.25.self_attn.k_proj.weight', 'model.layers.25.self_attn.o_proj.bias', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.25.self_attn.q_proj.bias', 'model.layers.25.self_attn.q_proj.weight', 'model.layers.25.self_attn.v_proj.bias', 'model.layers.25.self_attn.v_proj.weight', 'model.layers.26.self_attn.k_proj.bias', 'model.layers.26.self_attn.k_proj.weight', 'model.layers.26.self_attn.o_proj.bias', 'model.layers.26.self_attn.o_proj.weight', 'model.layers.26.self_attn.q_proj.bias', 'model.layers.26.self_attn.q_proj.weight', 'model.layers.26.self_attn.v_proj.bias', 'model.layers.26.self_attn.v_proj.weight', 'model.layers.27.self_attn.k_proj.bias', 'model.layers.27.self_attn.k_proj.weight', 'model.layers.27.self_attn.o_proj.bias', 'model.layers.27.self_attn.o_proj.weight', 'model.layers.27.self_attn.q_proj.bias', 'model.layers.27.self_attn.q_proj.weight', 'model.layers.27.self_attn.v_proj.bias', 'model.layers.27.self_attn.v_proj.weight', 'model.layers.28.self_attn.k_proj.bias', 'model.layers.28.self_attn.k_proj.weight', 'model.layers.28.self_attn.o_proj.bias', 'model.layers.28.self_attn.o_proj.weight', 'model.layers.28.self_attn.q_proj.bias', 'model.layers.28.self_attn.q_proj.weight', 'model.layers.28.self_attn.v_proj.bias', 'model.layers.28.self_attn.v_proj.weight', 'model.layers.29.self_attn.k_proj.bias', 'model.layers.29.self_attn.k_proj.weight', 'model.layers.29.self_attn.o_proj.bias', 'model.layers.29.self_attn.o_proj.weight', 'model.layers.29.self_attn.q_proj.bias', 'model.layers.29.self_attn.q_proj.weight', 'model.layers.29.self_attn.v_proj.bias', 'model.layers.29.self_attn.v_proj.weight', 'model.layers.3.self_attn.k_proj.bias', 
'model.layers.3.self_attn.k_proj.weight', 'model.layers.3.self_attn.o_proj.bias', 'model.layers.3.self_attn.o_proj.weight', 'model.layers.3.self_attn.q_proj.bias', 'model.layers.3.self_attn.q_proj.weight', 'model.layers.3.self_attn.v_proj.bias', 'model.layers.3.self_attn.v_proj.weight', 'model.layers.4.self_attn.k_proj.bias', 'model.layers.4.self_attn.k_proj.weight', 'model.layers.4.self_attn.o_proj.bias', 'model.layers.4.self_attn.o_proj.weight', 'model.layers.4.self_attn.q_proj.bias', 'model.layers.4.self_attn.q_proj.weight', 'model.layers.4.self_attn.v_proj.bias', 'model.layers.4.self_attn.v_proj.weight', 'model.layers.5.self_attn.k_proj.bias', 'model.layers.5.self_attn.k_proj.weight', 'model.layers.5.self_attn.o_proj.bias', 'model.layers.5.self_attn.o_proj.weight', 'model.layers.5.self_attn.q_proj.bias', 'model.layers.5.self_attn.q_proj.weight', 'model.layers.5.self_attn.v_proj.bias', 'model.layers.5.self_attn.v_proj.weight', 'model.layers.6.self_attn.k_proj.bias', 'model.layers.6.self_attn.k_proj.weight', 'model.layers.6.self_attn.o_proj.bias', 'model.layers.6.self_attn.o_proj.weight', 'model.layers.6.self_attn.q_proj.bias', 'model.layers.6.self_attn.q_proj.weight', 'model.layers.6.self_attn.v_proj.bias', 'model.layers.6.self_attn.v_proj.weight', 'model.layers.7.self_attn.k_proj.bias', 'model.layers.7.self_attn.k_proj.weight', 'model.layers.7.self_attn.o_proj.bias', 'model.layers.7.self_attn.o_proj.weight', 'model.layers.7.self_attn.q_proj.bias', 'model.layers.7.self_attn.q_proj.weight', 'model.layers.7.self_attn.v_proj.bias', 'model.layers.7.self_attn.v_proj.weight', 'model.layers.8.self_attn.k_proj.bias', 'model.layers.8.self_attn.k_proj.weight', 'model.layers.8.self_attn.o_proj.bias', 'model.layers.8.self_attn.o_proj.weight', 'model.layers.8.self_attn.q_proj.bias', 'model.layers.8.self_attn.q_proj.weight', 'model.layers.8.self_attn.v_proj.bias', 'model.layers.8.self_attn.v_proj.weight', 'model.layers.9.self_attn.k_proj.bias', 
'model.layers.9.self_attn.k_proj.weight', 'model.layers.9.self_attn.o_proj.bias', 'model.layers.9.self_attn.o_proj.weight', 'model.layers.9.self_attn.q_proj.bias', 'model.layers.9.self_attn.q_proj.weight', 'model.layers.9.self_attn.v_proj.bias', 'model.layers.9.self_attn.v_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# to use 4bit use `load_in_4bit=True` instead
quantization_config = BitsAndBytesConfig()

checkpoint = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained("finetune_starcoder2/final_checkpoint", quantization_config=quantization_config)

inputs = tokenizer.encode("hello_world_function <- function() {", return_tensors="pt").to("cuda")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

Also, I don't think doing 4-bit quantization as a default for finetuning is a good idea. It should be opt-in with a flag.
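
One way to make quantization opt-in is a boolean command-line flag that defaults to off. The sketch below uses only `argparse`; the flag name `--use_4bit` and the helper `quantization_kwargs` are hypothetical and are not taken from the repository's finetuning script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Finetuning options (sketch)")
    # Hypothetical flag: quantization stays off unless the user asks for it.
    parser.add_argument(
        "--use_4bit",
        action="store_true",
        help="load the base model in 4-bit precision",
    )
    return parser


def quantization_kwargs(args: argparse.Namespace) -> dict:
    """Return the keyword arguments a quantization config would receive."""
    # An empty dict means "no quantization"; the caller decides how to use it.
    return {"load_in_4bit": True} if args.use_4bit else {}


# Parse an explicit argument list so the sketch does not depend on sys.argv.
default_args = build_parser().parse_args([])
opt_in_args = build_parser().parse_args(["--use_4bit"])
print(quantization_kwargs(default_args))
print(quantization_kwargs(opt_in_args))
```

With this layout a plain run finetunes without quantization, and 4-bit loading happens only when the flag is passed.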

I am also wondering why we use the Stack v1 for finetuning and not the Stack v2.

"lm_head.weight" not in the parameters of Starcoder2-3B and Starcoder2-7B (Huggingface version)

There is no `lm_head.weight` in the parameters of Starcoder2-3B and Starcoder2-7B. Is it because of tied embeddings?
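
If the embeddings are tied, the output projection reuses the input embedding matrix, so a checkpoint only needs to store it once. A minimal sketch of that check follows, working on plain Python data rather than a real checkpoint. The helper name `missing_lm_head_is_expected` and the key `model.embed_tokens.weight` are assumptions for illustration; `tie_word_embeddings` is the field to look for in the model's `config.json`.

```python
def missing_lm_head_is_expected(config: dict, weight_names: list[str]) -> bool:
    """True when the absence of lm_head.weight is explained by weight tying."""
    has_lm_head = "lm_head.weight" in weight_names
    # Assumed name of the input embedding tensor in the checkpoint.
    has_embeddings = "model.embed_tokens.weight" in weight_names
    tied = bool(config.get("tie_word_embeddings", False))
    return (not has_lm_head) and has_embeddings and tied


# Toy inputs standing in for config.json and the checkpoint's tensor names.
config = {"tie_word_embeddings": True}
names = ["model.embed_tokens.weight", "model.layers.0.self_attn.q_proj.weight"]
print(missing_lm_head_is_expected(config, names))
```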

What prevents you from thoroughly open-sourcing?

I noticed that even though bigcode/starcoder(2) is much more open than Code Llama and DeepSeek Coder (e.g. open-sourced datasets, clearly described data processing and training, and so on), it is still not thoroughly open: the code used for pretraining and data processing has never been open-sourced.
So just out of curiosity, what prevents you from doing that?

Inquiry about Fine-Tuning Using Custom Code

Hi there,

I hope this message finds you well. I am currently exploring the process of fine-tuning models using my own codebase, and I was hoping to seek some guidance on this matter.

Could you please provide me with information on how I can effectively fine-tune models using my own codebase? Additionally, would it be possible for you to share any scripts or resources related to data preprocessing for this purpose?

I truly appreciate any assistance or insights you can provide on this matter. Thank you very much for your time and support.

Best regards
@loubnabnl

Support SPM mode for FIM prompts

From the FIM paper (https://arxiv.org/pdf/2207.14255.pdf), section 3.1: SPM mode can be used to reuse the KV cache across completion requests.

SPM mode can enable further latency optimization, which is very important for code completion tools. Is there any reason that StarCoder models use the normal PSM mode?
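
The caching argument can be illustrated without a model: a KV cache can be reused only for the part of the new prompt that is identical to the start of the previous one. The sketch below builds both layouts as strings and measures that shared start when the user types more of the prefix. Characters stand in for tokens, the sentinel strings are assumed to match the tokenizer's FIM tokens, and the SPM layout shown is the variant in which the prefix comes last; none of this is taken from the StarCoder training code.

```python
import os

# Assumed sentinel strings; check the tokenizer's special tokens before use.
PRE, SUF, MID = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"


def psm_prompt(prefix: str, suffix: str) -> str:
    """Prefix-Suffix-Middle: the model generates the middle after MID."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"


def spm_prompt(prefix: str, suffix: str) -> str:
    """Suffix-Prefix-Middle variant: the model continues right after the prefix."""
    return f"{PRE}{SUF}{suffix}{MID}{prefix}"


def reusable_chars(previous: str, current: str) -> int:
    """Length of the shared start, i.e. the part a KV cache could keep."""
    return len(os.path.commonprefix([previous, current]))


suffix = "\n    return total\n"
before = "def add(a, b):\n    total = a"
after = before + " + b"  # the user typed a few more characters

psm_shared = reusable_chars(psm_prompt(before, suffix), psm_prompt(after, suffix))
spm_shared = reusable_chars(spm_prompt(before, suffix), spm_prompt(after, suffix))

# In PSM the suffix sits after the edited prefix, so it must be recomputed.
print("PSM reusable:", psm_shared, "of", len(psm_prompt(before, suffix)))
# In SPM the whole previous prompt is a prefix of the new one.
print("SPM reusable:", spm_shared, "of", len(spm_prompt(before, suffix)))
```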

Clash in requirements for finetuning Starcoder2

Facing the following error while trying to finetune Starcoder2 with the given script.

Description:

For `transformers.AutoModelForCausalLM` to recognize Starcoder2, `transformers>=4.39.0` is required.

But trl is still using `transformers==4.38.2`. Even if I build trl from source and use `trl==0.7.12.dev0`, I still get an error.

Here is the error when using `transformers==4.38.2`:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1127             try:
-> 1128                 config_class = CONFIG_MAPPING[config_dict["model_type"]]
   1129             except KeyError:

4 frames
KeyError: 'starcoder2'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1128                 config_class = CONFIG_MAPPING[config_dict["model_type"]]
   1129             except KeyError:
-> 1130                 raise ValueError(
   1131                     f"The checkpoint you are trying to load has model type `{config_dict['model_type']}` "
   1132                     "but Transformers does not recognize this architecture. This could be because of an "

ValueError: The checkpoint you are trying to load has model type `starcoder2` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

Here is the error when using `transformers==4.39.0`:

ImportError                               Traceback (most recent call last)
<ipython-input-2-3ef713ffd06d> in <cell line: 1>()
----> 1 from trl import SFTTrainer
      2 print("trl version:", trl.__version__)

1 frames
/usr/local/lib/python3.10/dist-packages/trl/__init__.py in <module>
      3 __version__ = "0.7.12.dev0"
      4 
----> 5 from .core import set_seed
      6 from .environment import TextEnvironment, TextHistory
      7 from .extras import BestOfNSampler

/usr/local/lib/python3.10/dist-packages/trl/core.py in <module>
     23 import torch.nn.functional as F
     24 from torch.nn.utils.rnn import pad_sequence
---> 25 from transformers import top_k_top_p_filtering
     26 
     27 from .import_utils import is_npu_available, is_xpu_available

ImportError: cannot import name 'top_k_top_p_filtering' from 'transformers' (/usr/local/lib/python3.10/dist-packages/transformers/__init__.py)
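
When chasing a clash like this, it can help to print which versions are actually installed in the runtime, since a notebook may still have an older package loaded. A minimal sketch using only the standard library follows; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages involved in the clash described above.
for name in ("transformers", "trl"):
    print(name, installed_version(name))
```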

Megatron model weights for StarCoder2-15B

A year ago, the raw Megatron weights for StarCoder were released.

Would it be possible to release the Megatron weights for StarCoder2, especially the 15B variant?

Also, publishing a script to convert StarCoder2 from Megatron format to HuggingFace format would be helpful. Thanks!

CrossCodeEval Results for StarCoder 2

Hi, I'm currently researching the impact of different retrieval-augmented generation (RAG) techniques on LLM performance. We are attempting to replicate the CrossCodeEval results from the "StarCoder 2 and The Stack v2: The Next Generation" paper as a baseline.

However, we have encountered issues in replicating the results stated in section 7.6.2 of the paper using the provided GitHub repository data and code for CrossCodeEval, along with the hyperparameters specified in the section. The paper reports a Code ES of 74.52 and an ID F1 of 68.81 for StarCoder2-7B’s Python code generation, whereas our replicated results showed a Code ES of 67.92 and an ID F1 of 58.08.

We noticed the option to use the BigCode-Evaluation-Harness for testing as mentioned in your repository, but we could not find the CrossCodeEval experiment within the bigcode-project/bigcode-evaluation-harness project. Therefore, we proceeded with the direct use of the open-source GitHub code and dataset for CrossCodeEval, employing the hyperparameters given in section 7.6.2.

My experiment environment is:

A100 40G*8 DGX node
ubuntu 20.04
cuda 12.1
torch 2.1.2

Could you please provide any insights or additional guidelines that might help us better replicate the benchmark results? Any assistance or further details you could offer would be greatly appreciated.

Thank you for your time and support.
