lycoris_sd = torch.load("Haku-Phi/version_None/lycoris_weight/epoch=3.pt")
Using the same code, I cannot merge the weights. However, I did manage to merge the LoKr with the following code:
import torch
from torch import nn
from lycoris import LycorisNetwork, create_lycoris

# recreate the LoKr network with the same settings used at init
LycorisNetwork.apply_preset(
    {"target_name": [".*q_proj.*", ".*v_proj.*", ".*k_proj.*", ".*o_proj.*", ".*gate_proj.*", ".*up_proj.*", ".*down_proj.*"]}
)
lycoris_net = create_lycoris(
    text_model,
    multiplier=1.0,
    linear_dim=100000,
    linear_alpha=0,
    factor=16,
    algo="lokr",
)
lycoris_net.apply_to()
lycoris_net = lycoris_net.to("cuda")

# replace the freshly initialized params with the trained ones
trained_lycoris_net = torch.load("./data/openbmb_MiniCPM-2B-128k/lycoris_weight_final.pt")
for name, param in trained_lycoris_net.items():
    module_name, param_name = name.split(".")
    temp = getattr(lycoris_net, module_name)
    setattr(temp, param_name, nn.Parameter(param.to(torch.float32)))
    setattr(lycoris_net, module_name, temp)
lycoris_net = lycoris_net.to(torch.bfloat16)

# merge into the base model
lycoris_net.merge_to(1.0)
With this, the merged model's output differs from the original base model (which I think is reasonable given my training).
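As a sanity check, one could instrument the merge step above (rather than calling merge_to a second time) to confirm that the base weights actually change; the model.layers[0].self_attn.q_proj path below is only my guess at one of the targeted modules, adjust it to the real layout:

# hypothetical sanity check around the merge step above:
# snapshot one target weight, merge, then measure the change
w_before = text_model.model.layers[0].self_attn.q_proj.weight.detach().clone()
lycoris_net.merge_to(1.0)
w_after = text_model.model.layers[0].self_attn.q_proj.weight
print("max abs weight change:", (w_after.float() - w_before.float()).abs().max().item())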
But when I save the model to disk:
text_model.save_pretrained("./openbmb_MiniCPM-2B-128k-lokr")
tokenizer.save_pretrained("./openbmb_MiniCPM-2B-128k-lokr")
and then load it back for inference, the output differs from that of the freshly merged model, but it is also different from the original base model.
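For debugging, this is the kind of check I would run to see which tensors change across the save/reload round trip; it is a sketch that assumes the checkpoint can be reloaded through AutoModelForCausalLM with trust_remote_code=True and bfloat16 weights:

from transformers import AutoModelForCausalLM

# reload the just-saved checkpoint and diff it against the in-memory merged model
reloaded = AutoModelForCausalLM.from_pretrained(
    "./openbmb_MiniCPM-2B-128k-lokr",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

merged_sd = text_model.state_dict()
reloaded_sd = reloaded.state_dict()
for key, merged_tensor in merged_sd.items():
    if key not in reloaded_sd:
        print("missing after reload:", key)
        continue
    diff = (merged_tensor.float().cpu() - reloaded_sd[key].float().cpu()).abs().max().item()
    if diff > 0:
        print(f"{key}: max abs diff {diff}")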
For reference, here is the inference code, showing that no randomness is involved.
response, history = text_model.chat(tokenizer, "怎么换银行?", max_length=256, temperature=None, top_p=None, do_sample=False)
print(response)
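And the same deterministic call on the reloaded checkpoint (the reloaded model from the diff sketch above, assuming it exposes the same chat helper) is what produces the third, different output:

# identical greedy call on the reloaded checkpoint for a side-by-side comparison
reloaded = reloaded.to("cuda")
reloaded_response, _ = reloaded.chat(tokenizer, "怎么换银行?", max_length=256, temperature=None, top_p=None, do_sample=False)
print(reloaded_response)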
Could you kindly tell me what I missed?