
hakuphi's Introduction

Here is KohakuBlueLeaf UwU

Kohaku: A cute dragon girl

BlueLeaf: An undergraduate student in Taiwan


  • πŸ”­ I’m currently working on LyCORIS

  • 🀝 I’m looking for help with HyperKohaku

  • πŸ’¬ Ask me about Python, NN, Web Crawler

  • πŸ“« How to reach me [email protected]

  • ⚑ Fun fact I never watched Lycoris-Recoil


Sponsorship

Buy Me A Coffee

paypal.me: https://www.paypal.com/paypalme/kblueleaf
BTC: 36VHoCKxgp2u3YWQ8gNMDQR3fT49S5sRtf
ETH: 0x8023c8c0a10a4da4e6746cbd238a8bc990fbba60
LTC: MCpMKubB8eeKPZ6LsfW9A7pJP23YLoLT9T

hakuphi's People

Contributors

kohakublueleaf


hakuphi's Issues

Different result for merged model and saved merged model

lycoris_sd = torch.load("Haku-Phi/version_None/lycoris_weight/epoch=3.pt")

Using that code, I could not merge the weights, but I did manage to merge the LoKr using the following code:

import torch
import torch.nn as nn
from lycoris import create_lycoris, LycorisNetwork

# the same settings used to create the initial LoKr
LycorisNetwork.apply_preset(
    {"target_name": [
        ".*q_proj.*", ".*v_proj.*", ".*k_proj.*", ".*o_proj.*",
        ".*gate_proj.*", ".*up_proj.*", ".*down_proj.*",
    ]}
)
lycoris_net = create_lycoris(
    text_model,
    multiplier=1.0,
    linear_dim=100000,
    linear_alpha=0,
    factor=16,
    algo="lokr",
)
lycoris_net.apply_to()
lycoris_net = lycoris_net.to("cuda")

# swap the freshly initialized parameters for the trained ones
trained_lycoris_net = torch.load("./data/openbmb_MiniCPM-2B-128k/lycoris_weight_final.pt")
for name, param in trained_lycoris_net.items():
    module_name, param_name = name.split(".")
    temp = getattr(lycoris_net, module_name)
    setattr(temp, param_name, nn.Parameter(param.to(torch.float32)))
    setattr(lycoris_net, module_name, temp)
lycoris_net = lycoris_net.to(torch.bfloat16)

# merge the LoKr deltas into the base model's weights
lycoris_net.merge_to(1.0)
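As a sanity check that the merge actually modified the base weights, one can snapshot a target weight before merging and compare afterwards. The sketch below is a stand-in, not the original code: it uses a plain `nn.Linear` and a random delta in place of the real `q_proj` module and LoKr factors, purely to illustrate the before/after comparison.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for one targeted layer (in practice, snapshot a real
# q_proj weight from text_model before calling merge_to).
layer = nn.Linear(4, 4, bias=False)
before = layer.weight.detach().clone()

# Simulate what a merge does conceptually: add the low-rank delta into the
# base weight in-place.
delta = torch.randn_like(layer.weight) * 0.01
with torch.no_grad():
    layer.weight += delta

# The merged weight must now differ from the snapshot.
print(torch.equal(before, layer.weight))
print((layer.weight - before).abs().max().item() > 0)
```

Running this comparison on the real model before and after `merge_to(1.0)` confirms the merge took effect and gives a magnitude for the applied delta.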

With this, inference gives a result different from the original base model (which I think is reasonable given my training).
But when I save the model to disk:

text_model.save_pretrained("./openbmb_MiniCPM-2B-128k-lokr")
tokenizer.save_pretrained("./openbmb_MiniCPM-2B-128k-lokr")

When I then load it for inference, the result differs from the freshly merged model, and also from the original base model.
For reference, here is the inference code, which shows there is no randomness involved (greedy decoding):

response, history = text_model.chat(
    tokenizer, "ζ€ŽδΉˆζŠ’ι“Άθ‘Œ?",  # "How to rob a bank?"
    max_length=256, temperature=None, top_p=None, do_sample=False,
)
print(response)

Could you kindly tell me what I missed?
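One way to localize a mismatch like this (an editorial suggestion, not part of the original issue) is to compare the in-memory merged model's state dict against the reloaded model's state dict tensor by tensor; dtype round-trips through `save_pretrained`/`from_pretrained` are a common culprit. The sketch below uses a hypothetical `nn.Linear` in place of `text_model` and simulates a bfloat16 → float32 → bfloat16 round-trip:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the merged text_model (in practice, compare
# text_model.state_dict() against the reloaded model's state_dict()).
merged = nn.Linear(8, 8).to(torch.bfloat16)

# Simulate a save/load round-trip through a different dtype, then cast back.
reloaded = nn.Linear(8, 8)
reloaded.load_state_dict(
    {k: v.to(torch.float32) for k, v in merged.state_dict().items()}
)
reloaded = reloaded.to(torch.bfloat16)

# Report the largest per-tensor discrepancy; on a real model, any nonzero
# entries pinpoint which weights changed across the save/load round-trip.
for name, p in merged.state_dict().items():
    diff = (p.float() - reloaded.state_dict()[name].float()).abs().max().item()
    print(name, diff)
```

In this toy round-trip the differences come out zero (bfloat16 → float32 → bfloat16 is lossless), so any nonzero entries on the real model would indicate the saved weights genuinely diverged from the freshly merged ones.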
