
Comments (5)

glample commented on July 29, 2024

Ah yes, I also had this because I tried to reload, on a single GPU, a model trained on multiple GPUs. The problem in that case is that with multi-GPU training, the model is encapsulated in a module (this is why you have the extra module. prefix on the reloaded checkpoint parameters).

See 34825ea#diff-e750911d9404a6f817e2015251a4a654R458
I added a commented line. Comment out:
getattr(self, name).load_state_dict(data[name])
and uncomment:
getattr(self, name).load_state_dict({k[len('module.'):]: v for k, v in data[name].items()})
That should solve the issue.
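For anyone hitting this outside of XLM: the underlying fix is just stripping the module. prefix that nn.DataParallel adds to every parameter name when the checkpoint is saved. A minimal, self-contained sketch of the pattern (a toy nn.Linear stands in for the real TransformerModel; this is not the exact XLM code):

import torch
import torch.nn as nn

# toy model standing in for the real model class
model = nn.Linear(4, 4)

# simulate a checkpoint saved from nn.DataParallel(model): every key gets a 'module.' prefix
saved = {'module.' + k: v for k, v in model.state_dict().items()}

# the fix: strip the 'module.' prefix before loading into the plain (non-wrapped) model
clean = {k[len('module.'):] if k.startswith('module.') else k: v for k, v in saved.items()}
model.load_state_dict(clean)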

glample commented on July 29, 2024

34825ea should do the trick. You still have to provide the parameters, though. What you can do is simply copy-paste the "running command" at the beginning of the train.log of the experiment with the checkpoint you want to reload, and add --reload_checkpoint EXP_PATH/checkpoint.pth
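For example (the argument list is a placeholder; take it, along with EXP_PATH, from your own train.log):

python train.py <arguments copied from the "running command" line in train.log> --reload_checkpoint EXP_PATH/checkpoint.pth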

odel-odel commented on July 29, 2024

Thank you for the quick response.
Now I'm getting a runtime error:

Traceback (most recent call last):
File "train.py", line 330, in
main(params)
File "train.py", line 250, in main
trainer = SingleTrainer(model, data, params)
File "/NMT/XLM/src/trainer.py", line 704, in init
super().init(data, params)
File "/NMT/XLM/src/trainer.py", line 94, in init
self.reload_checkpoint()
File "/NMT/XLM/src/trainer.py", line 457, in reload_checkpoint
getattr(self, name).load_state_dict(data[name])
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 769, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for TransformerModel:
Missing key(s) in state_dict: "position_embeddings.weight", "lang_embeddings.weight", "embeddings.weight", "layer_norm_emb.bias", "layer_norm_emb.weight", "attentions.0.q_lin.bias", "attentions.0.q_lin.weight", "attentions.0.k_lin.bias", "attentions.0.k_lin.weight", "attentions.0.v_lin.bias", "attentions.0.v_lin.weight", "attentions.0.out_lin.bias", "attentions.0.out_lin.weight", "attentions.1.q_lin.bias", "attentions.1.q_lin.weight", "attentions.1.k_lin.bias", "attentions.1.k_lin.weight", "attentions.1.v_lin.bias", "attentions.1.v_lin.weight", "attentions.1.out_lin.bias", "attentions.1.out_lin.weight", "attentions.2.q_lin.bias", "attentions.2.q_lin.weight", "attentions.2.k_lin.bias", "attentions.2.k_lin.weight", "attentions.2.v_lin.bias", "attentions.2.v_lin.weight", "attentions.2.out_lin.bias", "attentions.2.out_lin.weight", "attentions.3.q_lin.bias", "attentions.3.q_lin.weight", "attentions.3.k_lin.bias", "attentions.3.k_lin.weight", "attentions.3.v_lin.bias", "attentions.3.v_lin.weight", "attentions.3.out_lin.bias", "attentions.3.out_lin.weight", "attentions.4.q_lin.bias", "attentions.4.q_lin.weight", "attentions.4.k_lin.bias", "attentions.4.k_lin.weight", "attentions.4.v_lin.bias", "attentions.4.v_lin.weight", "attentions.4.out_lin.bias", "attentions.4.out_lin.weight", "attentions.5.q_lin.bias", "attentions.5.q_lin.weight", "attentions.5.k_lin.bias", "attentions.5.k_lin.weight", "attentions.5.v_lin.bias", "attentions.5.v_lin.weight", "attentions.5.out_lin.bias", "attentions.5.out_lin.weight", "layer_norm1.0.bias", "layer_norm1.0.weight", "layer_norm1.1.bias", "layer_norm1.1.weight", "layer_norm1.2.bias", "layer_norm1.2.weight", "layer_norm1.3.bias", "layer_norm1.3.weight", "layer_norm1.4.bias", "layer_norm1.4.weight", "layer_norm1.5.bias", "layer_norm1.5.weight", "ffns.0.lin1.bias", "ffns.0.lin1.weight", "ffns.0.lin2.bias", "ffns.0.lin2.weight", "ffns.1.lin1.bias", "ffns.1.lin1.weight", "ffns.1.lin2.bias", "ffns.1.lin2.weight", "ffns.2.lin1.bias", "ffns.2.lin1.weight", "ffns.2.lin2.bias", "ffns.2.lin2.weight", "ffns.3.lin1.bias", "ffns.3.lin1.weight", "ffns.3.lin2.bias", "ffns.3.lin2.weight", "ffns.4.lin1.bias", "ffns.4.lin1.weight", "ffns.4.lin2.bias", "ffns.4.lin2.weight", "ffns.5.lin1.bias", "ffns.5.lin1.weight", "ffns.5.lin2.bias", "ffns.5.lin2.weight", "layer_norm2.0.bias", "layer_norm2.0.weight", "layer_norm2.1.bias", "layer_norm2.1.weight", "layer_norm2.2.bias", "layer_norm2.2.weight", "layer_norm2.3.bias", "layer_norm2.3.weight", "layer_norm2.4.bias", "layer_norm2.4.weight", "layer_norm2.5.bias", "layer_norm2.5.weight", "pred_layer.proj.bias", "pred_layer.proj.weight".
Unexpected key(s) in state_dict: "module.position_embeddings.weight", "module.lang_embeddings.weight", "module.embeddings.weight", "module.layer_norm_emb.weight", "module.layer_norm_emb.bias", "module.attentions.0.q_lin.weight", "module.attentions.0.q_lin.bias", "module.attentions.0.k_lin.weight", "module.attentions.0.k_lin.bias", "module.attentions.0.v_lin.weight", "module.attentions.0.v_lin.bias", "module.attentions.0.out_lin.weight", "module.attentions.0.out_lin.bias", "module.attentions.1.q_lin.weight", "module.attentions.1.q_lin.bias", "module.attentions.1.k_lin.weight", "module.attentions.1.k_lin.bias", "module.attentions.1.v_lin.weight", "module.attentions.1.v_lin.bias", "module.attentions.1.out_lin.weight", "module.attentions.1.out_lin.bias", "module.attentions.2.q_lin.weight", "module.attentions.2.q_lin.bias", "module.attentions.2.k_lin.weight", "module.attentions.2.k_lin.bias", "module.attentions.2.v_lin.weight", "module.attentions.2.v_lin.bias", "module.attentions.2.out_lin.weight", "module.attentions.2.out_lin.bias", "module.attentions.3.q_lin.weight", "module.attentions.3.q_lin.bias", "module.attentions.3.k_lin.weight", "module.attentions.3.k_lin.bias", "module.attentions.3.v_lin.weight", "module.attentions.3.v_lin.bias", "module.attentions.3.out_lin.weight", "module.attentions.3.out_lin.bias", "module.attentions.4.q_lin.weight", "module.attentions.4.q_lin.bias", "module.attentions.4.k_lin.weight", "module.attentions.4.k_lin.bias", "module.attentions.4.v_lin.weight", "module.attentions.4.v_lin.bias", "module.attentions.4.out_lin.weight", "module.attentions.4.out_lin.bias", "module.attentions.5.q_lin.weight", "module.attentions.5.q_lin.bias", "module.attentions.5.k_lin.weight", "module.attentions.5.k_lin.bias", "module.attentions.5.v_lin.weight", "module.attentions.5.v_lin.bias", "module.attentions.5.out_lin.weight", "module.attentions.5.out_lin.bias", "module.layer_norm1.0.weight", "module.layer_norm1.0.bias", "module.layer_norm1.1.weight", "module.layer_norm1.1.bias", "module.layer_norm1.2.weight", "module.layer_norm1.2.bias", "module.layer_norm1.3.weight", "module.layer_norm1.3.bias", "module.layer_norm1.4.weight", "module.layer_norm1.4.bias", "module.layer_norm1.5.weight", "module.layer_norm1.5.bias", "module.ffns.0.lin1.weight", "module.ffns.0.lin1.bias", "module.ffns.0.lin2.weight", "module.ffns.0.lin2.bias", "module.ffns.1.lin1.weight", "module.ffns.1.lin1.bias", "module.ffns.1.lin2.weight", "module.ffns.1.lin2.bias", "module.ffns.2.lin1.weight", "module.ffns.2.lin1.bias", "module.ffns.2.lin2.weight", "module.ffns.2.lin2.bias", "module.ffns.3.lin1.weight", "module.ffns.3.lin1.bias", "module.ffns.3.lin2.weight", "module.ffns.3.lin2.bias", "module.ffns.4.lin1.weight", "module.ffns.4.lin1.bias", "module.ffns.4.lin2.weight", "module.ffns.4.lin2.bias", "module.ffns.5.lin1.weight", "module.ffns.5.lin1.bias", "module.ffns.5.lin2.weight", "module.ffns.5.lin2.bias", "module.layer_norm2.0.weight", "module.layer_norm2.0.bias", "module.layer_norm2.1.weight", "module.layer_norm2.1.bias", "module.layer_norm2.2.weight", "module.layer_norm2.2.bias", "module.layer_norm2.3.weight", "module.layer_norm2.3.bias", "module.layer_norm2.4.weight", "module.layer_norm2.4.bias", "module.layer_norm2.5.weight", "module.layer_norm2.5.bias", "module.pred_layer.proj.weight", "module.pred_layer.proj.bias".

odel-odel commented on July 29, 2024

Thanks !!

bhardwaj1230 commented on July 29, 2024

Ah yes, I also had this because I tried to reload, on a single GPU, a model trained on multiple GPUs. The problem in that case is that with multi-GPU training, the model is encapsulated in a module (this is why you have the extra module. prefix on the reloaded checkpoint parameters).

See 34825ea#diff-e750911d9404a6f817e2015251a4a654R458
I added a commented line. Comment out:
getattr(self, name).load_state_dict(data[name])
and uncomment:
getattr(self, name).load_state_dict({k[len('module.'):]: v for k, v in data[name].items()})
That should solve the issue.

This solved my issue when I trained TLM on multiple GPUs and translated using just 1 GPU.
