Comments (8)

gaotianyu1350 commented on September 14, 2024

It seems that you are using data parallel. Did you use multiple GPUs while using the single-GPU training code?

from simcse.

cn-boop commented on September 14, 2024

mode007 commented on September 14, 2024

When I run run_unsup_example.sh, an error occurs just as training is about to finish:

Traceback (most recent call last):
  File "train.py", line 584, in <module>
    main()
  File "train.py", line 548, in main
    train_result = trainer.train(model_path=model_path)
  File "/home/v-nuochen/SimCSE/simcse/trainers.py", line 464, in train
    tr_loss += self.training_step(model, inputs)
  File "/home/v-nuochen/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1248, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/v-nuochen/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1277, in compute_loss
    outputs = model(**inputs)
  File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in forward
    return self.gather(outputs, self.output_device)
  File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 174, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
    for k in out))
  File "<string>", line 6, in __init__
  File "/home/v-nuochen/.local/lib/python3.6/site-packages/transformers/file_utils.py", line 1383, in __post_init__
    for element in iterator:
  File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in <genexpr>
    for k in out))
  File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 71, in forward
    return comm.gather(inputs, ctx.dim, ctx.target_device)
  File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/comm.py", line 230, in gather
    return torch._C._gather(tensors, dim, destination)
RuntimeError: Input tensor at index 7 has invalid shape [2, 2], but expected [2, 9]
100%|█████████████████████████████████████████████████████████████████████████████████▉| 1953/1954 [18:36<00:00, 1.75it/s]

Could you please tell me why?

I encountered the same problem. Could you tell me how to solve it? Thank you!
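One plausible reading of the shapes in the error, consistent with the data-parallel diagnosis in this thread, is sketched below in plain Python (no PyTorch needed). It rests on two assumptions that the thread does not confirm: each GPU replica returns a square similarity matrix of shape [chunk, chunk], and the final batch held 65 examples split across 8 GPUs. The value 65 is hypothetical, chosen only because it reproduces the reported shapes. The helper names `chunk_sizes` and `first_gather_mismatch` are illustrative and are not PyTorch functions.

```python
import math


def chunk_sizes(n_items, n_chunks):
    """Split n_items the way torch.chunk does: equal chunks of
    ceil(n_items / n_chunks), with a smaller remainder chunk last."""
    size = math.ceil(n_items / n_chunks)
    sizes = []
    remaining = n_items
    while remaining > 0:
        take = min(size, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


def first_gather_mismatch(shapes, dim=0):
    """Return the index of the first shape that cannot be concatenated
    along `dim` with shapes[0], or None if all shapes are compatible."""
    reference = shapes[0]
    for index, shape in enumerate(shapes):
        for axis, (got, want) in enumerate(zip(shape, reference)):
            if axis != dim and got != want:
                return index
    return None


# Hypothetical final batch: 65 examples over 8 GPUs (an assumption).
sizes = chunk_sizes(65, 8)
# Assumed per-replica output: a square [chunk, chunk] similarity matrix.
shapes = [(s, s) for s in sizes]
bad_index = first_gather_mismatch(shapes)

print(sizes)      # per-GPU chunk sizes
print(bad_index)  # index of the replica whose output cannot be gathered
```

Under these assumptions the last replica produces a [2, 2] matrix while the others produce [9, 9], so concatenating along dimension 0 fails at index 7. This would also explain why the error only appears on the final step (1953/1954), where the batch is smaller than the rest. With a single visible GPU there is no gather step, so the mismatch cannot occur.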

cn-boop commented on September 14, 2024

mode007 commented on September 14, 2024

Just use one GPU to train.

Thanks for the reply! However, I couldn't find where in train.py to restrict training to a single GPU. Could you point me to it? Thanks.

cn-boop commented on September 14, 2024

mode007 commented on September 14, 2024

Just add export CUDA_VISIBLE_DEVICES=1 to the .sh file you run (no spaces around the "=").
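For anyone launching training from Python rather than editing the shell script, a minimal sketch of the same idea follows. It sets CUDA_VISIBLE_DEVICES in the environment of a child process, so the variable is in place before CUDA is initialized there. The helper names `single_gpu_env` and `run_single_gpu` are hypothetical and are not part of SimCSE.

```python
import os
import subprocess


def single_gpu_env(gpu_index=0, base_env=None):
    """Return a copy of the environment that exposes exactly one GPU.

    CUDA_VISIBLE_DEVICES holds physical GPU indices, so "1" selects the
    second GPU on the machine; the process then sees it as device 0.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_single_gpu(command, gpu_index=0):
    """Run `command` (a list of arguments) with only one GPU visible."""
    return subprocess.run(command, env=single_gpu_env(gpu_index), check=True)


# Example call (not executed here), mirroring the fix in this thread:
# run_single_gpu(["bash", "run_unsup_example.sh"], gpu_index=1)
```

Setting the variable inside an already running training script only works if it happens before the first CUDA call, which is why passing it through the launcher's environment is the safer route.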

OK, solved. Thanks!

cn-boop commented on September 14, 2024
