Comments (8)
It seems that you are using data parallel. Did you use multiple GPUs while using the single-GPU training code?
from simcse.
from simcse.
When i run run_unsup_example.sh and when i almost finished training, an error happend:
Traceback (most recent call last):
File "train.py", line 584, in
main()
File "train.py", line 548, in main
train_result = trainer.train(model_path=model_path)
File "/home/v-nuochen/SimCSE/simcse/trainers.py", line 464, in train
tr_loss += self.training_step(model, inputs)
File "/home/v-nuochen/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1248, in training_step
loss = self.compute_loss(model, inputs)
File "/home/v-nuochen/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1277, in compute_loss
outputs = model(**inputs)
File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in forward
return self.gather(outputs, self.output_device)
File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 174, in gather
return gather(outputs, output_device, dim=self.dim)
File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
res = gather_map(outputs)
File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map
for k in out))
File "", line 6, in init
File "/home/v-nuochen/.local/lib/python3.6/site-packages/transformers/file_utils.py", line 1383, in post_init
for element in iterator:
File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in
for k in out))
File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
return Gather.apply(target_device, dim, *outputs)
File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 71, in forward
return comm.gather(inputs, ctx.dim, ctx.target_device)
File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/comm.py", line 230, in gather
return torch._C._gather(tensors, dim, destination)RuntimeError: Input tensor at index 7 has invalid shape [2, 2], but expected [2, 9]
100%|█████████████████████████████████████████████████████████████████████████████████▉| 1953/1954 [18:36<00:00, 1.75it/s]Could you please tell me why?
I encounter the same problem.Could you tell me how to solve? thank you!
from simcse.
from simcse.
just use one gpu to train 发自我的iPhone
…
在 2021年7月16日,上午10:37,mode007 @.***> 写道: When i run run_unsup_example.sh and when i almost finished training, an error happend: Traceback (most recent call last): File "train.py", line 584, in main() File "train.py", line 548, in main train_result = trainer.train(model_path=model_path) File "/home/v-nuochen/SimCSE/simcse/trainers.py", line 464, in train tr_loss += self.training_step(model, inputs) File "/home/v-nuochen/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1248, in training_step loss = self.compute_loss(model, inputs) File "/home/v-nuochen/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1277, in compute_loss outputs = model(**inputs) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in forward return self.gather(outputs, self.output_device) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 174, in gather return gather(outputs, output_device, dim=self.dim) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather res = gather_map(outputs) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map for k in out)) File "", line 6, in init File "/home/v-nuochen/.local/lib/python3.6/site-packages/transformers/file_utils.py", line 1383, in post_init for element in iterator: File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in for k in out)) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map return Gather.apply(target_device, dim, *outputs) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 71, in forward return comm.gather(inputs, ctx.dim, ctx.target_device) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/comm.py", line 230, in gather return torch._C._gather(tensors, dim, destination) RuntimeError: Input tensor at index 7 has invalid shape [2, 2], but expected [2, 9] 100%|█████████████████████████████████████████████████████████████████████████████████▉| 1953/1954 [18:36<00:00, 1.75it/s] Could you please tell me why? I encounter the same problem.Could you tell me how to solve? thank you! — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.
感谢回复!但是我没找到train.py哪里设置只是用单GPU,还请指点!感谢。
from simcse.
from simcse.
只在在运行sh文件上加上 export CUDA_VISIBLE_DEVICES= 1
…
2021年7月16日 上午11:11,mode007 @.> 写道: just use one gpu to train 发自我的iPhone … x-msg://30/# 在 2021年7月16日,上午10:37,mode007 @.> 写道: When i run run_unsup_example.sh and when i almost finished training, an error happend: Traceback (most recent call last): File "train.py", line 584, in main() File "train.py", line 548, in main train_result = trainer.train(model_path=model_path) File "/home/v-nuochen/SimCSE/simcse/trainers.py", line 464, in train tr_loss += self.training_step(model, inputs) File "/home/v-nuochen/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1248, in training_step loss = self.compute_loss(model, inputs) File "/home/v-nuochen/.local/lib/python3.6/site-packages/transformers/trainer.py", line 1277, in compute_loss outputs = model(**inputs) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl result = self.forward(*input, **kwargs) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in forward return self.gather(outputs, self.output_device) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 174, in gather return gather(outputs, output_device, dim=self.dim) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather res = gather_map(outputs) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in gather_map for k in out)) File "", line 6, in init File "/home/v-nuochen/.local/lib/python3.6/site-packages/transformers/file_utils.py", line 1383, in post_init for element in iterator: File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 62, in for k in out)) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map return Gather.apply(target_device, dim, *outputs) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 71, in forward return comm.gather(inputs, ctx.dim, ctx.target_device) File "/home/v-nuochen/.local/lib/python3.6/site-packages/torch/nn/parallel/comm.py", line 230, in gather return torch._C._gather(tensors, dim, destination) RuntimeError: Input tensor at index 7 has invalid shape [2, 2], but expected [2, 9] 100%|█████████████████████████████████████████████████████████████████████████████████▉| 1953/1954 [18:36<00:00, 1.75it/s] Could you please tell me why? I encounter the same problem.Could you tell me how to solve? thank you! — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe. 感谢回复!但是我没找到train.py哪里设置只是用单GPU,还请指点!感谢。 — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub <#28 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ANT4RQT7T7GHAVQIDUUOSELTX6PNBANCNFSM445WUKSQ.
ok,已解决,感谢
from simcse.
from simcse.
Related Issues (20)
- why mlp_only_train=True during unsupervised training? HOT 2
- [question] Pretrined sentence embeddings model fine tuning HOT 2
- 关于simcse build_index 的速度问题 HOT 6
- Difference in models between train and evaluation scripts. HOT 4
- TypeError: object of type 'IndexFlatIP' has no len() HOT 4
- model = SimCSE("princeton-nlp/sup-simcse-bert-base-uncased")调用 HOT 3
- Can I load saved index to GPU? HOT 2
- Can I replace the base model with longformer? Is it a simple replacement or does the code also need to be changed? Thank you for your answer. HOT 4
- Why divide by temp when calculating cosine similarity HOT 1
- ValueError: Mixed precision training with AMP or APEX (`--fp16`) can only be used on CUDA devices.
- 关于两次前向传播 HOT 2
- AttributeError: 'OurTrainingArguments' object has no attribute 'distributed_state' HOT 2
- setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (750,) + inhomogeneous part. HOT 2
- 关于 Supervised SimCSE 的 GPU Memory Usage HOT 2
- An error when max_seq_length is set too long HOT 1
- drpout HOT 2
- couldn't install cimcse HOT 7
- Question about the comparison of data augmentations's reproduce HOT 2
- The function ‘search’ only returns one result HOT 2
- when I run evaluation.py, the spearman is unusually low HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from simcse.