Comments (5)
Hi
Maybe the partial-CRF part causes this issue.
Did you try a smaller batch size?
I did not have this problem with a 16 GB GPU!
from dsner-pytorch.
How can I run the code from a terminal?
I ran it from the "src" directory and from other directories, but got the error "ModuleNotFoundError: No module named 'src'". Did you run it from a terminal or in an IDE?
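For reference, "No module named 'src'" usually means the directory that contains "src" is not on Python's import path. Below is a minimal sketch of one workaround, not taken from the repository: the helper name add_project_root is hypothetical, and it assumes the package directory is literally named "src".

```python
import sys
from pathlib import Path


def add_project_root(start: Path, package: str = "src") -> Path:
    """Walk upward from `start` until a directory containing `package` is found,
    then put that directory at the front of sys.path.

    Hypothetical helper for illustration; not part of dsner-pytorch.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / package).is_dir():
            root = str(candidate)
            if root not in sys.path:
                sys.path.insert(0, root)
            return candidate
    raise FileNotFoundError(f"no '{package}' directory found at or above {start}")
```

Calling add_project_root(Path.cwd()) before any "import src..." statement should make the package importable regardless of which subdirectory the script is started from. Running the script from the directory that contains "src" has the same effect without any code change.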
The CPU memory ran out, not the GPU memory. Below is the output:
$ python dsner.py
PA+SL
100%|███████████████████████████████████████████████████████████████████████████| 1097/1097 [00:00<00:00, 146762.51it/s]
100%|███████████████████████████████████████████████████████████████████████████| 1097/1097 [00:00<00:00, 164115.83it/s]
100%|███████████████████████████████████████████████████████████████████████████| 1097/1097 [00:00<00:00, 857623.76it/s]
[2019-11-27 21:57:07,179] DEBUG:__main__:==> Size of train data : 1097
100%|█████████████████████████████████████████████████████████████████████████████| 798/798 [00:00<00:00, 773885.45it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 798/798 [00:00<00:00, 895173.73it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 798/798 [00:00<00:00, 902171.05it/s]
[2019-11-27 21:57:07,281] DEBUG:__main__:==> Size of test data : 798
100%|█████████████████████████████████████████████████████████████████████████████| 400/400 [00:00<00:00, 762947.52it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 400/400 [00:00<00:00, 835518.73it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 400/400 [00:00<00:00, 755730.45it/s]
[2019-11-27 21:57:07,339] DEBUG:__main__:==> Size of dev data : 400
100%|███████████████████████████████████████████████████████████████████████████| 2560/2560 [00:00<00:00, 782040.66it/s]
100%|███████████████████████████████████████████████████████████████████████████| 2560/2560 [00:00<00:00, 887756.78it/s]
100%|███████████████████████████████████████████████████████████████████████████| 2560/2560 [00:00<00:00, 924125.85it/s]
[2019-11-27 21:57:07,776] DEBUG:__main__:==> Size of ds pa data : 2560
[2019-11-27 21:57:07,968] DEBUG:__main__:==> Size of merge data : 3657
Training epoch 0: 0%|▎ | 16/3657 [00:00<05:45, 10.54it/s]/pytorch/aten/src/ATen/native/cuda/LegacyDefinitions.cpp:19: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead.
/pytorch/aten/src/ATen/native/cuda/LegacyDefinitions.cpp:19: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead.
......
......
......
Training epoch 0: 66%|███████████████████████████████████████▊ | 2430/3657 [01:30<03:05, 6.63it/s]
Traceback (most recent call last):
  File "dsner.py", line 387, in <module>
    main()
  File "dsner.py", line 294, in main
    train_loss = trainer.train(dataset_setup, epoch)
  File "/data/wangdsh/temp/DSNER-pytorch/src/trainer.py", line 76, in train
    sent, tags, tags_iobes, sign, s_length, y_one_hot, y_iobes_one_hot = dataset[indices[start_index]]
  File "/data/wangdsh/temp/DSNER-pytorch/src/dataset.py", line 64, in __getitem__
    tags_iobes_one_hots=deepcopy(self.tags_iobes_one_hot[index])
  File "/data/Anaconda/Anaconda3/lib/python3.6/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/data/Anaconda/Anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 33, in __deepcopy__
    new_storage = self.storage().__deepcopy__(memo)
  File "/data/Anaconda/Anaconda3/lib/python3.6/site-packages/torch/storage.py", line 28, in __deepcopy__
    new_storage = self.clone()
  File "/data/Anaconda/Anaconda3/lib/python3.6/site-packages/torch/storage.py", line 44, in clone
    return type(self)(self.size()).copy_(self)
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 48272400 bytes. Error code 12 (Cannot allocate memory)
I changed batch_size to 16 but still got the same error. I think the reason is that the variable "y_one_hot_all" consumes too much CPU memory.
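A rough back-of-the-envelope check of that suspicion, using only the two numbers visible in the output above: the failed allocation (48272400 bytes, requested while deep-copying a single item) and the merged dataset size (3657). The sketch assumes every item's one-hot tensor is about as large as the one that failed, which the log does not confirm, so treat the result as an order-of-magnitude estimate only.

```python
FAILED_ALLOCATION_BYTES = 48_272_400  # from the RuntimeError message
MERGED_ITEMS = 3657                   # "Size of merge data" in the log


def total_bytes(items: int, bytes_per_item: int) -> int:
    """Total storage if every item needs `bytes_per_item` bytes (assumed uniform)."""
    return items * bytes_per_item


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


estimate = total_bytes(MERGED_ITEMS, FAILED_ALLOCATION_BYTES)
print(f"{estimate} bytes is about {to_gib(estimate):.1f} GiB")
```

Under that assumption, holding all one-hot tensors in memory at once would take well over a hundred GiB, far more than a typical machine has, and a smaller batch size would not change it because the tensors are stored per item rather than per batch.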
That is weird. I ran it again and did not hit this problem. However, I am using an earlier version of PyTorch (1.0.1)!
Thanks for your response. I installed PyTorch 1.0.1 with conda and ran the code again, but I encountered the same problem. I don't think it is a PyTorch version issue.
My test environment:
python: Python 3.6.9 :: Anaconda, Inc.
cuda: CUDA Version 9.2.148
pytorch: 1.0.1
Besides, I changed the args "--setup" to "A+H" and "--mode" to "PA+SL" in dsner.py. Everything else is unchanged.