Comments (5)

nooralahzadeh avatar nooralahzadeh commented on July 19, 2024

Hi,
Maybe the partial-CRF part causes this issue.
Did you try a smaller batch size?
I did not have this problem with a 16 GB GPU.

from dsner-pytorch.

wangdsh avatar wangdsh commented on July 19, 2024

How can I run the code in a terminal?

I ran it in the "src" directory and in other directories, but got the error "ModuleNotFoundError: No module named 'src'". Did you run it in a terminal or in an IDE?
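
This error usually means that the directory containing the `src` package is not on Python's module search path. Below is a minimal sketch of one workaround, assuming the entry script lives somewhere under the repository and imports modules as `src.<name>`. The helper name `add_project_root` is hypothetical and is not part of the repository.

```python
import sys
from pathlib import Path


def add_project_root(script_path: str, package_name: str = "src") -> Path:
    """Find the nearest ancestor directory that contains `package_name`
    and put it at the front of sys.path so `import src...` resolves.

    Hypothetical helper for illustration; not part of DSNER-pytorch.
    """
    here = Path(script_path).resolve()
    for parent in here.parents:
        if (parent / package_name).is_dir():
            root = str(parent)
            if root in sys.path:
                sys.path.remove(root)
            sys.path.insert(0, root)
            return parent
    raise FileNotFoundError(
        f"no ancestor of {here} contains a '{package_name}' directory"
    )


# Typical use at the very top of the entry script, before any `from src...` import:
#     add_project_root(__file__)
```

Setting the `PYTHONPATH` environment variable to the directory that contains `src` before launching the script should have the same effect without touching the code.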

wangdsh avatar wangdsh commented on July 19, 2024

The CPU memory ran out, not the GPU memory. Below is the output:

$ python dsner.py 
PA+SL
100%|███████████████████████████████████████████████████████████████████████████| 1097/1097 [00:00<00:00, 146762.51it/s]
100%|███████████████████████████████████████████████████████████████████████████| 1097/1097 [00:00<00:00, 164115.83it/s]
100%|███████████████████████████████████████████████████████████████████████████| 1097/1097 [00:00<00:00, 857623.76it/s]
[2019-11-27 21:57:07,179] DEBUG:__main__:==> Size of train data   : 1097 
100%|█████████████████████████████████████████████████████████████████████████████| 798/798 [00:00<00:00, 773885.45it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 798/798 [00:00<00:00, 895173.73it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 798/798 [00:00<00:00, 902171.05it/s]
[2019-11-27 21:57:07,281] DEBUG:__main__:==> Size of test data    : 798 
100%|█████████████████████████████████████████████████████████████████████████████| 400/400 [00:00<00:00, 762947.52it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 400/400 [00:00<00:00, 835518.73it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 400/400 [00:00<00:00, 755730.45it/s]
[2019-11-27 21:57:07,339] DEBUG:__main__:==> Size of dev data    : 400 
100%|███████████████████████████████████████████████████████████████████████████| 2560/2560 [00:00<00:00, 782040.66it/s]
100%|███████████████████████████████████████████████████████████████████████████| 2560/2560 [00:00<00:00, 887756.78it/s]
100%|███████████████████████████████████████████████████████████████████████████| 2560/2560 [00:00<00:00, 924125.85it/s]
[2019-11-27 21:57:07,776] DEBUG:__main__:==> Size of ds pa data    : 2560 
[2019-11-27 21:57:07,968] DEBUG:__main__:==> Size of merge  data : 3657 
Training epoch  0:   0%|▎                                                             | 16/3657 [00:00<05:45, 10.54it/s]
/pytorch/aten/src/ATen/native/cuda/LegacyDefinitions.cpp:19: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead.
/pytorch/aten/src/ATen/native/cuda/LegacyDefinitions.cpp:19: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead.
......
......
......
Training epoch  0:  66%|███████████████████████████████████████▊                    | 2430/3657 [01:30<03:05,  6.63it/s]
Traceback (most recent call last):
  File "dsner.py", line 387, in <module>
    main()
  File "dsner.py", line 294, in main
    train_loss = trainer.train(dataset_setup, epoch)
  File "/data/wangdsh/temp/DSNER-pytorch/src/trainer.py", line 76, in train
    sent, tags, tags_iobes, sign, s_length, y_one_hot, y_iobes_one_hot = dataset[indices[start_index]]
  File "/data/wangdsh/temp/DSNER-pytorch/src/dataset.py", line 64, in __getitem__
    tags_iobes_one_hots=deepcopy(self.tags_iobes_one_hot[index])
  File "/data/Anaconda/Anaconda3/lib/python3.6/copy.py", line 161, in deepcopy
    y = copier(memo)
  File "/data/Anaconda/Anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 33, in __deepcopy__
    new_storage = self.storage().__deepcopy__(memo)
  File "/data/Anaconda/Anaconda3/lib/python3.6/site-packages/torch/storage.py", line 28, in __deepcopy__
    new_storage = self.clone()
  File "/data/Anaconda/Anaconda3/lib/python3.6/site-packages/torch/storage.py", line 44, in clone
    return type(self)(self.size()).copy_(self)
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 48272400 bytes. Error code 12 (Cannot allocate memory)

I changed batch_size to 16 but still got the same error. I think the reason is that the variable "y_one_hot_all" consumes too much CPU memory.
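
If the precomputed one-hot labels really are the main consumer, one possible mitigation is to keep only integer tag ids in memory and build the one-hot rows when a sample is requested. The sketch below uses plain Python lists to stay self-contained. The class name `LazyOneHotTags` and its fields are hypothetical and do not reflect the repository's `dataset.py`. It also assumes exactly one tag id per token, whereas partially annotated data may need a set of allowed tags per token.

```python
from typing import List, Sequence


class LazyOneHotTags:
    """Store compact tag ids and expand them to one-hot rows on demand.

    Hypothetical illustration; assumes one tag id per token.
    """

    def __init__(self, tag_ids: Sequence[Sequence[int]], num_tags: int):
        # Only the integer ids are kept resident, one per token.
        self.tag_ids = [list(sentence) for sentence in tag_ids]
        self.num_tags = num_tags

    def __len__(self) -> int:
        return len(self.tag_ids)

    def __getitem__(self, index: int) -> List[List[int]]:
        # A fresh one-hot matrix is built per call, so callers can modify
        # the result freely and no deepcopy of stored data is needed.
        rows = []
        for tag in self.tag_ids[index]:
            row = [0] * self.num_tags
            row[tag] = 1
            rows.append(row)
        return rows


labels = LazyOneHotTags([[0, 2], [1]], num_tags=3)
first = labels[0]  # one row per token, one column per tag
```

The trade-off is a small amount of extra work per sample in exchange for not holding every one-hot matrix in memory at once.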

nooralahzadeh avatar nooralahzadeh commented on July 19, 2024

That is weird. I ran it again and I don't have this problem. However, I am using an earlier version of PyTorch (1.0.1).

wangdsh avatar wangdsh commented on July 19, 2024

Thanks for your response. I installed PyTorch 1.0.1 with conda and ran the code again, but I encountered the same problem. I don't think it's a PyTorch version issue.
My test environment:
python: Python 3.6.9 :: Anaconda, Inc.
cuda: CUDA Version 9.2.148
pytorch: 1.0.1

Besides, I changed the args "--setup" to "A+H" and "--mode" to "PA+SL" in dsner.py. Everything else is unchanged.
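
To check whether host memory grows steadily during the epoch rather than jumping at one step, the process's peak resident set size can be logged every few iterations. This is a minimal sketch using only the standard library on Linux or macOS. The function names `peak_rss_mb` and `log_memory` are hypothetical, and where to call them in the training loop is an assumption.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Return this process's peak resident set size in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def log_memory(step: int, every: int = 100) -> None:
    """Print the peak RSS once every `every` steps."""
    if step % every == 0:
        print(f"step {step}: peak RSS {peak_rss_mb():.1f} MB")


# Example: call log_memory(i) once per iteration inside the training loop.
```

Because the value is a high-water mark, it never decreases. A figure that keeps climbing across iterations would suggest that per-sample copies are being retained somewhere.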
