Comments (13)
I got around the error by reducing the batch_size_for_train
from 32 down to 8. Then I was able to run ./src/train.py
as expected on my setup.
Maybe it would be a good idea to add to the README.md
python3 ./src/train.py -num_epochs 1 -batch_size_for_train 1 -batch_size_for_eval 1
and, similarly,
CUDA_VISIBLE_DEVICES=0,1 python3 ./src/train.py -num_epochs 1 -batch_size_for_train 1 -batch_size_for_eval 1 -cuda_devices 0,1
since it is said that ~3 GB of CPU RAM and ~1.1 GB of GPU memory are necessary to run the script.
from zero-shot-entity-linking.
I have the same problem. It seems to use a single-GPU training mode in encoders.py, e.g.
self.cuda_device = 0
and
batch = nn_util.move_to_device(batch, self.cuda_device)
So even if I use CUDA_VISIBLE_DEVICES=0,1, in fact it uses a single GPU.
I want to know how to use allennlp with multiple GPUs. Can you please help?
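For reference, here is a minimal sketch of the single-GPU pattern quoted above (the SingleDeviceEncoder class is hypothetical; nn_util.move_to_device is the real allennlp helper from the snippet):

from allennlp.nn import util as nn_util

class SingleDeviceEncoder:
    def __init__(self):
        # Hard-coding device 0 pins every batch to the first visible GPU,
        # so CUDA_VISIBLE_DEVICES=0,1 still runs this encoder on one GPU.
        self.cuda_device = 0

    def prepare(self, batch):
        # move_to_device recursively moves all tensors in the (possibly
        # nested) batch structure onto the given device index.
        return nn_util.move_to_device(batch, self.cuda_device)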
from zero-shot-entity-linking.
Only BiEncoderTopXRetriever in utils.py uses a single GPU. train.py calls Trainer, which uses all available GPUs by default.
I executed the command
CUDA_VISIBLE_DEVICES=0,1 python3 ./src/train.py -num_epochs 1 -batch_size_for_train 1 -batch_size_for_eval 1 -cuda_devices 0,1
and then I had no problem training the model on multiple GPUs. Did you forget to add -cuda_devices 0,1 at the end of your command?
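For anyone curious how "all available GPUs by default" works, here is a small, self-contained sketch of the data-parallel pattern that allennlp Trainers of that era used when given a list of device ids (the nn.Linear model is just a toy stand-in for the bi-encoder; at least one CUDA device is assumed):

import torch
from torch import nn

# A toy model stands in for the bi-encoder; DataParallel splits each
# batch across the listed devices and gathers outputs on device 0.
model = nn.Linear(16, 4)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1])  # corresponds to -cuda_devices 0,1
model = model.cuda()

out = model(torch.randn(8, 16).cuda())  # the forward pass is sharded across GPUs
print(out.shape)  # torch.Size([8, 4])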
from zero-shot-entity-linking.
I used this command
CUDA_VISIBLE_DEVICES=3,4,5 python3 train.py -num_epochs 1 -batch_size_for_train 8 -batch_size_for_eval 8 -cuda_devices 3,4,5
and the problem arises when encoding all entities from title and description:
experiment_logdir: ../src/experiment_logdir/201120_125315/
World american_football is now being loaded...
  0%| | 0/1 [00:00<?, ?it/s]
======Encoding all entites from title and description=====
  0%| | 0/1 [00:13<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 190, in <module>
    main()
  File "train.py", line 83, in main
    hardNegativeSearcher.hardNegativesSearcherandSetter()
  File "/home/zhg/zqx/Zero-Shot-Entity-Linking-master/src/hardnegative_searcher.py", line 41, in hardNegativesSearcherandSetter
    dui2encoded_emb, duidx2encoded_emb = self.dui2EncoderEntityEmbReturner()
  File "/home/zhg/zqx/Zero-Shot-Entity-Linking-master/src/hardnegative_searcher.py", line 76, in dui2EncoderEntityEmbReturner
    duidx2encoded_emb = self.encodeAllEntitiesEncoder.encoding_all_entities()
  File "/home/zhg/zqx/Zero-Shot-Entity-Linking-master/src/encoders.py", line 129, in encoding_all_entities
    duidxs, embs = self._extract_cuidx_and_its_encoded_emb(batch)
  File "/home/zhg/zqx/Zero-Shot-Entity-Linking-master/src/encoders.py", line 141, in _extract_cuidx_and_its_encoded_emb
    out_dict = self.entity_encoder_wrapping_model(**batch)
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhg/zqx/Zero-Shot-Entity-Linking-master/src/model.py", line 108, in forward
    encoded_entites = self.entity_encoder(title_and_desc_concatnated_text=title_and_desc_concatnated_text)
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhg/zqx/Zero-Shot-Entity-Linking-master/src/encoders.py", line 46, in forward
    entity_emb = self.word_embedder(title_and_desc_concatnated_text)
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 131, in forward
    token_vectors = embedder(*tensors, **forward_params_values)
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/allennlp/modules/token_embedders/pretrained_transformer_embedder.py", line 26, in forward
    return self.transformer_model(token_ids)[0]
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 715, in forward
    head_mask=head_mask)
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 437, in forward
    layer_outputs = layer_module(hidden_states, attention_mask, head_mask[i])
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 417, in forward
    intermediate_output = self.intermediate(attention_output)
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 389, in forward
    hidden_states = self.intermediate_act_fn(hidden_states)
  File "/home/zhg/anaconda3/envs/zsel/lib/python3.7/site-packages/pytorch_transformers/modeling_bert.py", line 142, in gelu
    return x * 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
RuntimeError: CUDA out of memory. Tried to allocate 3.72 GiB (GPU 0; 10.76 GiB total capacity; 9.73 GiB already allocated; 143.12 MiB free; 9.77 GiB reserved in total by PyTorch)
 16%|#5 | 4999/31929 [00:13<01:14, 363.16it/s]
The code in encoders.py seems to use a single GPU.
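Not a fix for the multi-GPU issue itself, but a sketch of the usual way to avoid this particular OOM when pre-computing all entity embeddings: encode in small chunks under torch.no_grad() and move each result off the GPU immediately (encode_in_chunks and its arguments are hypothetical, not functions from this repository):

import torch

@torch.no_grad()  # no gradients are needed when pre-computing embeddings
def encode_in_chunks(encoder, token_ids, chunk_size=256):
    embs = []
    for start in range(0, token_ids.size(0), chunk_size):
        chunk = token_ids[start:start + chunk_size].cuda()
        embs.append(encoder(chunk).cpu())  # free GPU memory right away
    return torch.cat(embs, dim=0)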
from zero-shot-entity-linking.
There are two workarounds:
- Use Docker containers. Execute docker run with the flag --gpus '"device=3,4,5"'. In this way the GPUs 3, 4 and 5 will be mapped to 0, 1 and 2 inside your container (see the example command after this list). More information here.
- If Docker containers are not available on your machine or if you are not familiar with Docker, you can simply:
  - Replace self.cuda_device = 0 on line 201 of utils.py with self.cuda_device = 3
  - Replace self.cuda_device = 0 on line 107 of encoders.py with self.cuda_device = 3
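A hedged example of that Docker invocation, assuming the NVIDIA Container Toolkit is installed (zsel-image is a placeholder for whatever image you build with this repository's dependencies):

docker run --gpus '"device=3,4,5"' -it --rm zsel-image python3 ./src/train.py -num_epochs 1 -batch_size_for_train 8 -batch_size_for_eval 8 -cuda_devices 0,1,2

Note that inside the container the three GPUs are renumbered, so -cuda_devices should name 0,1,2 rather than 3,4,5.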
from zero-shot-entity-linking.
Thank you for your help
from zero-shot-entity-linking.
100%|##########| 440473133/440473133 [01:12<00:00, 6085356.67B/s]
Hello, I want to know why your download speed is so fast. Mine is shown below.
===PARAMETERS===
debug False
bert_name bert-base-uncased
word_embedding_dropout 0.05
cuda_devices 0
allen_lazyload True
batch_size_for_train 32
batch_size_for_eval 8
hard_negatives_num 10
num_epochs 1
lr 1e-05
weight_decay 0
beta1 0.9
beta2 0.999
epsilon 1e-08
amsgrad False
max_title_len 12
max_desc_len 50
max_context_len_after_tokenize 100
add_mse_for_biencoder False
search_method indexflatip
add_hard_negatives True
metionPooling CLS
entityPooling CLS
dimentionReduction False
dimentionReductionToThisDim 300
extracted_first_token_for_description 100
extracted_first_token_for_title 16
dataset_dir ./data/
documents_dir ./data/documents/
mentions_dir ./data/mentions/
mentions_splitbyworld_dir ./data/mentions_split_by_world/
mention_leftandright_tokenwindowwidth 40
debugSampleNum 100000000
dir_for_each_world ./data/worlds/
experiment_logdir ./src/experiment_logdir/
===PARAMETERS END===
experiment_logdir: ./src/experiment_logdir/201217_102331/
61%|##########################################3 | 266586112/440473133 [12:25<02:27, 1178137.12B/s]
Does it depend on the CPUs?
from zero-shot-entity-linking.
I ran these experiments on two Tesla V100 GPUs on an NVIDIA DGX-1 32GB server.
So yes, it depends on your setup.
By the way, @DRosemei and @doudouzqx , please let me know if you succeed in your experiments with the code on this repository or the BLINK repository. Although I was able to run the code and train the model, I couldn't achieve the results I was looking for.
from zero-shot-entity-linking.
@ruanchaves I've run into trouble now. I have downloaded the model named "bert-base-uncased", but I don't know where to put it.
Errors are shown below:
Model name 'bert-base-uncased' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese). We assumed 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz' was a path or url but couldn't find any file associated to this path or url.
Traceback (most recent call last):
  File "./src/train.py", line 131, in <module>
    main()
  File "./src/train.py", line 44, in main
    mention_encoder = Pooler_for_mention(args=opts, word_embedder=textfieldEmbedder)
  File "/media/rose/Doc/projects/xiaofan/Zero-Shot-Entity-Linking/src/encoders.py", line 63, in __init__
    self.bertpooler_sec2vec = BertPooler(pretrained_model=self.bert_weight_filepath)
  File "/home/rose/anaconda3/envs/el/lib/python3.7/site-packages/allennlp/modules/seq2vec_encoders/bert_pooler.py", line 51, in __init__
    self.pooler = model.pooler
AttributeError: 'NoneType' object has no attribute 'pooler'
from zero-shot-entity-linking.
@DRosemei Can you post the command you are trying to run? What are your arguments to python3 ./src/train.py?
from zero-shot-entity-linking.
@ruanchaves Yes, I used python3 ./src/train.py -num_epochs 1, and I can train it now after putting "bert-base-uncased" into ./src/.
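For anyone hitting the same AttributeError: a minimal sketch of the loading behavior, assuming the pytorch_transformers from_pretrained API this repo uses (the local_dir path is hypothetical; the directory just needs to contain config.json, vocab.txt and pytorch_model.bin):

from pytorch_transformers import BertModel, BertTokenizer

# from_pretrained accepts a local directory instead of a model name,
# which avoids the S3 download that failed in the traceback above.
local_dir = "./src/bert-base-uncased/"
tokenizer = BertTokenizer.from_pretrained(local_dir)
model = BertModel.from_pretrained(local_dir)
# In this old library version a failed download makes from_pretrained
# return None, which is why model.pooler raised the AttributeError above.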
from zero-shot-entity-linking.
@ruanchaves I have completed 1 epoch, and I get final results below:
{
"entire_h1_percent": 20.28,
"entire_h10_percent": 42.88,
"entire_h50_percent": 54.42,
"entire_h64_percent": 55.96,
"entire_h100_percent": 59.440000000000005,
"entire_h500_percent": 71.00999999999999
}
The results are not so good. Have you ever trained more than 1 epoch?
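For readers unfamiliar with these numbers, a toy sketch of how hits@k percentages such as entire_h1_percent are typically computed (gold_ranks here is made-up illustration data, not output from this repo):

def hits_at_k(gold_ranks, k):
    # percentage of mentions whose gold entity was ranked within the top k
    return 100.0 * sum(rank <= k for rank in gold_ranks) / len(gold_ranks)

gold_ranks = [1, 3, 120, 7, 55]  # rank of the correct entity per mention (toy data)
print({f"h{k}": hits_at_k(gold_ranks, k) for k in (1, 10, 50, 64, 100, 500)})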
from zero-shot-entity-linking.
Yes, I have already trained for several epochs, but I couldn't achieve acceptable results.
from zero-shot-entity-linking.
Related Issues (6)
- Add script for encoding entity embeddings from dumped pre-trained model.
- [Feature Request] Upgrade to the latest AllenNLP version and allow gradient accumulation
- [Feature Request] Code to use trained model for Entity Linking
- Performance numbers, hyperparameters and comparison with BLINK
- [WIP] for allennlp 2.1.0