
SpanBERT

This repository contains code and models for the paper: SpanBERT: Improving Pre-training by Representing and Predicting Spans. If you prefer to use HuggingFace, please check out this link: https://huggingface.co/SpanBERT

Requirements

Apex

Please use an earlier commit of Apex - NVIDIA/apex@4a8c4ac

Pre-trained Models

We release both base and large cased models for SpanBERT. The base and large models have the same model configuration as BERT, but they differ in both the masking scheme and the training objectives (see our paper for more details).

These models have the same format as the HuggingFace BERT models, so you can easily replace them with our SpanBERT models. If you would like to use our fine-tuning code, the model paths are already hard-coded in the code :)
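For example, a minimal loading sketch (an illustration, assuming the transformers library and the SpanBERT/spanbert-base-cased checkpoint from the HuggingFace link above):

from transformers import AutoModel, AutoTokenizer

# the HuggingFace-format SpanBERT weights load exactly like a BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained("SpanBERT/spanbert-base-cased")
model = AutoModel.from_pretrained("SpanBERT/spanbert-base-cased")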

                  SQuAD 1.1   SQuAD 2.0   Coref     TACRED
                  F1          F1          avg. F1   F1
BERT (base)       88.5*       76.5*       73.1      67.7
SpanBERT (base)   92.4*       83.6*       77.4      68.2
BERT (large)      91.3        83.3        77.1      66.4
SpanBERT (large)  94.6        88.7        79.6      70.8

Note: The numbers marked as * are evaluated on the development sets because we didn't submit those models to the official SQuAD leaderboard. All the other numbers are test numbers.

Fine-tuning

SQuAD 1.1

python code/run_squad.py \
  --do_train \
  --do_eval \
  --model spanbert-base-cased \
  --train_file train-v1.1.json \
  --dev_file dev-v1.1.json \
  --train_batch_size 32 \
  --eval_batch_size 32  \
  --learning_rate 2e-5 \
  --num_train_epochs 4 \
  --max_seq_length 512 \
  --doc_stride 128 \
  --eval_metric f1 \
  --output_dir squad_output \
  --fp16

SQuAD 2.0

python code/run_squad.py \
  --do_train \
  --do_eval \
  --model spanbert-base-cased \
  --train_file train-v2.0.json \
  --dev_file dev-v2.0.json \
  --train_batch_size 32 \
  --eval_batch_size 32  \
  --learning_rate 2e-5 \
  --num_train_epochs 4 \
  --max_seq_length 512 \
  --doc_stride 128 \
  --eval_metric best_f1 \
  --output_dir squad2_output \
  --version_2_with_negative \
  --fp16

TACRED

python code/run_tacred.py \
  --do_train \
  --do_eval \
  --data_dir <TACRED_DATA_DIR> \
  --model spanbert-base-cased \
  --train_batch_size 32 \
  --eval_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 10 \
  --max_seq_length 128 \
  --output_dir tacred_dir \
  --fp16

MRQA (NewsQA, TriviaQA, SearchQA, HotpotQA, NaturalQuestions)

python code/run_mrqa.py \
  --do_train \
  --do_eval \
  --model spanbert-base-cased \
  --train_file TriviaQA-train.jsonl.gz \
  --dev_file TriviaQA-dev.jsonl.gz \
  --train_batch_size 32 \
  --eval_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 4 \
  --max_seq_length 512 \
  --doc_stride 128 \
  --eval_per_epoch 5 \
  --output_dir triviaqa_dir \
  --fp16

GLUE

python code/run_glue.py \
   --task_name RTE \
   --model spanbert-base-cased \
   --do_train \
   --do_eval \
   --data_dir <RTE_DATA_DIR> \
   --train_batch_size 32 \
   --eval_batch_size 32 \
   --num_train_epochs 10  \
   --max_seq_length 128 \
   --learning_rate 2e-5 \
   --output_dir RTE_DIR \
   --fp16

Coreference Resolution

Our coreference resolution fine-tuning code is implemented in Tensorflow. Please see https://github.com/mandarjoshi90/coref for more details.

Finetuned Models (SQuAD 1.1/2.0, Relation Extraction, Coreference Resolution)

If you are interested in using our fine-tuned models for downstream tasks directly, please use the following script:

./code/download_finetuned.sh <model_dir> <task>

where <task> is one of [squad1, squad2, tacred]. You can evaluate the models by setting --do_train to false, --do_eval to true, and --output_dir to <model_dir>/<task> in python code/run_<task>.py.

For coreference resolution, please refer to this repository -- https://github.com/mandarjoshi90/coref

Citation

@article{joshi2019spanbert,
  title={{SpanBERT}: Improving Pre-training by Representing and Predicting Spans},
  author={Mandar Joshi and Danqi Chen and Yinhan Liu and Daniel S. Weld and Luke Zettlemoyer and Omer Levy},
  journal={arXiv preprint arXiv:1907.10529},
  year={2019}
}

License

SpanBERT is licensed under CC-BY-NC 4.0. The license applies to the pre-trained models as well.

Contact

If you have any questions, please contact Mandar Joshi <[email protected]> or Danqi Chen <[email protected]>, or create a GitHub issue.


Issues

Apex ModuleNotFoundError: 'fused_adam_cuda'

I am using the same Python packages as specified in requirements.txt and the README (apex@4a8c4ac).
Python version: 3.6.10.
I get the following error after running the SQuAD 2.0 command, about 10 minutes in.

05/06/2020 08:17:34 - INFO - pytorch_pretrained_bert.modeling - Weights of BertForQuestionAnswering not initialized from pretrained model: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'qa_outputs.weight', 'qa_outputs.bias']
Traceback (most recent call last):
  File "code/run_squad.py", line 1138, in <module>
    main(args)
  File "code/run_squad.py", line 947, in main
    max_grad_norm=1.0)
  File "/home/user/.conda3/envs/spanbert/lib/python3.6/site-packages/apex-0.1-py3.6.egg/apex/optimizers/fused_adam.py", line 40, in __init__
  File "/home/user/.conda3/envs/spanbert/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'fused_adam_cuda'

training model with own data

Hi, I am looking to train this model on my own data, which has different relation classes.
Currently I am a bit confused by the data format you used for training and evaluating the results. I am able to run this code, but I have doubts about the terminology used in the code and whether I have formatted my data correctly according to it.
Could you please share an example dataset file with 5-10 records? I would like to run on that dataset to be sure.

TACRED train F1

I'm confused about how SpanBERT fits the training set.

I replaced this line:
https://github.com/facebookresearch/SpanBERT/blob/master/code/run_tacred.py#L337

eval_examples = processor.get_dev_examples(args.data_dir)

with

eval_examples = processor.get_train_examples(args.data_dir)

The training F1 is very low

accuracy = 0.8149991192531266
eval_loss = 1.294708618564385
f1 = 0.09820465025017167
n_correct = 1001
n_gold = 13012
n_pred = 7374
num = 68124
precision = 0.13574721996202874
recall = 0.07692898862588379

Request for the full model file and code for MLM prediction using SBO head

I've found that you have shared the large model file with LM/SBO head here (https://dl.fbaipublicfiles.com/fairseq/models/spanbert_large_with_head.tar.gz); would you also provide the file for the base model?

Also, would you kindly provide code that performs MLM prediction using the SBO head (similar to BertPairTargetPredictionHead at pretraining/fairseq/models/pair_bert.py)? I'm curious how it compares to standard MLM.
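For context, the paper describes the SBO as predicting each token inside a masked span from the encoder states of the span's two boundary tokens plus a relative position embedding, passed through a two-layer feed-forward network with GeLU activations and layer normalization. A minimal sketch of such a head (an illustration based on the paper's description, not the repo's BertPairTargetPredictionHead; the layer sizes are assumptions):

import torch
import torch.nn as nn

class SBOHead(nn.Module):
    """Span boundary objective head: predicts a masked token from the span's
    left/right boundary states plus a relative position embedding."""
    def __init__(self, hidden_size=768, max_span_length=15, vocab_size=28996):
        super().__init__()
        self.position_embeddings = nn.Embedding(max_span_length, hidden_size)
        self.mlp = nn.Sequential(
            nn.Linear(3 * hidden_size, hidden_size),
            nn.GELU(),
            nn.LayerNorm(hidden_size),
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.LayerNorm(hidden_size),
        )
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, left_boundary, right_boundary, relative_positions):
        # left_boundary, right_boundary: (batch, hidden) encoder states of the
        # tokens just outside the masked span; relative_positions: (batch,) long
        pos = self.position_embeddings(relative_positions)
        hidden = self.mlp(torch.cat([left_boundary, right_boundary, pos], dim=-1))
        return self.decoder(hidden)  # (batch, vocab_size) vocabulary logits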

Weights of BertForQuestionAnswering not initialized from pretrained model?

Hey Guys,

I am trying to retrain on SQuAD 2.0 on Google Colab. I got the message below:

10/17/2019 15:12:24 - INFO - pytorch_pretrained_bert.modeling - Weights of BertForQuestionAnswering not initialized from pretrained model: ['bert.pooler.dense.weight', 'bert.pooler.dense.bias', 'qa_outputs.weight', 'qa_outputs.bias']
10/17/2019 15:12:24 - INFO - main - Start epoch #0 (lr = 2e-05)...
^C

To reproduce my above:

!python ./code/run_squad.py \
  --do_train \
  --do_eval \
  --model spanbert-large-cased \
  --train_file train-v2.0.json \
  --dev_file dev-v2.0.json \
  --train_batch_size 32 \
  --eval_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 4 \
  --max_seq_length 512 \
  --doc_stride 128 \
  --eval_metric best_f1 \
  --output_dir ./squad2_output \
  --version_2_with_negative

I checked modeling.py; the message is from line 647. It seems there are missing_keys about TF weights? Can anyone give instructions?

Thanks.
Luke

Prepare training data

I tried to train using run_mrqa.py with the JSON files (by changing the file-reading code). However, I am unable to proceed further. Any help would be appreciated. Please share the preparation steps for the HotpotQA dataset with SpanBERT.

Merging with transformers library -- Random results

I just wanted to check if you are planning on merging this with the transformers library, since that would make it immensely easier to use. For now, just extracting the archive and calling the BERT APIs has worked for me, but the solution feels inelegant, as I have to download the model separately. Surprisingly, loading the BertModel using this codebase and the tokenizer using the transformers library results in random results across runs (with the same batch order and content). I can't really point out where the issue is for now!

Release of tensorflow checkpoints

Nice work! I am wondering if you could release the TensorFlow checkpoints of the SpanBERT model, because training the SpanBERT baseline requires the pre-trained version of SpanBERT.

Any plan to release "Our BERT"

Hi,

In the paper, there is reference to "Our BERT-1seq", which outperforms Google's BERT by a healthy margin on GLUE.

I am just wondering if the pre-trained weights of this model can be made available, similar to SpanBERT base and SpanBERT large?

Migrating to HuggingFace's Transformers

I'm trying to convert run_tacred.py to a HuggingFace Transformers-compatible version. I load the SpanBERT model fine-tuned for TACRED and the tokenizer using the following code:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("SpanBERT/spanbert-base-cased")
model = AutoModel.from_pretrained("SpanBERT/spanbert-base-cased")

But at the evaluation time, when I get to the following section:

with torch.no_grad():
     logits = model(input_ids, segment_ids, input_mask)
loss_fct = CrossEntropyLoss()
tmp_eval_loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1))

I get the following error:

AttributeError: 'tuple' object has no attribute 'view'

which means the output of the model, logits, is not a torch tensor. I'm trying to figure out what the correct size of logits is for the loss_fct function. logits has two elements, each of size torch.Size([32, 128, 1024]), where 32 is the batch size. label_ids here is torch.Size([32]).

Based on the acceptable input dimensions for loss_fct, I tried the following:

tmp_eval_loss = loss_fct(logits[0].view(list(input_ids.size())[0], -1), label_ids.view(-1))

which, even though it passes the loss computation step, throws an error in compute_f1, since preds will have size (n, 1024), where n is the total number of data points being evaluated, while labels has size n.

I wonder how should I set the dimensions?
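For what it's worth, one way to get per-example classification logits (a sketch, not the repo's own migration path: it assumes transformers' BertForSequenceClassification, whose pooler and classification layer are newly initialized on top of SpanBERT and therefore still need fine-tuning; input_ids, segment_ids, input_mask, label_ids, and num_labels are the variables from the snippets above):

import torch
from torch.nn import CrossEntropyLoss
from transformers import BertForSequenceClassification

# a sequence-classification head produces (batch_size, num_labels) logits,
# which is the shape the CrossEntropyLoss call above expects
model = BertForSequenceClassification.from_pretrained(
    "SpanBERT/spanbert-base-cased", num_labels=num_labels)

with torch.no_grad():
    outputs = model(input_ids, attention_mask=input_mask, token_type_ids=segment_ids)
logits = outputs[0]  # the first element of the output is the logits tensor
loss_fct = CrossEntropyLoss()
tmp_eval_loss = loss_fct(logits.view(-1, num_labels), label_ids.view(-1))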

spanbert_hf_base.tar.gz not found in cache

When I run run_glue.py I get:

pytorch_pretrained_bert1.file_utils - https://dl.fbaipublicfiles.com/fairseq/models/spanbert_hf_base.tar.gz not found in cache, downloading to /tmp/tmp500p59qt
6%|█▋ | 12869632/198960653 [00:04<00:44, 4196.59B/s]

I found a .pytorch_pretrained_bert folder in my /home/.

It is too slow for me to download this in my terminal, so I downloaded spanbert_hf_base.tar.gz separately, put it in the .pytorch_pretrained_bert folder, and even extracted it there. But when I run again, the messages above still appear, so where should I put spanbert_hf_base.tar.gz?

800 tokens

Hello! Can you please explain why you chose a limit of 800 tokens? And how did you tokenize the text - do you have only 800 tokens for the entire dataset, or did you process each record with a new tokenizer instance and build a fresh vocab?

pre-training code

First, thanks for your work.
Do you plan to release the pre-training code for SpanBERT in the future?

Missing SBO head parameters

Hi, I'm doing research on relation extraction. I think SpanBERT is perfect for my method, but when I used the MLM head or SBO head for transfer learning, I found that the pretrained model is missing some parameters. So I want to know if it is possible to get the complete PTM parameters. Thanks so much!

Performance when fine-tuning on TACRED

Hi, thanks for the great work!

I'm trying to replicate the relation extraction results on TACRED but haven't succeeded yet. Currently I'm unable to achieve an F1 score above 67.2, so I want to clarify a few things.

NVIDIA Apex

The first issue (which applies to all tasks) regards NVIDIA Apex. Due to Apex transitioning to the new amp API, commit #512 moved FP16_Optimizer and FusedAdam to the apex.contrib module. Users that compiled Apex from the latest master and want to use SpanBERT's --fp16 get an exception "Please install apex from [...]", because the import of FP16_Optimizer and FusedAdam fails (e.g. in run_tacred.py).

As a temporary solution, I resorted to an earlier commit (#424) of Apex without these breaking changes. Can you specify the commit you used to compile Apex?

Fine-tuning

For fine-tuning on TACRED, I used the following configuration (based on the README and what I gathered from the paper).

python code/run_tacred.py \
  --do_train \
  --do_eval \
  --data_dir <TACRED_DATA_DIR> \
  --model spanbert-base-cased \
  --train_batch_size 32 \
  --eval_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 10 \
  --max_seq_length 128 \
  --output_dir tacred_dir \
  --fp16

Namespace object written to the training logs:

Namespace(data_dir='../relex-data/tacred/', do_eval=True, do_lower_case=False, do_train=True, eval_batch_size=32, eval_metric='f1', eval_per_epoch=10, eval_test=False, feature_mode='ner', fp16=True, gradient_accumulation_steps=1, learning_rate=2e-05, loss_scale=0, max_seq_length=128, model='spanbert-base-cased', negative_label='no_relation', no_cuda=False, num_train_epochs=10.0, output_dir='model_save', seed=42, train_batch_size=32, train_mode='random_sorted', warmup_proportion=0.1)

eval_results.txt

accuracy = 0.8535636958154743
batch_size = 32
epoch = 8
eval_loss = 0.7800797500179313
f1 = 0.6727970716900277
global_step = 18728
learning_rate = 2e-05
precision = 0.6535993061578491
recall = 0.6931567328918322

Test set results (test_results.txt, after running run_tacred.py with eval_test=True):

accuracy = 0.8534311342848305
eval_loss = 0.7800683651940298
f1 = 0.6725
precision = 0.6533657182512145
recall = 0.6927888153053716
  • Can you confirm that this is the same configuration used to produce the results stated in the paper?
  • Are there plans to make the fine-tuned TACRED model publicly available?

embedding out of index

Traceback (most recent call last):
  File "D:/workspace/semal_propaganda/SpanBERT-master/code/run_squad.py", line 1138, in <module>
    main(args)
  File "D:/workspace/semal_propaganda/SpanBERT-master/code/run_squad.py", line 971, in main
    loss = model(input_ids, segment_ids, input_mask, start_positions, end_positions)
  File "C:\Users\jli284\.conda\envs\pro\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\workspace\semal_propaganda\SpanBERT-master\code\pytorch_pretrained_bert\modeling.py", line 1199, in forward
    sequence_output, _ = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
  File "C:\Users\jli284\.conda\envs\pro\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\workspace\semal_propaganda\SpanBERT-master\code\pytorch_pretrained_bert\modeling.py", line 730, in forward
    embedding_output = self.embeddings(input_ids, token_type_ids)
  File "C:\Users\jli284\.conda\envs\pro\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\workspace\semal_propaganda\SpanBERT-master\code\pytorch_pretrained_bert\modeling.py", line 268, in forward
    words_embeddings = self.word_embeddings(input_ids)
  File "C:\Users\jli284\.conda\envs\pro\lib\site-packages\torch\nn\modules\module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\jli284\.conda\envs\pro\lib\site-packages\torch\nn\modules\sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "C:\Users\jli284\.conda\envs\pro\lib\site-packages\torch\nn\functional.py", line 1484, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range: Tried to access index 29483 out of table with 28995 rows. at C:\w\1\s\tmp_conda_3.6_171155\conda\conda-bld\pytorch_1570813991702\work\aten\src\TH/generic/THTensorEvenMoreMath.cpp:418
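For what it's worth, this error means an input id (29483) exceeds the checkpoint's word-embedding table (28995 rows), i.e. the tokenizer vocabulary and the model checkpoint do not match. A quick diagnostic sketch (the checkpoint path is an assumption; the key name follows the pytorch_pretrained_bert naming visible in the logs above):

import torch

# compare the embedding table size against the tokenizer's vocabulary size
state_dict = torch.load("pytorch_model.bin", map_location="cpu")  # hypothetical path
print(state_dict["bert.embeddings.word_embeddings.weight"].shape[0])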

Translate into TensorFlow ckpt model

Hello author:
I want to use SpanBERT for my task, but my task code is built with TensorFlow, and SpanBERT is a PyTorch .bin file. How can I convert it to get a TensorFlow ckpt model? :)

Best wishes.

Original Pre-trained Model

Hey, I am wondering if it is possible to release the original pretrained model with the parameters of the LM prediction head, such as the linear layers, instead of the current version (which drops those parameters).

Error: KeyError: 'token' while train

Hi,

I am getting the above error while training on my data using run_tacred.py. It seems the program expects sentences broken into tokens. Is that correct? If so, what tokenizer should I use to create tokens from my sentences?
This is what a row in my data looks like:
[
  {
    "id": "1",
    "sentence": "Pandit worked at the brokerage Morgan Stanley for about 11 years until 2005, when he and some Morgan Stanley colleagues quit and later founded the hedge fund Old Lane Partners",
    "relation": "org:founded_by",
    "subj_start": 0,
    "subj_end": 5,
    "obj_start": 158,
    "obj_end": 174,
    "subj_type": "PER",
    "obj_type": "ORG"
  }
]
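For reference, a TACRED-style record stores the sentence as a pre-tokenized word list under a "token" key (which is what the KeyError points at), with subj_start/subj_end and obj_start/obj_end given as token indices rather than character offsets. A minimal conversion sketch for a record like the one above (not part of this repo; whitespace tokenization is an assumption, and the original TACRED data is tokenized differently):

def convert(record):
    sentence = record["sentence"]
    tokens = sentence.split()  # whitespace tokenization is an assumption
    # record the starting character offset of each token
    starts, pos = [], 0
    for tok in tokens:
        pos = sentence.index(tok, pos)
        starts.append(pos)
        pos += len(tok)
    def char_to_token(char_idx):
        # index of the last token starting at or before this character offset
        return max(i for i, s in enumerate(starts) if s <= char_idx)
    out = dict(record)
    del out["sentence"]
    out["token"] = tokens
    for key in ("subj_start", "subj_end", "obj_start", "obj_end"):
        out[key] = char_to_token(record[key])
    return out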

What is the best fine tuning of SpanBERT for Question Answering task?

Five fine-tuning examples are provided in the repository - SQuAD 1.1, SQuAD 2.0, TACRED, MRQA, and GLUE. I am planning to use SpanBERT for a question answering task. I tested the SQuAD 2.0 version and got excellent results. If I train SpanBERT on all 5 datasets, do you think that will make the model better at question answering? Or if I train on SQuAD 1.1, SQuAD 2.0, and MRQA, will that make the model better? Or do you recommend training on just one dataset on top of SpanBERT? Any recommendation is much appreciated.

special_tokens at training time for TACRED

I think it would be useful to have access to the special_tokens dict used at training time, in the same order used for the pretrained TACRED model. Since the original dataset is not free, one cannot use the pretrained model and assign the same tokens to the NER labels without the original dataset.

Loading SpanBERT pre-trained model using the transformers package

I have some questions regarding SpanBert loading using the transformers package.

I downloaded the pre-trained file from the SpanBERT GitHub repo and vocab.txt from BERT. Here is the code I used for loading:

model = BertModel.from_pretrained(config_file=config_file,
                                  pretrained_model_name_or_path=model_file,
                                  vocab_file=vocab_file)
model.to("cuda")

where

  • config_file -> config.json
  • model_file -> pytorch_model.bin
  • vocab_file -> vocab.txt

But with the above code I got a UnicodeDecodeError saying that the 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte.

I also tried loading SpanBERT with the method mentioned here, but it returned OSError: file SpanBERT/spanbert-base-cased not found.

Do you have any suggestions on loading the pre-trained model correctly? Any suggestions are much appreciated. Thanks!
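For what it's worth, one pattern that avoids this error (a sketch; model_dir is a hypothetical path to a directory containing the three files listed above): from_pretrained accepts a single directory path, whereas passing pytorch_model.bin as pretrained_model_name_or_path can make the library try to parse the binary weights file as text, which is one likely source of this UnicodeDecodeError.

from transformers import BertModel, BertTokenizer

# model_dir is a hypothetical directory holding config.json, pytorch_model.bin, vocab.txt
model_dir = "spanbert_base_cased"
tokenizer = BertTokenizer.from_pretrained(model_dir)
model = BertModel.from_pretrained(model_dir)
model.to("cuda")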

Incompatibility Doubt

So although the model is more or less compatible with transformers/pytorch_transformers when loading directly, it gives a warning saying it does not have bertPooler.weight and bertPooler.bias. How much would this affect performance on downstream tasks?

Multilingual Model

Congrats on the great results using BERT!

I wanted to ask if you plan to release a multilingual model in the future?

Are the parameters of spanbert-base the same as those of spanbert-large on TACRED?

Thanks for this amazing code!

I'm reproducing the performance of SpanBERT on TACRED. I can almost reproduce the performance of the large model, but the performance of the base model is much worse than reported:

 12/30/2020 20:46:06 - INFO - __main__ - ***** Eval results *****
 12/30/2020 20:46:06 - INFO - __main__ -   accuracy = 0.8563474879589943
 12/30/2020 20:46:06 - INFO - __main__ -   eval_loss = 0.7147518847621767
 12/30/2020 20:46:06 - INFO - __main__ -   f1 = 0.6635583737411413
 12/30/2020 20:46:06 - INFO - __main__ -   precision = 0.6728441754916793
 12/30/2020 20:46:06 - INFO - __main__ -   recall = 0.6545253863134658

So, I wonder, are the parameters of spanbert-base the same as those of spanbert-large on TACRED?

model load error

https://github.com/facebookresearch/SpanBERT/blob/master/code/pytorch_pretrained_bert/modeling.py#L616

    for key in state_dict.keys():
        # new_key = key[8:] if key.startswith("decoder.") else key
        if 'gamma' in new_key:
            new_key = new_key.replace('gamma', 'weight')
        if 'beta' in new_key:
            new_key = new_key.replace('beta', 'bias')
        if key != new_key:
            old_keys.append(key)
            new_keys.append(new_key)

I checked the parameters; there are no parameters for the decoder (prediction layer), so this code should be removed.

[QUESTION] What type of relation extraction is SpanBERT suited for?

SpanBERT is fine-tuned and tested on TACRED, where relation spans are mostly subjects and objects in a sentence, usually with no more than 1-2 words in each span. Also, based on Figure 2 in the paper, the majority of the spans for pre-training SpanBERT are between 1-3 words in length. We have relation extraction tasks, though, in which spans may be an event or even a complete sentence. I wonder if you think SpanBERT would be a good fit for such relation extraction tasks? One example, in the following document:

Kanner's reuse of "autism" led to decades of confused terminology like "infantile schizophrenia", and child psychiatry's focus on maternal deprivation led to misconceptions of autism as an infant's response to "refrigerator mothers".

There's a causal relation between (Kanner's reuse of "autism", decades of confused terminology like "infantile schizophrenia"), where the spans are longer than what we usually see in datasets such as TACRED. (Some might argue that these spans can be pruned and shortened, but my point is whether SpanBERT is suitable overall for longer spans in a relation extraction task.)

coref part is a mess

Hi
While I really appreciate that the code for all the other experiments is very clean, the coref experiment code is just a mess: there are a lot of files with no structure, literally just pushed into a repo. Please spend some time on that code and write it in a clean, structured way. This is good work, and it is a real pity that this part of the experiments is very hard to reproduce and is written in such a messy way. Thanks.

Evaluation of finetuned model on SQuAD 2.0

Hi! I tried evaluating the finetuned model on the SQuAD 2.0 dev set but was unable to reproduce the scores that were reported in the README. Am I missing something major?

Command I ran:

python code/run_squad.py --do_eval --output_dir ../output/spanbert_squad2/ --dev_file ../squad_dev_2.0.json --model spanbert-large-cased-squad-2

--output_dir is where I downloaded the model, and I also added spanbert-large-cased-squad-2 to modeling.py pointing to http://dl.fbaipublicfiles.com/fairseq/models/spanbert_squad2.tar.gz and tokenization.py pointing to https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-cased-vocab.txt since it appears to be the large model according to config.json.

This was the result of the evaluation:

11/05/2019 16:17:35 - INFO - __main__ - ***** Eval results *****
11/05/2019 16:17:35 - INFO - __main__ -   exact = 42.96302535163817
11/05/2019 16:17:35 - INFO - __main__ -   f1 = 46.48860026174417
11/05/2019 16:17:35 - INFO - __main__ -   total = 11873

I ran it through the run_squad.py from huggingface/transformers and got these results:

{
  "exact": 42.96302535163817,
  "f1": 46.48523127328296,
  "total": 11873,
  "HasAns_exact": 86.04925775978407,
  "HasAns_f1": 93.10377039603385,
  "HasAns_total": 5928,
  "NoAns_exact": 0.0,
  "NoAns_f1": 0.0,
  "NoAns_total": 5945
}

Is it possible the link for the SQuAD 2.0 finetuned model might be pointing to the SQuAD 1.1 weights instead?

vocab size.

The vocab_size in config.json is 28996, which is different from that of vocab.txt. Why?
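A quick way to check the mismatch yourself (a sketch; the file paths are assumptions):

import json

with open("config.json") as f:                # checkpoint config, hypothetical path
    configured = json.load(f)["vocab_size"]
with open("vocab.txt", encoding="utf-8") as f:
    entries = sum(1 for _ in f)               # one vocabulary entry per line
print(configured, entries)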

How do I load the pretrained model?

I downloaded the spanbert_hf_base.tar file, and when I load it with torch.load("spanbert_hf_base.tar"), it gives me the error KeyError: "filename 'storages' not found". How do I properly load the pretrained model? Thanks.
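A sketch of one way to load it (assumptions: the archive unpacks config.json and pytorch_model.bin into the target directory, as HuggingFace-format checkpoints do; torch.load reads serialized tensor files, not tar archives, hence the 'storages' KeyError):

import tarfile
from transformers import BertModel

# extract the archive, then point from_pretrained at the resulting directory
with tarfile.open("spanbert_hf_base.tar") as archive:
    archive.extractall("spanbert_hf_base")
model = BertModel.from_pretrained("spanbert_hf_base")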

QA: Available models

Hi,

Thank you for releasing the code. I want to use the SpanBERT QA model for inference in my project, but I don't have the resources to fine-tune it. In the README, you have put:

If you are interested in using our fine-tuned models for downstream tasks (QA, coreference, relation extraction) directly, we also provide the following models:

But then there are no links for downloading. Please let me know if I am missing something. Can you please provide a link to download the pretrained SQuAD QA model?

Thanks,
Ram

Licence - Why not Apache 2.0 like Google?

Hey facebook-research team,

Why are you not releasing this work under the Apache 2.0 licence like Google? That would be much more of a contribution to the public.

Thanks.

Using SpanBERT to predict spans

Hi, I maintain FitBERT, a library that uses BERT to fill in the blanks (do masked language modeling). An example:

from fitbert import FitBert


# in theory you can pass a model_name and tokenizer
# currently supported models: bert-large-uncased and distilbert-base-uncased
# this takes a while and loads a whole big BERT into memory
fb = FitBert()

masked_string = "Why Bert, you're looking ***mask*** today!"
options = ['buff', 'handsome', 'strong']

ranked_options = fb.rank(masked_string, options=options)
# >>> ['handsome', 'strong', 'buff']

I think that SpanBERT should be an improvement over BERT when the mask covers more than one token. However, I am not seeing the SBO anywhere in this code. Is it not released? Should I just mask sequential tokens and hope that the model is better at predicting them?

Also, since you don't do NSP, does the [SEP] tag have any meaning? If not, is there no special EOS token?

Thanks so much!

Question for training details about SpanBERT base model

The base model has very powerful performance!
What are the training steps / batch size / learning rate for the base model? Are they all the same as for the large model?
Did you use any other corpus for training the base model besides Wikipedia and BookCorpus?

training from scratch

Thanks for this amazing code base!

I am a newbie trying to understand this code base, especially the "pretrain from scratch" part.

1. What kind of public dataset can I use? Am I supposed to use the Wikipedia and BookCorpus data like BERT?
I found this in Google's repo:
https://github.com/google-research/bert#pre-training-data

Pre-training data
We will not be able to release the pre-processed datasets used in the paper. For Wikipedia, the recommended pre-processing is to download the latest dump, extract the text with WikiExtractor.py, and then apply any necessary cleanup to convert it into plain text.

Unfortunately the researchers who collected the BookCorpus no longer have it available for public download. The Project Guttenberg Dataset is a somewhat smaller (200M word) collection of older books that are public domain.

Would it be sufficient (just to mock the pretraining) to run on the wiki dump + Guttenberg dataset? Would it be OK to just run your preprocessing command on this dataset? Could you give me more explicit directions?

1-2. Did you also use the wiki dump + Guttenberg?

2. When I run this command:
python train.py /path/to/preprocessed_data --total-num-update 2400000 --max-update 2400000 --save-interval 1 --arch cased_bert_pair_large --task span_bert --optimizer adam --lr-scheduler polynomial_decay --lr 0.0001 --min-lr 1e-09 --criterion span_bert_loss --max-tokens 4096 --tokens-per-sample 512 --weight-decay 0.01 --skip-invalid-size-inputs-valid-test --log-format json --log-interval 2000 --save-interval-updates 50000 --keep-interval-updates 50000 --update-freq 1 --seed 1 --save-dir /path/to/checkpoint_dir --fp16 --warmup-updates 10000 --schemes ["pair_span"] --distributed-port 12580 --distributed-world-size 32 --span-lower 1 --span-upper 10 --validate-interval 1 --clip-norm 1.0 --geometric-p 0.2 --adam-eps 1e-8 --short-seq-prob 0.0 --replacement-method span --clamp-attention --no-nsp --pair-loss-weight 1.0 --max-pair-targets 15 --pair-positional-embedding-size 200 --endpoints external

I got this error:

Traceback (most recent call last):
  File "train.py", line 381, in <module>
    distributed_main(args)
  File "/home/user/spanBertReform/pretraining/distributed_train.py", line 37, in main
    args.distributed_rank = distributed_utils.distributed_init(args)
  File "/home/user/spanBertReform/pretraining/fairseq/distributed_utils.py", line 65, in distributed_init
    rank=args.distributed_rank,
  File "/home/user/anaconda3/envs/spanBertReform/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 406, in init_process_group
    store, rank, world_size = next(rendezvous(url))
  File "/home/user/anaconda3/envs/spanBertReform/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 130, in _env_rendezvous_handler
    raise _env_error("MASTER_ADDR")
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable MASTER_ADDR expected, but not set

How can I resolve this? (See the sketch at the end of this issue.)

3. Any recommendation for running this script without using P3 instances? (My current machine only has 2x RTX 2080 Ti :( )
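Regarding the MASTER_ADDR error in question 2: torch.distributed's env:// rendezvous reads the master address and port from environment variables, so setting them before launching usually clears that particular error. A minimal sketch (the address and port values are assumptions for a single-machine run):

import os

# init_process_group(init_method="env://") reads these two variables
os.environ["MASTER_ADDR"] = "127.0.0.1"  # rank-0 host; localhost on a single machine
os.environ["MASTER_PORT"] = "12580"      # any free TCP port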
