Comments (9)
Got it. Will try the semi-open-domain evaluation and see if it works.
> Hi, is this using the entire Wikipedia for the phrase dump, or just the SQuAD development set passages?

If it's just the SQuAD development set passages, this is pretty low. I think I got over 60 EM for SQuAD. Even densephrases-multi will get at least 50 EM.
> Hi, is this using the entire Wikipedia for the phrase dump, or just the SQuAD development set passages?

This is using the entire Wikipedia dump, tested on sqd-open-qa.
If this is using the entire Wikipedia dump, then I think this is a good start. You'll be able to reach 35~40 EM after query-side fine-tuning. Make sure that you set a larger max_answer_length for SQuAD, because it does have longer answers.
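For completeness, query-side fine-tuning is driven by train_query.py in this repo. A minimal sketch of the invocation, assuming a train-query Makefile target along the lines of the README; the target name, the MODEL_NAME suffix, and any extra variables (e.g., a LOAD_DIR pointing at the phrase encoder) are illustrative and may differ in your checkout:

make train-query \
    MODEL_NAME=densephrases-squad-ddp-qsft \
    DUMP_DIR=$SAVE_DIR/densephrases-squad-ddp_wiki-20181220/dump/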
I'm not sure which model you used for generating the phrase vecs, but it seems slightly low compared to what is uploaded on GitHub (densephrases-multi scores 29 EM before QSFT on SQuAD).
Thanks! That's really helpful.
> Make sure that you set a larger max_answer_length for SQuAD, because it does have longer answers.
Is there a default max_answer_length for SQuAD that you've been using?
> I'm not sure which model you used for generating the phrase vecs
The model I used is densephrases-squad-ddp, which was trained using the following command:
make run-rc-sqd-ddp MODEL_NAME=densephrases-squad-ddp
run-rc-sqd-ddp: model-name sqd-rc-data sqd-param pbn-param medium1-index
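# First pass: train a temporary model on the QG-augmented training data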
make train-rc-ddp \
TRAIN_DATA=$(TRAIN_QG_DATA) DEV_DATA=$(DEV_DATA) \
TEACHER_NAME=$(TEACHER_NAME) MODEL_NAME=$(MODEL_NAME)_tmp \
BS=$(BS) LR=$(LR) MAX_SEQ_LEN=$(MAX_SEQ_LEN) \
LAMBDA_KL=$(LAMBDA_KL) LAMBDA_NEG=$(LAMBDA_NEG)
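# Second pass: reload the _tmp checkpoint and continue training with pre-batch negatives (PBN)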
make train-rc-ddp \
TRAIN_DATA=$(TRAIN_DATA) DEV_DATA=$(DEV_DATA) \
TEACHER_NAME=$(TEACHER_NAME) MODEL_NAME=$(MODEL_NAME) \
BS=$(BS) LR=$(LR) MAX_SEQ_LEN=$(MAX_SEQ_LEN) \
LAMBDA_KL=$(LAMBDA_KL) LAMBDA_NEG=$(LAMBDA_NEG) \
OPTIONS='$(PBN_OPTIONS) --load_dir $(SAVE_DIR)/$(MODEL_NAME)_tmp'
train-rc-ddp: model-name sqd-rc-data sqd-param
OMP_NUM_THREADS=20 python -m torch.distributed.launch \
--nnode=1 --node_rank=0 --nproc_per_node=4 train_rc.py \
--model_type bert \
--pretrained_name_or_path SpanBERT/spanbert-base-cased \
--data_dir $(DATA_DIR)/single-qa \
--cache_dir $(CACHE_DIR) \
--train_file $(TRAIN_DATA) \
--predict_file $(DEV_DATA) \
--do_train \
--do_eval \
--fp16 \
--per_gpu_train_batch_size $(BS) \
--learning_rate $(LR) \
--num_train_epochs 2.0 \
--max_seq_length $(MAX_SEQ_LEN) \
--lambda_kl $(LAMBDA_KL) \
--lambda_neg $(LAMBDA_NEG) \
--lambda_flt 1.0 \
--filter_threshold -2.0 \
--append_title \
--evaluate_during_training \
--teacher_dir $(SAVE_DIR)/$(TEACHER_NAME) \
--output_dir $(SAVE_DIR)/$(MODEL_NAME) \
$(OPTIONS)
Generating Vecs (in parallel):
make gen-vecs-parallel MODEL_NAME=densephrases-squad-ddp START=$start END=$end
gen-vecs-parallel: model-name
python scripts/parallel/dump_phrases.py \
--model_type bert \
--pretrained_name_or_path SpanBERT/spanbert-base-cased \
--cache_dir $(CACHE_DIR) \
--data_dir $(DATA_DIR)/wikidump \
--data_name wiki-20181220 \
--load_dir $(SAVE_DIR)/$(MODEL_NAME) \
--output_dir $(SAVE_DIR)/$(MODEL_NAME) \
--filter_threshold 1.0 \
--append_title \
--start $(START) \
--end $(END) \
--num_gpus 4
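Since START and END just select a contiguous slice of the preprocessed dump files, a small driver loop can cover the whole dump in chunks. A sketch, where the shard size and total file count are illustrative placeholders rather than values from the repo:

# Adjust the step (500) and upper bound (5500) to the number of files in your dump.
for start in $(seq 0 500 5500); do
    make gen-vecs-parallel MODEL_NAME=densephrases-squad-ddp START=$start END=$((start + 500))
done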
Building Index:
make index-vecs DUMP_DIR=$SAVE_DIR/densephrases-squad-ddp_wiki-20181220/dump/ NUM_CLUSTERS=1048576 INDEX_TYPE=OPQ96
index-vecs: dump-dir large-index
python build_phrase_index.py \
--dump_dir $(DUMP_DIR) \
--stage all \
--replace \
--num_clusters $(NUM_CLUSTERS) \
--fine_quant $(INDEX_TYPE) \
--cuda
Compressing meta:
make compress-meta DUMP_DIR=$SAVE_DIR/densephrases-squad-ddp_wiki-20181220/dump
compress-meta:
python scripts/preprocess/compress_metadata.py \
--input_dump_dir $(DUMP_DIR)/phrase \
--output_dir $(DUMP_DIR)
Evaluating index:
make eval-index-psg-sqd MODEL_NAME=densephrases-squad-ddp DUMP_DIR=outputs/densephrases-squad-ddp_wiki-20181220/dump/
eval-index-psg-sqd: dump-dir model-name large-index sqd-open-data
python eval_phrase_retrieval.py \
--run_mode eval \
--model_type bert \
--pretrained_name_or_path SpanBERT/spanbert-base-cased \
--cuda \
--dump_dir $(DUMP_DIR) \
--index_name start/$(NUM_CLUSTERS)_flat_$(INDEX_TYPE) \
--load_dir $(SAVE_DIR)/$(MODEL_NAME) \
--test_path $(DATA_DIR)/$(TEST_DATA) \
--save_pred \
--aggregate \
--agg_strat opt2 \
--top_k 200 \
--eval_psg \
--psg_top_k 100 \
$(OPTIONS)
Looks like you did a great job with the entire process (except that I don't know why you chose medium1-index for run-rc-sqd-ddp and not small-index. SQuAD dev set passages are only about 2k, so small-index should be fine.)
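(In case it helps: those index prerequisites are just small Makefile targets that set NUM_CLUSTERS and INDEX_TYPE via $(eval). A sketch of the pattern; the small-index values below are hypothetical, while the large-index ones match the index-vecs command shown above. Check the repo's Makefile for the real settings:)

# Hypothetical small-index values, for illustration only.
small-index:
	$(eval NUM_CLUSTERS=256)
	$(eval INDEX_TYPE=flat)

# Matches the NUM_CLUSTERS/INDEX_TYPE used in the index-vecs command above.
large-index:
	$(eval NUM_CLUSTERS=1048576)
	$(eval INDEX_TYPE=OPQ96)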
First, I have to mention that the current hyperparameters (sqd-param) are not very DDP-friendly; they are tuned for a single 24GB GPU. If you want to use DDP for training, I strongly suggest changing the hyperparameters (larger batch sizes might require larger learning rates) and keeping track of the accuracy (i.e., semi-open-domain accuracy on SQuAD passages) right after run-rc-sqd-ddp, which strongly correlates with the final open-domain QA accuracy.
Second, for SQuAD, maximum sequence length matters more than batch size in my experience. Also, max_answer_length is currently set to 10, which suits NQ (its answers are at most ~5 words), but you can set it to 20 for SQuAD.
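One way to apply that at evaluation time is through the OPTIONS hook that eval-index-psg-sqd already forwards to eval_phrase_retrieval.py, assuming the script exposes a --max_answer_length flag (the flag name is an assumption; check eval_phrase_retrieval.py's arguments):

make eval-index-psg-sqd MODEL_NAME=densephrases-squad-ddp \
    DUMP_DIR=outputs/densephrases-squad-ddp_wiki-20181220/dump/ \
    OPTIONS='--max_answer_length 20'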
I see. But looking at the training log, it seems DensePhrases is working fine on the dev set:
OMP_NUM_THREADS=20 python -m torch.distributed.launch \
--nnode=1 --node_rank=0 --nproc_per_node=4 train_rc.py \
--model_type bert \
--pretrained_name_or_path SpanBERT/spanbert-base-cased \
--data_dir .//densephrases-data/single-qa \
--cache_dir .//cache \
--train_file squad/train-v1.1_qg_ents_t5large_3500_filtered.json \
--predict_file squad/dev-v1.1.json \
--do_train \
--do_eval \
--fp16 \
--per_gpu_train_batch_size 24 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 384 \
--lambda_kl 4.0 \
--lambda_neg 2.0 \
--lambda_flt 1.0 \
--filter_threshold -2.0 \
--append_title \
--evaluate_during_training \
--teacher_dir .//outputs/spanbert-base-cased-squad \
--output_dir .//outputs/densephrases-squad-ddp_tmp \
...
Evaluating: 100%|█████████▉| 904/905 [00:53<00:00, 16.45it/s]
10570it [00:53, 196.62it/s]
11/17/2021 00:23:52 - INFO - densephrases.utils.squad_metrics - saved vecs=1104943/15389478, save rate=0.0718
11/17/2021 00:23:52 - INFO - densephrases.utils.squad_metrics - answer recall=0.0000
Evaluating: 100%|█████████▉| 904/905 [00:55<00:00, 16.34it/s]
11/17/2021 00:23:55 - INFO - __main__ - Evaluation done in total 56.362774 secs (0.005194 sec per example)
11/17/2021 00:23:55 - INFO - __main__ - Results: {'exact_final': 75.49668874172185, 'f1_final': 84.58729121922526, 'total_final': 10570, 'HasAns_exact_final': 75.49668874172185, 'HasAns_f1_final': 84.58729121922526, 'HasAns_total_final': 10570, 'best_exact_final': 75.49668874172185, 'best_exact_thresh_final': 0.0, 'best_f1_final': 84.58729121922526, 'best_f1_thresh_final': 0.0}
Oh, I think you are missing the part where you evaluate your model in the semi-open-domain setup (using all development set passages). This is a better approximation of open-domain QA:
(Lines 219 to 230 in b52fe06)