Comments (9)
Got it. Will try the semi-open-domain evaluation and see if it works.
> Hi, is this using the entire Wikipedia for the phrase dump, or just the SQuAD development set passages?

If it's just the SQuAD development set passages, this is pretty low. I think I got over 60 EM for SQuAD. Even densephrases-multi will get at least 50 EM.
> Hi, is this using the entire Wikipedia for the phrase dump, or just the SQuAD development set passages?

This is using the entire Wikipedia dump, tested on sqd-open-qa.
If this is using the entire Wikipedia dump, then I think this is a good start. You'll be able to reach 35~40 EM after query-side fine-tuning. Make sure that you set a larger max_answer_length for SQuAD, because it does have longer answers.
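For completeness, query-side fine-tuning is driven by train_query.py in this repo. A minimal sketch of the invocation, assuming a train-query Makefile target along the lines of the README; the target name, the MODEL_NAME suffix, and any extra variables (e.g., a LOAD_DIR pointing at the phrase encoder) are illustrative and may differ in your checkout:

make train-query \
    MODEL_NAME=densephrases-squad-ddp-qsft \
    DUMP_DIR=$SAVE_DIR/densephrases-squad-ddp_wiki-20181220/dump/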
I'm not sure which model you used for generating the phrase vecs, but it seems slightly low compared to what is uploaded on GitHub (densephrases-multi scores 29 EM before QSFT on SQuAD).
Thanks! That's really helpful.
> Make sure that you set a larger max_answer_length for SQuAD, because it does have longer answers.
Is there a default max_answer_length for SQuAD that you've been using?
> I'm not sure which model you used for generating the phrase vecs
The model I used is densephrases-squad-ddp, which was trained using the following command:
make run-rc-sqd-ddp MODEL_NAME=densephrases-squad-ddp
run-rc-sqd-ddp: model-name sqd-rc-data sqd-param pbn-param medium1-index
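# First pass: train a temporary model on the QG-augmented training data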
make train-rc-ddp \
TRAIN_DATA=$(TRAIN_QG_DATA) DEV_DATA=$(DEV_DATA) \
TEACHER_NAME=$(TEACHER_NAME) MODEL_NAME=$(MODEL_NAME)_tmp \
BS=$(BS) LR=$(LR) MAX_SEQ_LEN=$(MAX_SEQ_LEN) \
LAMBDA_KL=$(LAMBDA_KL) LAMBDA_NEG=$(LAMBDA_NEG)
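# Second pass: reload the _tmp checkpoint and continue training with pre-batch negatives (PBN)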
make train-rc-ddp \
TRAIN_DATA=$(TRAIN_DATA) DEV_DATA=$(DEV_DATA) \
TEACHER_NAME=$(TEACHER_NAME) MODEL_NAME=$(MODEL_NAME) \
BS=$(BS) LR=$(LR) MAX_SEQ_LEN=$(MAX_SEQ_LEN) \
LAMBDA_KL=$(LAMBDA_KL) LAMBDA_NEG=$(LAMBDA_NEG) \
OPTIONS='$(PBN_OPTIONS) --load_dir $(SAVE_DIR)/$(MODEL_NAME)_tmp'
train-rc-ddp: model-name sqd-rc-data sqd-param
OMP_NUM_THREADS=20 python -m torch.distributed.launch \
--nnode=1 --node_rank=0 --nproc_per_node=4 train_rc.py \
--model_type bert \
--pretrained_name_or_path SpanBERT/spanbert-base-cased \
--data_dir $(DATA_DIR)/single-qa \
--cache_dir $(CACHE_DIR) \
--train_file $(TRAIN_DATA) \
--predict_file $(DEV_DATA) \
--do_train \
--do_eval \
--fp16 \
--per_gpu_train_batch_size $(BS) \
--learning_rate $(LR) \
--num_train_epochs 2.0 \
--max_seq_length $(MAX_SEQ_LEN) \
--lambda_kl $(LAMBDA_KL) \
--lambda_neg $(LAMBDA_NEG) \
--lambda_flt 1.0 \
--filter_threshold -2.0 \
--append_title \
--evaluate_during_training \
--teacher_dir $(SAVE_DIR)/$(TEACHER_NAME) \
--output_dir $(SAVE_DIR)/$(MODEL_NAME) \
$(OPTIONS)
Generating Vecs (in parallel):
make gen-vecs-parallel MODEL_NAME=densephrases-squad-ddp START=$start END=$end
gen-vecs-parallel: model-name
python scripts/parallel/dump_phrases.py \
--model_type bert \
--pretrained_name_or_path SpanBERT/spanbert-base-cased \
--cache_dir $(CACHE_DIR) \
--data_dir $(DATA_DIR)/wikidump \
--data_name wiki-20181220 \
--load_dir $(SAVE_DIR)/$(MODEL_NAME) \
--output_dir $(SAVE_DIR)/$(MODEL_NAME) \
--filter_threshold 1.0 \
--append_title \
--start $(START) \
--end $(END) \
--num_gpus 4
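Since START and END just select a contiguous slice of the preprocessed dump files, a small driver loop can cover the whole dump in chunks. A sketch, where the shard size and total file count are illustrative placeholders rather than values from the repo:

# Adjust the step (500) and upper bound (5500) to the number of files in your dump.
for start in $(seq 0 500 5500); do
    make gen-vecs-parallel MODEL_NAME=densephrases-squad-ddp START=$start END=$((start + 500))
done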
Building Index:
make index-vecs DUMP_DIR=$SAVE_DIR/densephrases-squad-ddp_wiki-20181220/dump/ NUM_CLUSTERS=1048576 INDEX_TYPE=OPQ96
index-vecs: dump-dir large-index
python build_phrase_index.py \
--dump_dir $(DUMP_DIR) \
--stage all \
--replace \
--num_clusters $(NUM_CLUSTERS) \
--fine_quant $(INDEX_TYPE) \
--cuda
Compressing meta:
make compress-meta DUMP_DIR=$SAVE_DIR/densephrases-squad-ddp_wiki-20181220/dump
compress-meta:
python scripts/preprocess/compress_metadata.py \
--input_dump_dir $(DUMP_DIR)/phrase \
--output_dir $(DUMP_DIR)
Evaluating index:
make eval-index-psg-sqd MODEL_NAME=densephrases-squad-ddp DUMP_DIR=outputs/densephrases-squad-ddp_wiki-20181220/dump/
eval-index-psg-sqd: dump-dir model-name large-index sqd-open-data
python eval_phrase_retrieval.py \
--run_mode eval \
--model_type bert \
--pretrained_name_or_path SpanBERT/spanbert-base-cased \
--cuda \
--dump_dir $(DUMP_DIR) \
--index_name start/$(NUM_CLUSTERS)_flat_$(INDEX_TYPE) \
--load_dir $(SAVE_DIR)/$(MODEL_NAME) \
--test_path $(DATA_DIR)/$(TEST_DATA) \
--save_pred \
--aggregate \
--agg_strat opt2 \
--top_k 200 \
--eval_psg \
--psg_top_k 100 \
$(OPTIONS)
Looks like you did a great job with the entire process (except that I don't know why you chose medium1-index for run-rc-sqd-ddp and not small-index. SQuAD dev set passages are only about 2k, so small-index should be fine.)
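(In case it helps: those index prerequisites are just small Makefile targets that set NUM_CLUSTERS and INDEX_TYPE via $(eval). A sketch of the pattern; the small-index values below are hypothetical, while the large-index ones match the index-vecs command shown above. Check the repo's Makefile for the real settings:)

# Hypothetical small-index values, for illustration only.
small-index:
	$(eval NUM_CLUSTERS=256)
	$(eval INDEX_TYPE=flat)

# Matches the NUM_CLUSTERS/INDEX_TYPE used in the index-vecs command above.
large-index:
	$(eval NUM_CLUSTERS=1048576)
	$(eval INDEX_TYPE=OPQ96)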
First, I have to mention that the current hyperparameters (sqd-param) are not very DDP-friendly; they are tuned for a single 24GB GPU. If you want to use DDP for training, I strongly suggest changing the hyperparameters (larger batch sizes might require larger learning rates) and keeping track of the accuracy (i.e., semi-open-domain accuracy on SQuAD passages) right after run-rc-sqd-ddp, which strongly correlates with the final open-domain QA accuracy.
Second, for SQuAD, maximum sequence length matters more than batch size in my experience. Also, max_answer_length is currently set to 10, which suits NQ (its answers are at most ~5 words), but you can set it to 20 for SQuAD.
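One way to apply that at evaluation time is through the OPTIONS hook that eval-index-psg-sqd already forwards to eval_phrase_retrieval.py, assuming the script exposes a --max_answer_length flag (the flag name is an assumption; check eval_phrase_retrieval.py's arguments):

make eval-index-psg-sqd MODEL_NAME=densephrases-squad-ddp \
    DUMP_DIR=outputs/densephrases-squad-ddp_wiki-20181220/dump/ \
    OPTIONS='--max_answer_length 20'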
I see. But looking at the training log, it seems DensePhrases is working fine on the dev set:
OMP_NUM_THREADS=20 python -m torch.distributed.launch \
--nnode=1 --node_rank=0 --nproc_per_node=4 train_rc.py \
--model_type bert \
--pretrained_name_or_path SpanBERT/spanbert-base-cased \
--data_dir .//densephrases-data/single-qa \
--cache_dir .//cache \
--train_file squad/train-v1.1_qg_ents_t5large_3500_filtered.json \
--predict_file squad/dev-v1.1.json \
--do_train \
--do_eval \
--fp16 \
--per_gpu_train_batch_size 24 \
--learning_rate 3e-5 \
--num_train_epochs 2.0 \
--max_seq_length 384 \
--lambda_kl 4.0 \
--lambda_neg 2.0 \
--lambda_flt 1.0 \
--filter_threshold -2.0 \
--append_title \
--evaluate_during_training \
--teacher_dir .//outputs/spanbert-base-cased-squad \
--output_dir .//outputs/densephrases-squad-ddp_tmp \
...
Evaluating: 100%|█████████▉| 904/905 [00:53<00:00, 16.45it/s]
10570it [00:53, 196.62it/s]
11/17/2021 00:23:52 - INFO - densephrases.utils.squad_metrics - saved vecs=1104943/15389478, save rate=0.0718
11/17/2021 00:23:52 - INFO - densephrases.utils.squad_metrics - answer recall=0.0000
Evaluating: 100%|█████████▉| 904/905 [00:55<00:00, 16.34it/s]
11/17/2021 00:23:55 - INFO - __main__ - Evaluation done in total 56.362774 secs (0.005194 sec per example)
11/17/2021 00:23:55 - INFO - __main__ - Results: {'exact_final': 75.49668874172185, 'f1_final': 84.58729121922526, 'total_final': 10570, 'HasAns_exact_final': 75.49668874172185, 'HasAns_f1_final': 84.58729121922526, 'HasAns_total_final': 10570, 'best_exact_final': 75.49668874172185, 'best_exact_thresh_final': 0.0, 'best_f1_final': 84.58729121922526, 'best_f1_thresh_final': 0.0}
Oh, I think you are missing the part where you evaluate your model in the semi-open-domain setup (using all development set passages). This is a better approximation of open-domain QA:
(Lines 219 to 230 in b52fe06)