Comments (9)

alexlimh commented on May 20, 2024

Got it. I will try the semi-open-domain evaluation and see if it works.

from densephrases.

jhyuklee commented on May 20, 2024

Hi, is this using the entire Wikipedia for the phrase dump, or just the SQuAD development set passages?
If it's just using the SQuAD development set passages, this is pretty low. I think I got over 60 EM for SQuAD. Even densephrases-multi will reach at least 50 EM.


alexlimh commented on May 20, 2024

Hi, is this using the entire Wikipedia for phrase dump? or just squad development set passages?

This is using the entire Wikipedia dump and tested on sqd-open-qa.


jhyuklee commented on May 20, 2024

If this is using the entire Wikipedia dump, then I think this is a good start. You'll be able to reach 35~40 EM after query-side fine-tuning. Make sure that you set a larger max_answer_length for SQuAD because it has longer answers.


jhyuklee commented on May 20, 2024

I'm not sure which model you used for generating the phrase vecs, but the score seems slightly low compared to the model uploaded on GitHub (densephrases-multi scores 29 EM on SQuAD before QSFT, i.e., query-side fine-tuning).


alexlimh commented on May 20, 2024

Thanks! That's really helpful.

Make sure that you set larger max_answer_length for SQuAD because they do have larger length answers.

Is there a default max_answer_length for SQuAD that you've been using?

I'm not sure which model you used for generating the phrase vecs

The model I used is densephrases-squad-ddp, which was trained using the following command:

make run-rc-sqd-ddp MODEL_NAME=densephrases-squad-ddp

run-rc-sqd-ddp: model-name sqd-rc-data sqd-param pbn-param medium1-index
	make train-rc-ddp \
		TRAIN_DATA=$(TRAIN_QG_DATA) DEV_DATA=$(DEV_DATA) \
		TEACHER_NAME=$(TEACHER_NAME) MODEL_NAME=$(MODEL_NAME)_tmp \
		BS=$(BS) LR=$(LR) MAX_SEQ_LEN=$(MAX_SEQ_LEN) \
		LAMBDA_KL=$(LAMBDA_KL) LAMBDA_NEG=$(LAMBDA_NEG)
	make train-rc-ddp \
		TRAIN_DATA=$(TRAIN_DATA) DEV_DATA=$(DEV_DATA) \
		TEACHER_NAME=$(TEACHER_NAME) MODEL_NAME=$(MODEL_NAME) \
		BS=$(BS) LR=$(LR) MAX_SEQ_LEN=$(MAX_SEQ_LEN) \
		LAMBDA_KL=$(LAMBDA_KL) LAMBDA_NEG=$(LAMBDA_NEG) \
		OPTIONS='$(PBN_OPTIONS) --load_dir $(SAVE_DIR)/$(MODEL_NAME)_tmp'

train-rc-ddp: model-name sqd-rc-data sqd-param
	OMP_NUM_THREADS=20 python -m torch.distributed.launch \
		--nnode=1 --node_rank=0 --nproc_per_node=4 train_rc.py \
		--model_type bert \
		--pretrained_name_or_path SpanBERT/spanbert-base-cased \
		--data_dir $(DATA_DIR)/single-qa \
		--cache_dir $(CACHE_DIR) \
		--train_file $(TRAIN_DATA) \
		--predict_file $(DEV_DATA) \
		--do_train \
		--do_eval \
		--fp16 \
		--per_gpu_train_batch_size $(BS) \
		--learning_rate $(LR) \
		--num_train_epochs 2.0 \
		--max_seq_length $(MAX_SEQ_LEN) \
		--lambda_kl $(LAMBDA_KL) \
		--lambda_neg $(LAMBDA_NEG) \
		--lambda_flt 1.0 \
		--filter_threshold -2.0 \
		--append_title \
		--evaluate_during_training \
		--teacher_dir $(SAVE_DIR)/$(TEACHER_NAME) \
		--output_dir $(SAVE_DIR)/$(MODEL_NAME) \
		$(OPTIONS)

Generating Vecs (in parallel):

make gen-vecs-parallel MODEL_NAME=densephrases-squad-ddp START=$start END=$end

gen-vecs-parallel: model-name
	python scripts/parallel/dump_phrases.py \
		--model_type bert \
		--pretrained_name_or_path SpanBERT/spanbert-base-cased \
		--cache_dir $(CACHE_DIR) \
		--data_dir $(DATA_DIR)/wikidump \
		--data_name wiki-20181220 \
		--load_dir $(SAVE_DIR)/$(MODEL_NAME) \
		--output_dir $(SAVE_DIR)/$(MODEL_NAME) \
		--filter_threshold 1.0 \
		--append_title \
		--start $(START) \
		--end $(END) \
		--num_gpus 4

Building Index:

make index-vecs DUMP_DIR=$SAVE_DIR/densephrases-squad-ddp_wiki-20181220/dump/ NUM_CLUSTERS=1048576 INDEX_TYPE=OPQ96

index-vecs: dump-dir large-index
	python build_phrase_index.py \
		--dump_dir $(DUMP_DIR) \
		--stage all \
		--replace \
		--num_clusters $(NUM_CLUSTERS) \
		--fine_quant $(INDEX_TYPE) \
		--cuda

Compressing meta:

make compress-meta DUMP_DIR=$SAVE_DIR/densephrases-squad-ddp_wiki-20181220/dump

compress-meta:
	python scripts/preprocess/compress_metadata.py \
		--input_dump_dir $(DUMP_DIR)/phrase \
		--output_dir $(DUMP_DIR)

Evaluating index:

make eval-index-psg-sqd MODEL_NAME=densephrases-squad-ddp DUMP_DIR=outputs/densephrases-squad-ddp_wiki-20181220/dump/

eval-index-psg-sqd: dump-dir model-name large-index sqd-open-data
	python eval_phrase_retrieval.py \
		--run_mode eval \
		--model_type bert \
		--pretrained_name_or_path SpanBERT/spanbert-base-cased \
		--cuda \
		--dump_dir $(DUMP_DIR) \
		--index_name start/$(NUM_CLUSTERS)_flat_$(INDEX_TYPE) \
		--load_dir $(SAVE_DIR)/$(MODEL_NAME) \
		--test_path $(DATA_DIR)/$(TEST_DATA) \
		--save_pred \
		--aggregate \
		--agg_strat opt2 \
		--top_k 200 \
		--eval_psg \
		--psg_top_k 100 \
		$(OPTIONS)


jhyuklee commented on May 20, 2024

Looks like you did a great job on the entire process (except that I don't know why you chose medium-index for run-rc-sqd-ddp rather than small-index. There are only about 2k SQuAD dev set passages, so small-index should be fine.)

First, I have to mention that the current hyperparameters (sqd-param) are not very DDP-friendly; they are tuned for a single 24GB GPU. If you want to use DDP for training, I strongly suggest changing the hyperparameters (larger batch sizes might require larger learning rates) and keeping track of the accuracy (i.e., semi-OD accuracy on SQuAD passages) right after run-rc-sqd-ddp, which strongly correlates with the final open-domain QA accuracy.
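
To make the batch-size point concrete: the training command in this thread uses a per-GPU batch size of 24 with 4 processes, so the effective batch size is 96 instead of the 24 that the single-GPU settings assume. The sketch below applies the linear learning-rate scaling heuristic, which is a common rule of thumb and not something prescribed in this thread. The function names are hypothetical, and the scaled value is only a starting point for tuning.

```python
def effective_batch_size(per_gpu_batch_size: int, num_gpus: int,
                         grad_accum_steps: int = 1) -> int:
    """Number of examples contributing to one optimizer step."""
    return per_gpu_batch_size * num_gpus * grad_accum_steps


def linearly_scaled_lr(base_lr: float, base_batch_size: int,
                       new_batch_size: int) -> float:
    """Linear scaling heuristic (an assumption, not the authors' recipe):
    grow the learning rate by the same factor as the batch size."""
    return base_lr * new_batch_size / base_batch_size


# Values taken from the command in this thread: BS=24, LR=3e-5, 4 GPUs.
single_gpu_bs = effective_batch_size(24, num_gpus=1)
ddp_bs = effective_batch_size(24, num_gpus=4)
suggested_lr = linearly_scaled_lr(3e-5, single_gpu_bs, ddp_bs)

print(f"effective batch size under DDP: {ddp_bs}")
print(f"linearly scaled learning rate:  {suggested_lr:.2e}")
```

Whatever learning rate is chosen, the semi-OD accuracy right after training remains the signal to watch, as suggested above.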

Second, for SQuAD, the maximum sequence length matters more than the batch size in my experience. Also, max_answer_length is currently set to 10, which is meant for NQ (since its answers are at most 5 words long), but you can set it to 20 for SQuAD.
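
A toy sketch of why the length cap matters: if the answer is longer than max_answer_length, the correct span is never even considered, however well its start and end score. The code below is not the DensePhrases search code. It assumes that length is counted in tokens with inclusive span boundaries, and best_span and the scores are made up for illustration.

```python
from typing import List, Optional, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_answer_length: int) -> Optional[Tuple[int, int, float]]:
    """Return (start, end, score) of the highest-scoring span whose token
    length (end - start + 1) does not exceed max_answer_length."""
    best = None
    for start, s_score in enumerate(start_scores):
        # Only ends within the length cap are ever looked at.
        last_end = min(len(end_scores), start + max_answer_length)
        for end in range(start, last_end):
            score = s_score + end_scores[end]
            if best is None or score > best[2]:
                best = (start, end, score)
    return best


# Hypothetical 15-token passage whose true answer covers tokens 0..11
# (12 tokens), longer than a cap of 10.
start_scores = [0.0] * 15
end_scores = [0.0] * 15
start_scores[0] = 5.0   # confident start of the true answer
end_scores[11] = 5.0    # confident end of the true answer
end_scores[4] = 1.0     # weaker, earlier end position

capped = best_span(start_scores, end_scores, max_answer_length=10)
relaxed = best_span(start_scores, end_scores, max_answer_length=20)
print("cap 10:", capped[:2])   # falls back to a truncated span
print("cap 20:", relaxed[:2])  # recovers the full 12-token span
```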


alexlimh commented on May 20, 2024

I see. But looking at the training log, it seems DensePhrases is working fine on the dev set:

OMP_NUM_THREADS=20 python -m torch.distributed.launch \
	--nnode=1 --node_rank=0 --nproc_per_node=4 train_rc.py \
	--model_type bert \
	--pretrained_name_or_path SpanBERT/spanbert-base-cased \
	--data_dir .//densephrases-data/single-qa \
	--cache_dir .//cache \
	--train_file squad/train-v1.1_qg_ents_t5large_3500_filtered.json \
	--predict_file squad/dev-v1.1.json \
	--do_train \
	--do_eval \
	--fp16 \
	--per_gpu_train_batch_size 24 \
	--learning_rate 3e-5 \
	--num_train_epochs 2.0 \
	--max_seq_length 384 \
	--lambda_kl 4.0 \
	--lambda_neg 2.0 \
	--lambda_flt 1.0 \
	--filter_threshold -2.0 \
	--append_title \
	--evaluate_during_training \
	--teacher_dir .//outputs/spanbert-base-cased-squad \
	--output_dir .//outputs/densephrases-squad-ddp_tmp \

...

Evaluating: 100%|█████████▉| 904/905 [00:53<00:00, 16.45it/s]
10570it [00:53, 196.62it/s]
11/17/2021 00:23:52 - INFO - densephrases.utils.squad_metrics -   saved vecs=1104943/15389478, save rate=0.0718
11/17/2021 00:23:52 - INFO - densephrases.utils.squad_metrics -   answer recall=0.0000

Evaluating: 100%|█████████▉| 904/905 [00:55<00:00, 16.34it/s]
11/17/2021 00:23:55 - INFO - __main__ -   Evaluation done in total 56.362774 secs (0.005194 sec per example)
11/17/2021 00:23:55 - INFO - __main__ -   Results: {'exact_final': 75.49668874172185, 'f1_final': 84.58729121922526, 'total_final': 10570, 'HasAns_exact_final': 75.49668874172185, 'HasAns_f1_final': 84.58729121922526, 'HasAns_total_final': 10570, 'best_exact_final': 75.49668874172185, 'best_exact_thresh_final': 0.0, 'best_f1_final': 84.58729121922526, 'best_f1_thresh_final': 0.0}
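
For reference, the exact_final value in the log above is an exact-match (EM) percentage. The sketch below shows how SQuAD-style EM is usually computed: answers are lowercased, stripped of punctuation and articles, and then compared as strings. It follows the conventions of the SQuAD v1.1 evaluation script in spirit and is not the code in densephrases.utils.squad_metrics. The example predictions and references are invented.

```python
import re
import string
from typing import List


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, squeeze whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: List[str]) -> bool:
    """A prediction counts if it matches any gold answer after normalization."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


def em_score(predictions: List[str], references: List[List[str]]) -> float:
    """Percentage of questions answered with an exact match."""
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


# Invented examples: the second prediction contains extra words, so it misses.
predictions = ["The Eiffel Tower", "Paris, France", "1889"]
references = [["Eiffel Tower"], ["Paris"], ["1889", "in 1889"]]
print(f"EM = {em_score(predictions, references):.2f}")
```

Note that the metric is the same in the reading comprehension and open-domain settings. What differs is where the prediction comes from: a given gold passage here, versus retrieval over the whole phrase dump in the numbers discussed earlier in the thread.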


jhyuklee commented on May 20, 2024

Oh, I think you are missing this part, where you can evaluate your model in the semi-open-domain setup (using all development set passages). This is a better approximation of open-domain QA.

DensePhrases/Makefile

Lines 219 to 230 in b52fe06

make gen-vecs \
	DEV_DATA=$(DEV_DATA) MODEL_NAME=$(MODEL_NAME)
make index-vecs \
	DUMP_DIR=$(SAVE_DIR)/$(MODEL_NAME)/dump \
	NUM_CLUSTERS=$(NUM_CLUSTERS) INDEX_TYPE=$(INDEX_TYPE)
make compress-meta \
	DUMP_DIR=$(SAVE_DIR)/$(MODEL_NAME)/dump
make eval-index \
	DUMP_DIR=$(SAVE_DIR)/$(MODEL_NAME)/dump \
	NUM_CLUSTERS=$(NUM_CLUSTERS) INDEX_TYPE=$(INDEX_TYPE) \
	MODEL_LANE=$(MODEL_NAME) TEST_DATA=$(SOD_DATA) \
	OPTIONS=$(OPTIONS)

