neulab / guided_summarization

GSum: A General Framework for Guided Neural Abstractive Summarization

License: MIT License

Python 97.28% C++ 0.78% Cuda 1.80% Shell 0.15%

guided_summarization's Issues

Guidance ROUGE

Hi @zdou0830!

I just measured ROUGE scores for the guidance sentences you provided, and got the following result:

R1/R2/RL: 43.89/20.63/39.79

This is a bit different from the MatchSum result reported in the paper:

R1/R2/RL: 44.41/20.86/40.55

I used files2rouge for ROUGE evaluation as suggested in Bart for summarization, with the default setting of
-c 95 -r 1000 -n 2 -a.

Could you please check this result on your end? Maybe I missed something.

Thank you!

About parameters in z_test.sh

Hello, thank you for sharing your code.
I'm trying to run your BART code, but I hit a problem while running z_test.sh:

`------------------------------------------------------------------------------------------------------------------------------

Traceback (most recent call last):
File "/home/delab30/code/WangXiaoye/guided_summarization/bart/z_test.py", line 25, in
hypotheses_batch = bart.sample(slines, zlines, beam=4, lenpen=2.0, max_len_b=140, min_len=55, no_repeat_ngram_size=3, guided=True)
File "/home/delab30/code/WangXiaoye/guided_summarization/bart/fairseq/models/bart/guided_hub_interface.py", line 125, in sample
hypos = self.generate(input, z, beam, verbose, **kwargs)
File "/home/delab30/code/WangXiaoye/guided_summarization/bart/fairseq/models/bart/guided_hub_interface.py", line 142, in generate
prefix_tokens=sample['net_input']['src_tokens'].new_zeros((len(tokens), 1)).fill_(self.task.source_dictionary.bos()),
File "/home/delab30/code/WangXiaoye/guided_summarization/bart/fairseq/tasks/fairseq_task.py", line 354, in inference_step
return generator.generate(models, sample, prefix_tokens=prefix_tokens)
File "/home/delab30/anaconda3/envs/guided_summary/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/home/delab30/code/WangXiaoye/guided_summarization/bart/fairseq/sequence_generator.py", line 852, in generate
return self._generate(model, sample, **kwargs)
File "/home/delab30/anaconda3/envs/guided_summary/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/home/delab30/code/WangXiaoye/guided_summarization/bart/fairseq/sequence_generator.py", line 1042, in _generate
tokens[:, :step + 1], encoder_outs, z_encoder_outs, temperature=self.temperature,
File "/home/delab30/anaconda3/envs/guided_summary/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/home/delab30/code/WangXiaoye/guided_summarization/bart/fairseq/sequence_generator.py", line 583, in z_forward_decoder
temperature=temperature,
File "/home/delab30/code/WangXiaoye/guided_summarization/bart/fairseq/sequence_generator.py", line 639, in z_decode_one
tokens, encoder_out=encoder_out, z_encoder_out=z_encoder_out, incremental_state=self.incremental_states[model],
File "/home/delab30/code/WangXiaoye/guided_summarization/bart/fairseq/models/bart/guided_model.py", line 110, in z_forward_decoder
decoder_out = self.decoder(prev_output_tokens, encoder_out, z_encoder_out=z_encoder_out, incremental_state=incremental_state, **extra_args)
File "/home/delab30/anaconda3/envs/guided_summary/lib/python3.6/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/delab30/code/WangXiaoye/guided_summarization/bart/fairseq/models/guided_transformer.py", line 681, in forward
alignment_heads=alignment_heads,
File "/home/delab30/code/WangXiaoye/guided_summarization/bart/fairseq/models/guided_transformer.py", line 803, in extract_features
need_head_weights=bool((idx == alignment_layer)),
File "/home/delab30/anaconda3/envs/guided_summary/lib/python3.6/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/delab30/code/WangXiaoye/guided_summarization/bart/fairseq/modules/guided_transformer_layer.py", line 197, in forward
need_head_weights=need_head_weights,
File "/home/delab30/anaconda3/envs/guided_summary/lib/python3.6/site-packages/torch/nn/modules/module.py", line 744, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/delab30/code/WangXiaoye/guided_summarization/bart/fairseq/modules/multihead_attention.py", line 287, in forward
assert key_padding_mask.size(0) == bsz
AssertionError
python-BaseException`

After checking the code, I found that the shape of buffer('active_bbsz_idx') is 0 (line 1218 in guided_summarization/bart/fairseq/sequence_generator.py), which may be causing this problem, but I have no idea how to fix the code.
Do you have any suggestions for me?

I would appreciate it if you could help me!

Train bart.large on custom dataset from the beginning

Hi, thank you for releasing the trained model.
However, when I try to train bart.large on my custom dataset from the beginning and set model_path to the fairseq bart.large checkpoint, the exception below is raised. It seems to be caused by an architecture mismatch.
Where should I make a change to initialize these parameters? I could not find this in the z_train.sh you provided.

RuntimeError: Error(s) in loading state_dict for GuidedBARTModel:
Missing key(s) in state_dict: "encoder.layers.12.self_attn.k_proj.weight", "encoder.layers.12.self_attn.k_proj.bias", "encoder.layers.12.self_attn.v_proj.weight", "encoder.layers.12.self_attn.v_proj.bias", "encoder.layers.12.self_attn.q_proj.weight", "encoder.layers.12.self_attn.q_proj.bias", "encoder.layers.12.self_attn.out_proj.weight", "encoder.layers.12.self_attn.out_proj.bias", "encoder.layers.12.self_attn_layer_norm.weight", "encoder.layers.12.self_attn_layer_norm.bias", "encoder.layers.12.fc1.weight", "encoder.layers.12.fc1.bias", "encoder.layers.12.fc2.weight", "encoder.layers.12.fc2.bias", "encoder.layers.12.final_layer_norm.weight", "encoder.layers.12.final_layer_norm.bias", "decoder.layers.0.z_encoder_attn.k_proj.weight", "decoder.layers.0.z_encoder_attn.k_proj.bias", "decoder.layers.0.z_encoder_attn.v_proj.weight", "decoder.layers.0.z_encoder_attn.v_proj.bias", "decoder.layers.0.z_encoder_attn.q_proj.weight", "decoder.layers.0.z_encoder_attn.q_proj.bias", "decoder.layers.0.z_encoder_attn.out_proj.weight", "decoder.layers.0.z_encoder_attn.out_proj.bias", "decoder.layers.0.z_encoder_attn_layer_norm.weight", "decoder.layers.0.z_encoder_attn_layer_norm.bias", "decoder.layers.1.z_encoder_attn.k_proj.weight", "decoder.layers.1.z_encoder_attn.k_proj.bias", "decoder.layers.1.z_encoder_attn.v_proj.weight", "decoder.layers.1.z_encoder_attn.v_proj.bias", "decoder.layers.1.z_encoder_attn.q_proj.weight", "decoder.layers.1.z_encoder_attn.q_proj.bias", "decoder.layers.1.z_encoder_attn.out_proj.weight", "decoder.layers.1.z_encoder_attn.out_proj.bias", "decoder.layers.1.z_encoder_attn_layer_norm.weight", "decoder.layers.1.z_encoder_attn_layer_norm.bias", "decoder.layers.2.z_encoder_attn.k_proj.weight", "decoder.layers.2.z_encoder_attn.k_proj.bias", "decoder.layers.2.z_encoder_attn.v_proj.weight", "decoder.layers.2.z_encoder_attn.v_proj.bias", "decoder.layers.2.z_encoder_attn.q_proj.weight", "decoder.layers.2.z_encoder_attn.q_proj.bias", 
"decoder.layers.2.z_encoder_attn.out_proj.weight", "decoder.layers.2.z_encoder_attn.out_proj.bias", "decoder.layers.2.z_encoder_attn_layer_norm.weight", "decoder.layers.2.z_encoder_attn_layer_norm.bias", "decoder.layers.3.z_encoder_attn.k_proj.weight", "decoder.layers.3.z_encoder_attn.k_proj.bias", "decoder.layers.3.z_encoder_attn.v_proj.weight", "decoder.layers.3.z_encoder_attn.v_proj.bias", "decoder.layers.3.z_encoder_attn.q_proj.weight", "decoder.layers.3.z_encoder_attn.q_proj.bias", "decoder.layers.3.z_encoder_attn.out_proj.weight", "decoder.layers.3.z_encoder_attn.out_proj.bias", "decoder.layers.3.z_encoder_attn_layer_norm.weight", "decoder.layers.3.z_encoder_attn_layer_norm.bias", "decoder.layers.4.z_encoder_attn.k_proj.weight", "decoder.layers.4.z_encoder_attn.k_proj.bias", "decoder.layers.4.z_encoder_attn.v_proj.weight", "decoder.layers.4.z_encoder_attn.v_proj.bias", "decoder.layers.4.z_encoder_attn.q_proj.weight", "decoder.layers.4.z_encoder_attn.q_proj.bias", "decoder.layers.4.z_encoder_attn.out_proj.weight", "decoder.layers.4.z_encoder_attn.out_proj.bias", "decoder.layers.4.z_encoder_attn_layer_norm.weight", "decoder.layers.4.z_encoder_attn_layer_norm.bias", "decoder.layers.5.z_encoder_attn.k_proj.weight", "decoder.layers.5.z_encoder_attn.k_proj.bias", "decoder.layers.5.z_encoder_attn.v_proj.weight", "decoder.layers.5.z_encoder_attn.v_proj.bias", "decoder.layers.5.z_encoder_attn.q_proj.weight", "decoder.layers.5.z_encoder_attn.q_proj.bias", "decoder.layers.5.z_encoder_attn.out_proj.weight", "decoder.layers.5.z_encoder_attn.out_proj.bias", "decoder.layers.5.z_encoder_attn_layer_norm.weight", "decoder.layers.5.z_encoder_attn_layer_norm.bias", "decoder.layers.6.z_encoder_attn.k_proj.weight", "decoder.layers.6.z_encoder_attn.k_proj.bias", "decoder.layers.6.z_encoder_attn.v_proj.weight", "decoder.layers.6.z_encoder_attn.v_proj.bias", "decoder.layers.6.z_encoder_attn.q_proj.weight", "decoder.layers.6.z_encoder_attn.q_proj.bias", 
"decoder.layers.6.z_encoder_attn.out_proj.weight", "decoder.layers.6.z_encoder_attn.out_proj.bias", "decoder.layers.6.z_encoder_attn_layer_norm.weight", "decoder.layers.6.z_encoder_attn_layer_norm.bias", "decoder.layers.7.z_encoder_attn.k_proj.weight", "decoder.layers.7.z_encoder_attn.k_proj.bias", "decoder.layers.7.z_encoder_attn.v_proj.weight", "decoder.layers.7.z_encoder_attn.v_proj.bias", "decoder.layers.7.z_encoder_attn.q_proj.weight", "decoder.layers.7.z_encoder_attn.q_proj.bias", "decoder.layers.7.z_encoder_attn.out_proj.weight", "decoder.layers.7.z_encoder_attn.out_proj.bias", "decoder.layers.7.z_encoder_attn_layer_norm.weight", "decoder.layers.7.z_encoder_attn_layer_norm.bias", "decoder.layers.8.z_encoder_attn.k_proj.weight", "decoder.layers.8.z_encoder_attn.k_proj.bias", "decoder.layers.8.z_encoder_attn.v_proj.weight", "decoder.layers.8.z_encoder_attn.v_proj.bias", "decoder.layers.8.z_encoder_attn.q_proj.weight", "decoder.layers.8.z_encoder_attn.q_proj.bias", "decoder.layers.8.z_encoder_attn.out_proj.weight", "decoder.layers.8.z_encoder_attn.out_proj.bias", "decoder.layers.8.z_encoder_attn_layer_norm.weight", "decoder.layers.8.z_encoder_attn_layer_norm.bias", "decoder.layers.9.z_encoder_attn.k_proj.weight", "decoder.layers.9.z_encoder_attn.k_proj.bias", "decoder.layers.9.z_encoder_attn.v_proj.weight", "decoder.layers.9.z_encoder_attn.v_proj.bias", "decoder.layers.9.z_encoder_attn.q_proj.weight", "decoder.layers.9.z_encoder_attn.q_proj.bias", "decoder.layers.9.z_encoder_attn.out_proj.weight", "decoder.layers.9.z_encoder_attn.out_proj.bias", "decoder.layers.9.z_encoder_attn_layer_norm.weight", "decoder.layers.9.z_encoder_attn_layer_norm.bias", "decoder.layers.10.z_encoder_attn.k_proj.weight", "decoder.layers.10.z_encoder_attn.k_proj.bias", "decoder.layers.10.z_encoder_attn.v_proj.weight", "decoder.layers.10.z_encoder_attn.v_proj.bias", "decoder.layers.10.z_encoder_attn.q_proj.weight", "decoder.layers.10.z_encoder_attn.q_proj.bias", 
"decoder.layers.10.z_encoder_attn.out_proj.weight", "decoder.layers.10.z_encoder_attn.out_proj.bias", "decoder.layers.10.z_encoder_attn_layer_norm.weight", "decoder.layers.10.z_encoder_attn_layer_norm.bias", "decoder.layers.11.z_encoder_attn.k_proj.weight", "decoder.layers.11.z_encoder_attn.k_proj.bias", "decoder.layers.11.z_encoder_attn.v_proj.weight", "decoder.layers.11.z_encoder_attn.v_proj.bias", "decoder.layers.11.z_encoder_attn.q_proj.weight", "decoder.layers.11.z_encoder_attn.q_proj.bias", "decoder.layers.11.z_encoder_attn.out_proj.weight", "decoder.layers.11.z_encoder_attn.out_proj.bias", "decoder.layers.11.z_encoder_attn_layer_norm.weight", "decoder.layers.11.z_encoder_attn_layer_norm.bias".

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/xjw/miniconda3/envs/gsum/bin/fairseq-train", line 33, in
sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
File "/home/xjw/code/guided_summarization/src/fairseq/fairseq_cli/train.py", line 320, in cli_main
main(args)
File "/home/xjw/code/guided_summarization/src/fairseq/fairseq_cli/train.py", line 81, in main
extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
File "/home/xjw/code/guided_summarization/src/fairseq/fairseq/checkpoint_utils.py", line 134, in load_checkpoint
reset_meters=args.reset_meters,
File "/home/xjw/code/guided_summarization/src/fairseq/fairseq/trainer.py", line 199, in load_checkpoint
"please ensure that the architectures match.".format(filename)
Exception: Cannot load model parameters from checkpoint ./bart/bart.large/model.pt; please ensure that the architectures match.

As for issue #32, even when I remove --max-sentences 1 as you suggested, the ZeroDivisionError still occurs when I train on multiple GPUs. The error disappears only when I use a single GPU, but that is too slow for training such a big model.

Thanks for your kind help :) @zdou0830
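
The missing keys in the error above are all either `encoder.layers.12.*` or `decoder.layers.*.z_encoder_attn*` entries, that is, parameters the guided model has but the plain bart.large checkpoint does not provide. Below is a minimal sketch of how one might separate the parameters that can be copied from a checkpoint from those that need fresh initialization. It works on plain lists of names and assumes nothing about how this repository handles the problem; `split_checkpoint_keys` is a hypothetical helper, not part of fairseq. In PyTorch itself, `Module.load_state_dict(state_dict, strict=False)` reports missing and unexpected keys in a similar way.

```python
def split_checkpoint_keys(model_keys, checkpoint_keys):
    """Partition parameter names by whether the checkpoint provides them.

    Hypothetical helper for illustration; not part of fairseq or this repo.
    """
    model_keys = list(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    loadable = [k for k in model_keys if k in checkpoint_keys]
    missing = [k for k in model_keys if k not in checkpoint_keys]
    unexpected = sorted(checkpoint_keys - set(model_keys))
    return loadable, missing, unexpected


# Toy parameter names modelled on the error message above.
guided_model = [
    "encoder.layers.0.fc1.weight",
    "encoder.layers.12.fc1.weight",
    "decoder.layers.0.encoder_attn.k_proj.weight",
    "decoder.layers.0.z_encoder_attn.k_proj.weight",
]
vanilla_bart = [
    "encoder.layers.0.fc1.weight",
    "decoder.layers.0.encoder_attn.k_proj.weight",
]

loadable, missing, unexpected = split_checkpoint_keys(guided_model, vanilla_bart)
print("copy from checkpoint:", loadable)
print("initialize separately:", missing)
```

Listing the two groups first makes it easier to confirm that only the newly added layers are absent before deciding how to initialize them.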

Instructions to generate Keywords

Can you provide instructions on how to use the scripts for generating keywords?
What format should the dataset be in, and what are the requirements and steps?

ZeroDivisionError: float division by zero

I hit "ZeroDivisionError: float division by zero" when training the model on multiple GPUs. With only one GPU the problem disappears, but training is too slow.
The detailed traceback is below. Do you have any idea what causes it?

/home/xjw/code/guided_summarization/src/fairseq/fairseq/optim/adam.py:179: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:1025.)
exp_avg.mul_(beta1).add_(1 - beta1, grad)
/home/xjw/code/guided_summarization/src/fairseq/fairseq/optim/adam.py:179: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha) (Triggered internally at /pytorch/torch/csrc/utils/python_arg_parser.cpp:1025.)
exp_avg.mul_(beta1).add_(1 - beta1, grad)
Traceback (most recent call last):
File "/home/xjw/miniconda3/envs/gsum/bin/fairseq-train", line 33, in
sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
File "/home/xjw/code/guided_summarization/src/fairseq/fairseq_cli/train.py", line 316, in cli_main
nprocs=args.distributed_world_size,
File "/home/xjw/miniconda3/envs/gsum/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/xjw/miniconda3/envs/gsum/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home/xjw/miniconda3/envs/gsum/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/xjw/miniconda3/envs/gsum/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/xjw/code/guided_summarization/src/fairseq/fairseq_cli/train.py", line 283, in distributed_main
main(args, init_distributed=True)
File "/home/xjw/code/guided_summarization/src/fairseq/fairseq_cli/train.py", line 102, in main
train(args, trainer, task, epoch_itr)
File "/home/xjw/miniconda3/envs/gsum/lib/python3.6/contextlib.py", line 52, in inner
return func(*args, **kwds)
File "/home/xjw/code/guided_summarization/src/fairseq/fairseq_cli/train.py", line 178, in train
log_output = trainer.train_step(samples)
File "/home/xjw/miniconda3/envs/gsum/lib/python3.6/contextlib.py", line 52, in inner
return func(*args, **kwds)
File "/home/xjw/code/guided_summarization/src/fairseq/fairseq/trainer.py", line 391, in train_step
logging_output = self._reduce_and_log_stats(logging_outputs, sample_size)
File "/home/xjw/code/guided_summarization/src/fairseq/fairseq/trainer.py", line 718, in _reduce_and_log_stats
self.task.reduce_metrics(logging_outputs, self.get_criterion())
File "/home/xjw/code/guided_summarization/src/fairseq/fairseq/tasks/guided_translation.py", line 307, in reduce_metrics
super().reduce_metrics(logging_outputs, criterion)
File "/home/xjw/code/guided_summarization/src/fairseq/fairseq/tasks/fairseq_task.py", line 406, in reduce_metrics
criterion.__class__.reduce_metrics(logging_outputs)
File "/home/xjw/code/guided_summarization/src/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 95, in reduce_metrics
metrics.log_scalar('loss', loss_sum / sample_size / math.log(2), sample_size, round=3)
ZeroDivisionError: float division by zero
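
The last frame shows where the failure happens: `loss_sum / sample_size` is evaluated while the summed `sample_size` is zero. Below is a minimal sketch of a guarded reduction, assuming the logging outputs are dictionaries with `loss` and `sample_size` entries as the traceback suggests. `safe_loss_base2` is a hypothetical name and this is not the repository's fix; it only avoids the crash and does not explain why a worker aggregated no samples.

```python
import math


def safe_loss_base2(logging_outputs):
    """Return the average loss in base 2, or None when there is nothing to average.

    Hypothetical helper for illustration; not part of fairseq.
    """
    loss_sum = sum(log.get("loss", 0.0) for log in logging_outputs)
    sample_size = sum(log.get("sample_size", 0) for log in logging_outputs)
    if sample_size == 0:
        # Nothing was aggregated on this step; skip instead of dividing by zero.
        return None
    return loss_sum / sample_size / math.log(2)


print(safe_loss_base2([{"loss": 8.0, "sample_size": 4}]))
print(safe_loss_base2([]))
```

Returning `None` leaves it to the caller to decide whether to skip logging for that step or to investigate the empty batch.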

about the wikihow train script

Hi, I get results comparable to those in the paper on the CNNDM dataset. However, when I run the same train.sh script on the WikiHow dataset, the model fails to generate correct sentences. I'd appreciate it if you could release the training script for WikiHow, in particular the learning rate(s), warm-up steps, and so on.
Thanks!

RuntimeError: masked_select(): self and result must have the same scalar type

Did anyone encounter this error while trying to run inference with the pretrained model (BART)?

(guided2) user@user-Alienware-15-R4:~/Lakshmi/guided_summarization/bart$ bash z_test.sh test.source test.matchsum test.output our_model bart_sentence.pt /home/user/Lakshmi/guided_summarization/binarized

Traceback (most recent call last):
File "z_test.py", line 25, in
hypotheses_batch = bart.sample(slines, zlines, beam=4, lenpen=2.0, max_len_b=140, min_len=55, no_repeat_ngram_size=3, guided=True)
File "/home/user/Lakshmi/guided_summarization/bart/fairseq/models/bart/guided_hub_interface.py", line 125, in sample
hypos = self.generate(input, z, beam, verbose, **kwargs)
File "/home/user/Lakshmi/guided_summarization/bart/fairseq/models/bart/guided_hub_interface.py", line 142, in generate
prefix_tokens=sample['net_input']['src_tokens'].new_zeros((len(tokens), 1)).fill_(self.task.source_dictionary.bos()),
File "/home/user/Lakshmi/guided_summarization/bart/fairseq/tasks/fairseq_task.py", line 354, in inference_step
return generator.generate(models, sample, prefix_tokens=prefix_tokens)
File "/home/user/anaconda3/envs/guided2/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/user/Lakshmi/guided_summarization/bart/fairseq/sequence_generator.py", line 852, in generate
return self._generate(model, sample, **kwargs)
File "/home/user/anaconda3/envs/guided2/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/user/Lakshmi/guided_summarization/bart/fairseq/sequence_generator.py", line 1146, in _generate
out=eos_bbsz_idx,
RuntimeError: masked_select(): self and result must have the same scalar type

Plans for entire model release

Is there any timeline for the model code release and other modules which will help me replicate the results in the paper?

How large GPU is expected to train GSum model?

Hi, I noticed in your script that you use 2 GPUs with 1024 tokens each, and keep the batch size the same as BART to train the model. May I ask how much memory your GPUs have? I use several 16 GB GPUs, which can train BART but do not work for GSum.

Thanks

How to choose number of sentences we selected by oracle?

Hi! Thanks for your wonderful work!
I saw 'sents.py' and found that 'summary_size' in 'greedy_selection' function is the number of sentences we selected by oracle. I wonder why you set this number to 3?
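
For context, a greedy oracle selector of this kind typically adds one sentence at a time, keeping the sentence that most improves overlap with the reference summary, and stops after `summary_size` sentences or when no remaining sentence helps. The sketch below is a simplified stand-in that scores with unigram F1 only. It is not the code in `sents.py`, and `greedy_oracle` and `unigram_f1` are hypothetical names.

```python
from collections import Counter


def unigram_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap F1 between two token lists (a crude ROUGE-1 stand-in)."""
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)


def greedy_oracle(doc_sents, reference, summary_size=3):
    """Greedily pick up to `summary_size` sentence indices that raise the score."""
    ref_tokens = reference.lower().split()
    selected, best = [], 0.0
    for _ in range(summary_size):
        best_idx = None
        for i in range(len(doc_sents)):
            if i in selected:
                continue
            # Score the candidate set in document order.
            chosen = [doc_sents[j] for j in sorted(selected + [i])]
            score = unigram_f1(" ".join(chosen).lower().split(), ref_tokens)
            if score > best:
                best, best_idx = score, i
        if best_idx is None:  # no remaining sentence improves the score
            break
        selected.append(best_idx)
    return sorted(selected)


doc = [
    "the cat sat on the mat",
    "stocks fell sharply on monday",
    "the dog chased the cat",
]
reference = "the cat sat on the mat while the dog chased the cat"
print(greedy_oracle(doc, reference))
```

In this sketch `summary_size` is only an upper bound: selection stops early once adding a sentence no longer improves the score, so fewer sentences may be returned.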

train.bpe.z

How can I get the train.bpe.z files?

Initializing both encoders with pre-trained BART

Hello,
thanks for the great work.
I was wondering if you can point me to the part of code where you initialized the first N layers of both encoders with pre-trained BART and the newly added layer randomly. Thank you in advance

Training stuck at step 72900/200000

Hi,

I noticed that training on the CNN dataset gets stuck at step 72900/200000, although GPU utilization shows 100%. I tried training 3 times, and every time it gets stuck at the same step. I also tried different datasets, and training gets stuck at the same step (with GPU utilization at 100%). I have attached an image here for reference. I need your input on this.
Thanks

Citation in README.md is wrong

highlight sentence data not in order

Hello,

I have realized that the data available at this link is not in the same order as the data points in the dataset. I assumed that each line (data point) in the dataset should correspond to the same line in the oracle sentences, but it does not. If so, how can I get the ordered data?

Thanks.
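
One way to check a suspected misalignment like this is to test, line by line, how many guidance tokens also appear in the source line at the same position. Oracle sentences are extracted from the source, so an aligned pair should overlap almost completely, although tokenization differences can lower the ratio. The sketch below is illustrative only; `misaligned_lines` and the `min_overlap` threshold are assumptions, not part of the repository.

```python
def misaligned_lines(source_lines, guidance_lines, min_overlap=0.8):
    """Return indices whose guidance shares too few tokens with its source line."""
    if len(source_lines) != len(guidance_lines):
        raise ValueError("files have different numbers of lines")
    suspicious = []
    for i, (src, z) in enumerate(zip(source_lines, guidance_lines)):
        src_tokens = set(src.lower().split())
        z_tokens = z.lower().split()
        if not z_tokens:
            continue  # empty guidance cannot be checked
        overlap = sum(tok in src_tokens for tok in z_tokens) / len(z_tokens)
        if overlap < min_overlap:
            suspicious.append(i)
    return suspicious


sources = ["the cat sat on the mat . it purred .", "stocks fell on monday ."]
guidance = ["the cat sat on the mat .", "the dog barked ."]
print(misaligned_lines(sources, guidance))
```

A long list of flagged indices would suggest that the two files are ordered differently rather than that a few lines are noisy.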

trouble running model

I'm getting stuck trying to get the model to run on all the files in the data_path. I'm currently getting this error: "IsADirectoryError: [Errno 21] Is a directory: '/content/guided_summarization/bert/data_path/bert_data_new/cnndm' ".

Can you let me know what I am doing wrong?

Questions about implementation

Hi, thanks for the interesting paper and for releasing your code. I have a couple of questions about the implementation.

I see that you got the input data from the sensim repo and you released the oracle sentences here. How did you sentence-split the input data to generate the oracle sentences?
When the documents and sentence guidance signals are passed to the BART model, you say that they should not be tokenized. Does that also mean they should not be sentence-tokenized (i.e., the <q> should be removed)?
Do you have outputs from the models that you can release?

Thanks!

Understanding the scripts for training Guided model

Hello,
I have a simple question. For training the model, the command provided is

bash z_train.sh DATA_PATH MODEL_PATH

and for testing the model it is

bash z_test.sh SRC GUIDANCE RESULT_PATH MODEL_DIR MODEL_NAME DATA_BIN

Although the task seems to be guided_summarization in z_train.sh, could you tell me what guidance the training uses, and whether I can specify the guidance path?

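About training BertExt with guidance

Hello, I am trying to train the BertExt part of your code on my own dataset, but adding different guidance gives the same ROUGE scores. Did your code only modify the abs part on top of PreSumm? Would it be feasible to use z_trainer directly for ext?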

About the parameter sharing and training

From “forward” functions of GuidedBARTModel and GuidedTransformerEncoder, I can understand how to share parameters in terms of model structure. However, I am confused about how the gradients flow and how to update the shared and unshared parameters. According to the paper, the unshared parameters are trained separately, but in the code, it seems that the source document and guidance signal are input together and the parameters are updated once in a step?
Thanks.

Getting empty results at test time, please help!

First of all, thank you for the code you provided. I downloaded the original test and train data from the PreSumm repo and then added the corresponding guidance signal through ‘highligted_sentence_data.py’ from the bert folder. Everything goes well in training, but at test time I get empty results. What’s wrong here?

files2rouge problem

Hi, when using files2rouge I ran into this issue. Do you know how to fix it?

(testtransformers) [zggao@cu05 result]$ files2rouge test.hypo.tokenized test.target.tokenized
Preparing documents... 0 line(s) ignored
Running ROUGE...
Traceback (most recent call last):
File "/home/zggao/anaconda3/envs/testtransformers/bin/files2rouge", line 33, in
sys.exit(load_entry_point('files2rouge==2.1.0', 'console_scripts', 'files2rouge')())
File "/home/zggao/anaconda3/envs/testtransformers/lib/python3.6/site-packages/files2rouge-2.1.0-py3.6.egg/files2rouge/files2rouge.py", line 105, in main
args.stemming)
File "/home/zggao/anaconda3/envs/testtransformers/lib/python3.6/site-packages/files2rouge-2.1.0-py3.6.egg/files2rouge/files2rouge.py", line 54, in run
stemming=stemming)
TypeError: __init__() got an unexpected keyword argument 'log_level'

In terms of the released model for cnndm datasets

Hello! I appreciate your sharing this work.

When I used your pre-trained model for the cnndm dataset, I obtained the following results:

Preparing documents... 0 line(s) ignored
Running ROUGE...

1 ROUGE-1 Average_R: 0.40673 (95%-conf.int. 0.40418 - 0.40929)
1 ROUGE-1 Average_P: 0.55422 (95%-conf.int. 0.55179 - 0.55676)
1 ROUGE-1 Average_F: 0.45788 (95%-conf.int. 0.45563 - 0.46023)

1 ROUGE-2 Average_R: 0.19777 (95%-conf.int. 0.19539 - 0.20017)
1 ROUGE-2 Average_P: 0.26860 (95%-conf.int. 0.26579 - 0.27132)
1 ROUGE-2 Average_F: 0.22213 (95%-conf.int. 0.21968 - 0.22445)

1 ROUGE-L Average_R: 0.37628 (95%-conf.int. 0.37380 - 0.37870)
1 ROUGE-L Average_P: 0.51289 (95%-conf.int. 0.51046 - 0.51537)
1 ROUGE-L Average_F: 0.42369 (95%-conf.int. 0.42153 - 0.42601)

Elapsed time: 158.417 seconds

The scores are slightly lower than the reported scores.
Am I missing something?

Thank you so much again.
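
When comparing output like this against numbers in a paper, it can help to pull out only the `Average_F` rows, since summarization papers typically report F-scores scaled to 0-100. Below is a small sketch that assumes the report format shown above; `parse_rouge_f` is a hypothetical helper, not part of files2rouge.

```python
import re

# Matches rows such as "1 ROUGE-1 Average_F: 0.45788 (95%-conf.int. ...)".
ROW = re.compile(r"ROUGE-(\w+) Average_F: ([0-9.]+)")


def parse_rouge_f(report):
    """Map metric name ('1', '2', 'L') to its F-score scaled to 0-100."""
    return {name: round(float(value) * 100, 2) for name, value in ROW.findall(report)}


report = """\
1 ROUGE-1 Average_R: 0.40673 (95%-conf.int. 0.40418 - 0.40929)
1 ROUGE-1 Average_F: 0.45788 (95%-conf.int. 0.45563 - 0.46023)
1 ROUGE-2 Average_F: 0.22213 (95%-conf.int. 0.21968 - 0.22445)
1 ROUGE-L Average_F: 0.42369 (95%-conf.int. 0.42153 - 0.42601)
"""
print(parse_rouge_f(report))
```

Recall and precision rows are ignored on purpose, so the result can be compared directly with an R1/R2/RL triple.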

Advice on training the model

Hello,

I started training the model on keyword-based guidance (oracle keywords) on a 2x 2080 Ti system, trained it for around 36 hours, and tested it. The model seems to be consistently generating gibberish output. (The model had trained for about 17 epochs at this stage.)

Do you have any advice on how many epochs the model needs to converge, and whether this is the same for sentence guidance signals as well? Do you have any other advice on model training?

Empty results during test time, Please help!

First of all, thank you for the code you provided. I checked the BERT code folder, but did not find any code that generates the signals at test time using the learned extractive model. So I added the following two functions to data_builder.py to extract the sentences at test time. I am able to generate the test data correctly, but at test time no results are generated and the result files are empty. What’s wrong with the code? Is there an alternative way to generate test data?
Thank you for your consideration.

def custom_ext_format_to_bert(args):
    if (args.dataset != ''):
        datasets = [args.dataset]
        print('dataset')
    else:
        datasets = ['train']
    for corpus_type in datasets:
        a_lst = []
        print('.' + corpus_type + '.0.json')
        for json_f in glob.glob(args.raw_path + '*' + corpus_type + '_[0-9]*.story.json'):
            print(json_f)
            real_name = json_f.split('/')[-1]
            print(real_name)
            a_lst.append((corpus_type, json_f, args, pjoin(args.save_path, real_name.replace('json', 'bert.pt'))))
        print(a_lst)
        pool = Pool(args.n_cpus)
        for d in pool.imap(_ext_format_to_bert, a_lst):
            pass

        pool.close()
        pool.join()

def _ext_format_to_bert(params):
    corpus_type, json_file, args, save_file = params
    is_test = corpus_type == 'test'
    if (os.path.exists(save_file)):
        logger.info('Ignore %s' % save_file)
        return

    bert = BertData(args)

    logger.info('Processing %s' % json_file)
    jobs = json.load(open(json_file))
    datasets = []
    # Load extractive model
    model_type = 'bertbase' #@param ['bertbase', 'distilbert', 'mobilebert']
    checkpoint = torch.load('models/bertext_cnndm_transformer.pt', map_location='cpu')
    model = ExtSummarizer(checkpoint=checkpoint, bert_type=model_type, device='cpu')
    for d in jobs:
        source, tgt = d['src'], d['tgt']
        processed_text, full_length = preprocess(source)
        input_data = load_text(processed_text, max_pos=512, device="cpu")
        sent_labels = get_labels(model, input_data, max_length=2, block_trigram=True)
        if(sent_labels is None): continue
        # sent_labels = greedy_selection(source[:args.max_src_nsents], tgt, 3)

        if (args.lower):
            source = [' '.join(s).lower().split() for s in source]
            tgt = [' '.join(s).lower().split() for s in tgt]
        b_data = bert.preprocess(source, tgt, sent_labels, is_test=is_test)
        # b_data = bert.preprocess(source, tgt, sent_labels)

        if (b_data is None):
            continue
        src_subtoken_idxs, sent_labels, tgt_subtoken_idxs, segments_ids, cls_ids, src_txt, tgt_txt = b_data
        b_data_dict = {"src": src_subtoken_idxs, "tgt": tgt_subtoken_idxs,
                       "src_sent_labels": sent_labels, "segs": segments_ids, 'clss': cls_ids,
                       'src_txt': src_txt, "tgt_txt": tgt_txt}
        datasets.append(b_data_dict)
    logger.info('Processed instances %d' % len(datasets))
    logger.info('Saving to %s' % save_file)
    torch.save(datasets, save_file)
    datasets = []
    gc.collect()

ImportError: Please build Cython components with: `pip install --editable .` or `python setup.py build_ext --inplace`

2021-06-08 23:33:42 | INFO | fairseq_cli.train | model guided_bart_large, criterion LabelSmoothedCrossEntropyCriterion
2021-06-08 23:33:42 | INFO | fairseq_cli.train | num. model params: 469292032 (num. trained: 469292032)
2021-06-08 23:33:45 | INFO | fairseq_cli.train | training on 1 GPUs
2021-06-08 23:33:45 | INFO | fairseq_cli.train | max tokens per GPU = 2048 and max sentences per GPU = None
2021-06-08 23:33:45 | INFO | fairseq.trainer | no existing checkpoint found /projects/tir5/users/pliu3/zdou/fairseq/bart.large/model.pt
2021-06-08 23:33:45 | INFO | fairseq.trainer | loading train data for epoch 0
2021-06-08 23:33:46 | INFO | fairseq.data.data_utils | loaded 287227 examples from: /data/songpeng/project_my/guided_summarization/bart/data-bin/cnn_dm-bin/train.source-target.source
2021-06-08 23:33:47 | INFO | fairseq.data.data_utils | loaded 287227 examples from: /data/songpeng/project_my/guided_summarization/bart/data-bin/cnn_dm-bin/train.source-target.target
2021-06-08 23:33:47 | INFO | fairseq.data.data_utils | loaded 287227 examples from: /data/songpeng/project_my/guided_summarization/bart/data-bin/cnn_dm-bin/train.source-target.z
2021-06-08 23:33:47 | INFO | fairseq.tasks.guided_translation | /data/songpeng/project_my/guided_summarization/bart/data-bin/cnn_dm-bin/ train source-target 287227 examples
2021-06-08 23:33:49 | WARNING | fairseq.data.data_utils | 4 samples have invalid sizes and will be skipped, max_positions=(1024, 1024, 1024), first few sample ids=[189447, 112053, 286032, 172051]
Traceback (most recent call last):
File "/data/songpeng/project_my/guided_summarization/bart/fairseq/data/data_utils.py", line 221, in batch_by_size
from fairseq.data.data_utils_fast import batch_by_size_fast
ModuleNotFoundError: No module named 'fairseq.data.data_utils_fast'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 11, in
cli_main()
File "/data/songpeng/project_my/guided_summarization/bart/fairseq_cli/train.py", line 318, in cli_main
main(args)
File "/data/songpeng/project_my/guided_summarization/bart/fairseq_cli/train.py", line 81, in main
extra_state, epoch_itr = checkpoint_utils.load_checkpoint(args, trainer)
File "/data/songpeng/project_my/guided_summarization/bart/fairseq/checkpoint_utils.py", line 152, in load_checkpoint
epoch_itr = trainer.get_train_iterator(
File "/data/songpeng/project_my/guided_summarization/bart/fairseq/trainer.py", line 275, in get_train_iterator
return self.task.get_batch_iterator(
File "/data/songpeng/project_my/guided_summarization/bart/fairseq/tasks/fairseq_task.py", line 176, in get_batch_iterator
batch_sampler = data_utils.batch_by_size(
File "/data/songpeng/project_my/guided_summarization/bart/fairseq/data/data_utils.py", line 223, in batch_by_size
raise ImportError(
ImportError: Please build Cython components with: pip install --editable . or python setup.py build_ext --inplace
srun: error: pgpu10: task 0: Exited with exit code 1

Is something wrong with my fairseq installation?

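A small question about the paper

Hello! Thank you for sharing this work; I learned a lot from it.
I don't quite understand this result. Is the Oracle result obtained by using the oracle to generate the signals on the test set? Personally, I feel that using the labels to generate the guidance signal on the test set is a bit of a stretch.
If it is the test-set oracle, is there a comparison of the results obtained when training with oracle signals versus automatically predicted signals?

Sorry to bother you.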

Add MIT License to top directory

Hi @zdou0830 : sorry to bug you with little things, but there's an MIT license in the bart and bert directories but not the top directory, so the license of the "scripts" directory is not clear. Could you add the MIT license to the top directory too?

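A question about the code

Hello, when I run the BERT part of the code with the mode parameter set to Oracle, this function raises an error: because cal_lead is False, selected_ids is never initialized. Also, which parts of your paper do lead and oracle correspond to? I would appreciate it if you could let me know. Thank you very much.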
` def test(self, test_iter, step, cal_lead=False, cal_oracle=False):

    # Set model in validating mode.
    def _get_ngrams(n, text):
        ngram_set = set()
        text_length = len(text)
        max_index_ngram_start = text_length - n
        for i in range(max_index_ngram_start + 1):
            ngram_set.add(tuple(text[i:i + n]))
        return ngram_set

    def _block_tri(c, p):
        tri_c = _get_ngrams(3, c.split())
        for s in p:
            tri_s = _get_ngrams(3, s.split())
            if len(tri_c.intersection(tri_s))>0:
                return True
        return False

    if (not cal_lead and not cal_oracle):
        self.model.eval()
    stats = Statistics()

    can_path = '%s_step%d.candidate'%(self.args.result_path,step)
    gold_path = '%s_step%d.gold' % (self.args.result_path, step)
    with open(can_path, 'w') as save_pred:
        with open(gold_path, 'w') as save_gold:
            with torch.no_grad():
                for batch in test_iter:
                    gold = []
                    pred = []
                    if (cal_lead):
                        selected_ids = [list(range(batch.clss.size(1)))] * batch.batch_size
                    for i, idx in enumerate(selected_ids):
                        _pred = []
                        if(len(batch.src_str[i])==0):
                            continue
                        for j in selected_ids[i][:len(batch.src_str[i])]:
                            if(j>=len( batch.src_str[i])):
                                continue
                            candidate = batch.src_str[i][j].strip()
                            _pred.append(candidate)

                            if ((not cal_oracle) and (not self.args.recall_eval) and len(_pred) == 3):
                                break`
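
The failure described above can be illustrated in isolation: in the quoted snippet only the cal_lead branch assigns selected_ids, so any other mode reaches the loop with the variable unset. The sketch below is a minimal, hypothetical stand-in (the helper name `choose_selected_ids` and the `labels` argument are assumptions, not code from the repository) in which every mode either assigns the indices or fails with an explicit message. The oracle branch assumes that oracle sentences are marked with a 0/1 label per sentence.

```python
from typing import List, Optional, Sequence


def choose_selected_ids(
    num_sents: int,
    batch_size: int,
    cal_lead: bool = False,
    cal_oracle: bool = False,
    labels: Optional[Sequence[Sequence[int]]] = None,
) -> List[List[int]]:
    """Return per-example sentence indices for every supported mode."""
    if cal_lead:
        # Same expression as in the quoted snippet: all indices, in document order.
        return [list(range(num_sents))] * batch_size
    if cal_oracle:
        # Assumption: oracle mode picks the sentences whose 0/1 label is 1.
        if labels is None:
            raise ValueError("cal_oracle=True requires labels")
        return [[j for j in range(num_sents) if labels[i][j] == 1]
                for i in range(batch_size)]
    # Neither flag set: fail loudly instead of hitting an unassigned variable.
    raise ValueError("model-scored selection is not part of this sketch")


lead_ids = choose_selected_ids(num_sents=4, batch_size=2, cal_lead=True)
oracle_ids = choose_selected_ids(
    num_sents=4, batch_size=2, cal_oracle=True,
    labels=[[0, 1, 0, 1], [1, 0, 0, 0]],
)
```

In the quoted loop, the lead case is then cut to the first three sentences by the `len(_pred) == 3` check, while the oracle case skips that check.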

About the BART model architecture described in the paper

Hi, which part of your code defines BART's model architecture? I ask because I am not very familiar with fairseq. Also, BART is an encoder-decoder model; how do you adapt it to the model described in your paper?

ROUGE scores at test time are very low; the predicted sentences are all <q>

Oracle summary

How do I generate the highlighted sentence data for the BART model?

Was the one you shared in the BART folder generated using the function below?

guided_summarization/bert/prepro/data_builder.py

Line 161 in a8956df

def greedy_selection(doc_sent_list, abstract_sent_list, summary_size):
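
For readers unfamiliar with oracle extraction, the general idea behind a greedy selector can be sketched as follows. This is a simplified illustration, not the repository's `greedy_selection`: the scoring function `ngram_f1` is an assumed stand-in for ROUGE, the inputs are assumed to be lists of token lists, and the stopping rule (stop as soon as no sentence improves the score) is one common choice rather than a confirmed detail.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_f1(candidate: List[str], reference: List[str], n: int) -> float:
    """F1 overlap of n-grams; an assumed stand-in for ROUGE-n."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def greedy_oracle(doc_sents: List[List[str]],
                  abstract_sents: List[List[str]],
                  summary_size: int) -> List[int]:
    """Greedily add the sentence that most improves unigram + bigram F1."""
    reference = [tok for sent in abstract_sents for tok in sent]
    selected: List[int] = []
    best_score = 0.0
    for _ in range(summary_size):
        best_idx = -1
        for i in range(len(doc_sents)):
            if i in selected:
                continue
            candidate = [tok for j in selected + [i] for tok in doc_sents[j]]
            score = ngram_f1(candidate, reference, 1) + ngram_f1(candidate, reference, 2)
            if score > best_score:
                best_score, best_idx = score, i
        if best_idx == -1:  # no remaining sentence improves the score
            break
        selected.append(best_idx)
    return sorted(selected)


doc = [["the", "cat", "sat"], ["dogs", "bark", "loudly"], ["a", "cat", "sat", "down"]]
abstract = [["the", "cat", "sat", "down"]]
oracle = greedy_oracle(doc, abstract, summary_size=2)
```

The returned indices would then be mapped back to source sentences and written one example per line to form the guidance file.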

Could you provide the output files of your models on XSum and CNN/DM?

Could you release the trained BERT-based models?

Thanks!

AttributeError: 'GuidedTranslationTask' object has no attribute 'args'

2021-06-10 00:18:28 | INFO | fairseq_cli.train | task: GuidedTranslationTask
2021-06-10 00:18:28 | INFO | fairseq_cli.train | model: GuidedBARTModel
2021-06-10 00:18:28 | INFO | fairseq_cli.train | criterion: LabelSmoothedCrossEntropyCriterion
2021-06-10 00:18:28 | INFO | fairseq_cli.train | num. shared model params: 469,292,032 (num. trained: 469,292,032)
2021-06-10 00:18:28 | INFO | fairseq_cli.train | num. expert model params: 0 (num. trained: 0)
Traceback (most recent call last):
File "/home/songpeng/.conda/envs/fairseq/bin/fairseq-train", line 33, in
sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
File "/data/songpeng/project_my/fairseq/fairseq_cli/train.py", line 507, in cli_main
distributed_utils.call_main(cfg, main)
File "/data/songpeng/project_my/fairseq/fairseq/distributed/utils.py", line 369, in call_main
main(cfg, **kwargs)
File "/data/songpeng/project_my/fairseq/fairseq_cli/train.py", line 124, in main
task.load_dataset(valid_sub_split, combine=False, epoch=1)
File "/data/songpeng/project_my/fairseq/fairseq/tasks/guided_translation.py", line 253, in load_dataset
paths = utils.split_paths(self.args.data)
AttributeError: 'GuidedTranslationTask' object has no attribute 'args'

guided_summarization

Hello, why did the accuracy drop sharply last night at 3000 steps when training Bert z_abs, and then stay between 4% and 5%?

Can you provide the decoded output of the oracle model?

There is no output of the oracle model at this link: https://drive.google.com/drive/folders/1R5ReFS3bmE3baizBzYqQUB6F7zH-O_xk. Thanks a lot!

Can't run the test on BART

Hi, thanks for the nice work.
When I tried to follow the instructions to predict using the provided model, I got the following error:

(testtransformers) [zggao@cu05 bart]$ bash z_test.sh ./cnn_dm/test.source ./bart_sentence_guide_cnndm/test.matchsum cnndm.test.output ./ bart_sentence.pt cnndm-bin-z
Traceback (most recent call last):
File "z_test.py", line 8, in
data_name_or_path=sys.argv[6]
File "/home/zggao/document-summarization/GSum/guided_summarization-master/bart/fairseq/models/bart/guided_model.py", line 169, in from_pretrained
**kwargs,
File "/home/zggao/document-summarization/GSum/guided_summarization-master/bart/fairseq/hub_utils.py", line 73, in from_pretrained
arg_overrides=kwargs,
File "/home/zggao/document-summarization/GSum/guided_summarization-master/bart/fairseq/checkpoint_utils.py", line 200, in load_model_ensemble_and_task
task = tasks.setup_task(args)
File "/home/zggao/document-summarization/GSum/guided_summarization-master/bart/fairseq/tasks/__init__.py", line 17, in setup_task
return TASK_REGISTRY[args.task].setup_task(args, **kwargs)
File "/home/zggao/document-summarization/GSum/guided_summarization-master/bart/fairseq/tasks/guided_translation.py", line 227, in setup_task
paths = utils.split_paths(args.data)
File "/home/zggao/document-summarization/GSum/guided_summarization-master/bart/fairseq/utils.py", line 30, in split_paths
return paths.split(os.pathsep) if "://" not in paths else paths.split("|")
TypeError: argument of type 'NoneType' is not iterable
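
The last frame of this traceback can be reproduced without fairseq. The sketch below copies the `split_paths` expression shown in the traceback and calls it with `None`, which is what `args.data` appears to be at that point. The wrapper `split_paths_checked` and its error message are hypothetical additions for illustration, not part of fairseq.

```python
import os
from typing import List, Optional


def split_paths(paths: str) -> List[str]:
    # Same expression as in the traceback's last frame.
    return paths.split(os.pathsep) if "://" not in paths else paths.split("|")


def split_paths_checked(paths: Optional[str]) -> List[str]:
    """Hypothetical wrapper that fails with a clearer message when no path is set."""
    if paths is None:
        raise ValueError("data path is None; pass the binarized data directory explicitly")
    return split_paths(paths)


try:
    split_paths(None)  # reproduces the TypeError from the issue
except TypeError as exc:
    reproduced = str(exc)
```

This only shows where the `None` surfaces; why the data path is unset in this run would still need to be checked against the arguments passed to z_test.sh.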

Question about keyword.py

I was wondering why you separate the keywords with the word [SEP]?

out.write(' [SEP] '.join(new_words)+'\n')

I am mostly asking because, if I understand the code correctly, the [SEP] is placed between every pair of keywords, not only between keywords from different sentences.
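
To make the behaviour in question concrete, the snippet below applies the same `' [SEP] '.join(...)` call to a small list. The keyword list and the alternative per-sentence grouping are made-up illustrations, not output or code from keyword.py.

```python
# Hypothetical keywords; in keyword.py the list is called new_words.
new_words = ["police", "arrest", "suspect"]

# Same join as in the quoted line: a separator between every pair of keywords.
flat_line = " [SEP] ".join(new_words)

# Alternative layout for comparison (an assumption, not what the script does):
# keywords from the same sentence are space-joined, and [SEP] only marks
# sentence boundaries.
keywords_by_sentence = [["police", "arrest"], ["suspect"]]
grouped_line = " [SEP] ".join(" ".join(words) for words in keywords_by_sentence)

print(flat_line)     # police [SEP] arrest [SEP] suspect
print(grouped_line)  # police arrest [SEP] suspect
```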

How to Generate Other Guidance Signals?

Hello,

I see that you have provided a way to generate sentence guidance signals with the BertExt model. Could you give some pointers for generating other guidance signals, e.g., keywords or topics?

Choices of Guidance Signals

Hi @zdou0830
I have been reading about how the guidance signals are processed and fed to the model, but I don't see any guidelines for keyword signals.
Can you show me how to process the keyword signal and feed it to the model?
Thank you.

How to get guidance.pt

guided_summarization/bert/example_add_guidance.py

Line 13 in a8956df

zs=torch.load(f'../{mode}.guidance.pt')

About BART

BERT is an encoder, so a Transformer decoder is added after it as the decoder. BART is already a seq2seq model; does it still need an additional Transformer decoder after it?

The accuracy is low

Hi, thanks for the nice work.
When I run the BERT code, the accuracy is very low, even though I followed the README instructions to process my data. Do you know why?

Error(s) in loading state_dict for GuidedBARTModel

Hi,
Trying to load the sentence-guided BART checkpoint provided here gives the following error:

raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for GuidedBARTModel:
        size mismatch for encoder.embed_tokens.weight: copying a param with shape torch.Size([50264, 1024]) from checkpoint, the shape in current model is torch.Size([4, 1024]).
        size mismatch for decoder.embed_tokens.weight: copying a param with shape torch.Size([50264, 1024]) from checkpoint, the shape in current model is torch.Size([4, 1024]).

Code for loading the checkpoint:
bart = GuidedBARTModel.from_pretrained(Path("../../checkpoints/"), "bart_sentence.pt")

Can you kindly tell me what I am doing wrong? Thanks.

About NYT preprocessing

In the paper, you mention that you followed the preprocessing steps of Kedzie et al. (2018).

Did you concatenate the two gold summaries and drop those shorter than 100 words?

I ask because I followed those steps and obtained somewhat different scores with the BART model.

R-1-F/R-2-F/R-L-F: 57.75/38.03/42.19

The scores reported in your paper are

R-1-F/R-2-F/R-L-F: 54.13/35.15/47.00

Could you share how you preprocessed the datasets?

Thank you so much!

How do I add the guidance signals generated by BertExt to BertAbs at test time?

Thank you for your reply!

How to train the model on Pubmed summarization dataset?

Thank you for sharing your code. I notice that "max_pos" in "bert/train.sh" is set to 512. Is that enough for PubMed? Could you tell me how to train the model on PubMed?

Are we using oracle or automatic 'z' guidance signal at test time?

guided_summarization/bert/highligted_sentence_data.py

Line 13 in f8cbfe4

elif mode == 'test':

Hi, while trying to reproduce the results on the CNN/DM dataset, I found that the script above adds the oracle guidance signals to the processed CNN/DM data at both train and test time. However, according to the paper, we should use "pretrained extractive summarization models (BertExt or MatchSum) to perform automatic prediction at test time".
Does that mean I should first use a pretrained BertExt to generate the 'z' guidance signal for the test set? Or am I missing something?
Thanks.

Applying multiple guidance signals

Hello. Thank you very much for sharing the repository and great research.
If I want to use multiple guidance signals at the same time, such as keywords and important sentences, do I need to put them in the same 'z' file? If so, how? I assume that each data point occupies one line and that its guidance signal sits on the corresponding line of the 'z' file. What if there are two guidance signals? How could I save them on the same line?

Thanks
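
One possible file layout for the question above can be sketched as follows. This is purely an illustration under assumptions: the helper `combine_signals` is hypothetical, reusing the ` [SEP] ` separator from keyword.py for a mixed signal is a guess, and nothing here confirms that the released models were trained on or can make use of combined signals.

```python
from typing import List

SEP = " [SEP] "  # separator borrowed from keyword.py; its use here is an assumption


def combine_signals(keywords: List[List[str]], sentences: List[List[str]]) -> List[str]:
    """Build one guidance line per data point: keywords first, then sentences."""
    if len(keywords) != len(sentences):
        raise ValueError("both signals need one entry per data point")
    lines = []
    for kws, sents in zip(keywords, sentences):
        parts = kws + sents
        # Newlines inside a signal would break the one-line-per-example layout.
        lines.append(SEP.join(p.replace("\n", " ").strip() for p in parts))
    return lines


z_lines = combine_signals(
    keywords=[["storm", "coast"], ["election"]],
    sentences=[["A storm hit the coast on Monday."], ["Voters went to the polls."]],
)
```

Each entry of `z_lines` would be written as line i of the 'z' file, aligned with line i of the source file.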
