brio's Issues

MIT license?

Hello, I'm trying to use your results and code in our team's research (for comparison only), but I'm not sure what is permitted and what is not.

Could you please specify which license is used for this project? e.g. MIT, Apache, ...

Thank you.

About the scoring mode

Hi,
thank you for the good work!
I am re-implementing BRIO and I am a little bit confused about the 'score mode' of the model. I mean, how does the scoring mechanism work, given several candidate summaries and corresponding source text? According to my understanding, the output of the model should be a sequence of token ids. Does 'the score of one summary' or 'the probability to generate one summary' mean the similarity between the output and candidate summary?
Thank you for your patience!
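
A minimal sketch of one way to read "the score of one summary": in scoring mode the candidate is not generated but fed to the decoder (teacher forcing), and its score is the length-normalized sum of the log-probabilities the model assigns to the candidate's own tokens. The function name, the toy probabilities, and the length_penalty value below are illustrative assumptions, not the repository's code.

```python
import math

def candidate_score(token_log_probs, length_penalty=2.0):
    """Length-normalized log-likelihood of one candidate summary.

    token_log_probs holds log p(token_t | source, earlier tokens) for each
    token of the candidate, obtained with teacher forcing (no generation).
    """
    return sum(token_log_probs) / (len(token_log_probs) ** length_penalty)

# Toy per-token probabilities for two candidates (made-up numbers).
cand_a = [math.log(p) for p in (0.9, 0.8, 0.7)]
cand_b = [math.log(p) for p in (0.5, 0.4, 0.3)]

scores = [candidate_score(c) for c in (cand_a, cand_b)]
# Index of the candidate the model considers most likely.
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

Under this reading the score is not a similarity between a generated output and the candidate; it is the model's own (normalized) log-probability of producing that candidate given the source.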

Unable to generate summary when initializing model with PyTorch

Hello, I've tried initializing the model using the provided example in the README:

model = BRIO('Yale-LILY/brio-cnndm-uncased', tok.pad_token_id, is_pegasus=False)

However I've been facing some issues when trying to use it for inference:

  1. I keep getting errors saying that parameters used by the .generate() method have a value of None. I've tried passing some default values explicitly. Here's how it looks:
inputs = tokenizer([article], max_length=max_length, return_tensors="pt", truncation=True)
summary_ids = model.generate(inputs["input_ids"],
                             early_stopping=False,
                             max_length=1024,
                             num_beams=1,
                             num_beam_groups=1)
  2. This brings me to my second issue: the .generate() method does not produce any ids. Whenever I try to decode the generated summary, I get the error TypeError: 'NoneType' object is not iterable. When checking the content or type of summary_ids, I get None or <class 'NoneType'>.

Why is this happening? When loading the pre-trained models straight from HF, I don't have any issues but this one does not seem to be working.

HuggingFace Tokenizer Loading

Hi Yixin - thanks for sharing the repo and putting the pre-trained models on HuggingFace.

Unfortunately, though, I'm having trouble loading the tokenizer for CNN/DM. Any thoughts / suggestions? Thanks

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/griffin/sum/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 546, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/griffin/sum/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1788, in from_pretrained
    return cls._from_pretrained(
  File "/home/griffin/sum/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1923, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/griffin/sum/lib/python3.8/site-packages/transformers/models/bart/tokenization_bart_fast.py", line 171, in __init__
    super().__init__(
  File "/home/griffin/sum/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 110, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
Exception: No such file or directory (os error 2)

Conflict between MLE and ranking loss during training

Hi @yixinL7, I was training a BRIO model on a different dataset (RedditTIFU) and observed conflicting trends between the MLE loss and the ranking loss. I started from a converged MLE checkpoint. As training proceeds, the validation MLE loss keeps increasing while the ranking loss decreases normally. In raw values, the MLE loss increased from 2.34 to 2.5 and the ranking loss decreased from 1.63 to 0.88 after 130k steps. I set the coefficient of the ranking loss to 1.0 (higher values, e.g. 100, lead to instability).

After training, I picked the checkpoint with the lowest validation loss (MLE + ranking combined) and saw that generation performance had degraded significantly (to around half that of the original model), although the coordination property seemed to have been learned (the score improves with a larger beam size, but is still far from the raw MLE checkpoint). Do you still have the log files from your training runs? I am trying to figure out the cause, and it would be helpful to hear your opinion on this!
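
The reported numbers show how selecting a checkpoint by the summed loss can prefer a checkpoint whose MLE loss has worsened. A small sketch, assuming an MLE weight of 1.0 (the post only states the ranking coefficient) and a hypothetical helper name:

```python
def combined_loss(mle, rank, mle_weight=1.0, rank_weight=1.0):
    """Weighted sum used for checkpoint selection (weights are assumptions)."""
    return mle_weight * mle + rank_weight * rank

# Validation values quoted in the post: start of training vs. after 130k steps.
start = combined_loss(mle=2.34, rank=1.63)
end = combined_loss(mle=2.50, rank=0.88)

print(round(start, 2), round(end, 2))
print("combined loss improved:", end < start)
```

Because the ranking term fell by more than the MLE term rose, the combined value drops even though the generation objective got worse, so the combined loss alone may not be a reliable selection criterion for generation quality.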

Why lowercase the data for BART-CNN?

Hi, thank you for sharing your awesome work.
However, I noticed that you lowercase the CNNDM data before tokenizing it when using BRIO-CNN (based on BART).

I checked the vocabulary of the original BART and of your BRIO model (downloaded from the transformers model hub).
Both contain uppercase tokens.

Thank you for your reply.

Apply BRIO to other generation tasks

Hi, thanks for this fantastic work.
Here is my question: I am trying to use BRIO in another generation task and have re-implemented it in Fairseq. However, I find that performance is relatively poor after incorporating BRIO.
Looking further into the generation results, I find that many outputs are just a single period. Moreover, the distribution of candidate scores seems to become isotropic (nearly uniform) after training with the contrastive loss (I set the hyper-parameters following the CNN setting in your paper), as in the example below:

before training with the contrastive loss (16 candidates, sorted):
[-0.2314, -0.2862, -0.2660, -0.2471, -0.2442, -0.2796, -0.2611, -0.2617, -0.2608, -0.2984, -0.2622, -0.5395, -0.5655, -0.4688, -0.5250, -0.5317],

after:
[-1.1421, -1.1402, -1.1290, -1.1524, -1.1554, -1.1483, -1.1415, -1.1476, -1.1527, -1.1472, -1.1538, -1.1437, -1.1555, -1.1722, -1.1440, -1.1427]

Can you give me any advice?
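
To quantify how much the candidate scores collapsed, one can compare their spread before and after training. This is only a diagnostic sketch on the two lists quoted above; the helper name `spread` is made up for illustration.

```python
from statistics import pstdev

before = [-0.2314, -0.2862, -0.2660, -0.2471, -0.2442, -0.2796, -0.2611, -0.2617,
          -0.2608, -0.2984, -0.2622, -0.5395, -0.5655, -0.4688, -0.5250, -0.5317]
after = [-1.1421, -1.1402, -1.1290, -1.1524, -1.1554, -1.1483, -1.1415, -1.1476,
         -1.1527, -1.1472, -1.1538, -1.1437, -1.1555, -1.1722, -1.1440, -1.1427]

def spread(scores):
    """Return (range, population standard deviation) of the candidate scores."""
    return max(scores) - min(scores), pstdev(scores)

print("before:", spread(before))
print("after: ", spread(after))
```

The range shrinks from about 0.33 to about 0.04, so after training the model barely separates the 16 candidates, while every score is also much lower than before.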

questions about ROUGE score

Hello!
Thank you for releasing the code as an open source. It's very helpful to study "abstractive summarization"

I have two questions as follows:

  1. Does model_ranking.bin correspond to BRIO-Ctr in the paper? Please check whether the following is correct:
  • model_ranking.bin => BRIO-Ctr in the paper
  • model_generation.bin => BRIO-Mul in the paper

  2. After downloading the model_generation.bin file for the XSum dataset, I measured a ROUGE score on XSum that is slightly lower than the one in the paper. Could you tell me why this happens?

▶ ROUGE score on XSum (I made it)
issue_1

▶ code

# evaluate the model as a generator on XSum
python main.py --cuda --gpuid 0 --config xsum -e --model_pt xsum_paper/model_generation.bin -g

# calculate ROUGE score 
python cal_rouge.py --ref ./xsum/diverse/test.target.tokenized --hyp ./result/xsum_paper/test.out.tokenized -l

issue_2

Ranking Loss Question

Hi - Thanks for the great code. I've been trying to re-implement BRIO in my HuggingFace fork, but have been unable to get it to work.

I'm curious what this line in RankingLoss is doing:

TotalLoss = loss_func(score, score, ones)

One possibility is that I haven't yet included the gold reference as part of the ranking loss, which might explain why the contrastive loss is causing the MLE loss on the gold reference to rise too much. I will add that, but I was also curious about the line above. Thank you!!
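
If loss_func is a margin ranking loss with zero margin (an assumption; its definition is not shown in this post), then comparing score with itself yields exactly zero, so the line would only initialize the accumulator. The pure-Python sketch below mirrors the element-wise formula max(0, -y * (x1 - x2) + margin) documented for torch.nn.MarginRankingLoss and then adds pairwise terms whose margin grows with the rank gap. The function names and the rank-gap scaling are illustrative, not the repository's implementation.

```python
def margin_ranking_loss(x1, x2, y, margin=0.0):
    """Mean of max(0, -y * (x1 - x2) + margin) over all elements."""
    terms = [max(0.0, -t * (a - b) + margin) for a, b, t in zip(x1, x2, y)]
    return sum(terms) / len(terms)

def candidate_ranking_loss(scores, margin):
    """Pairwise ranking loss; scores are assumed ordered best candidate first."""
    n = len(scores)
    # Comparing the scores with themselves at zero margin is always 0.0,
    # so this line only creates the running total.
    total = margin_ranking_loss(scores, scores, [1.0] * n)
    for gap in range(1, n):
        better = scores[:-gap]   # candidates ranked higher
        worse = scores[gap:]     # candidates ranked `gap` positions lower
        total += margin_ranking_loss(better, worse, [1.0] * len(better), margin * gap)
    return total

well_ordered = [-0.2, -0.3, -0.5]
badly_ordered = [-0.5, -0.3, -0.2]
print(candidate_ranking_loss(well_ordered, margin=0.001))
print(candidate_ranking_loss(badly_ordered, margin=0.001))
```

A list whose scores already decrease with rank by more than the margin incurs no loss, while a reversed list is penalized.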

GPU usage increasing as training progresses

Hi,

Thank you for the good work.

  1. What is the GPU size that is required to train this model?
  2. I am currently using eight 32 GB GPUs to train the model. The memory usage increases as training progresses and ultimately crosses 32 GB causing GPU overflow. Is there a workaround for this? I see that the code has del commands to remove tensors that are no longer needed. Is there anything else that also needs to be deleted?

Thank you!

Running too slow when using gen_candidate.py to generate candidates

I am using a summarization model trained from pegasus-base. On an RTX 8000 GPU, generating candidates takes several seconds per row of training data, which is too slow.
Generating candidate summaries for 5,000,000 examples would take a very long time.
How can I speed up candidate generation?

About training time

Hi,

Thank you for the good work.

As you stated in "Implementation Details": "We use 4 NVIDIA RTX 3090 GPUs for the model training, and the average running time for one epoch is around 20 hours." What is the actual total training time for the entire model? Since the epoch count is set to 100 in the config file (e.g. for the XSum dataset), training all of these epochs would take a very long time (20 h * 100?).

Thank you very much!

Model checkpoints for nyt

Hi Yixin,
Excellent work! I was wondering if you will also release the checkpoints for the NYT model?

Thanks,
Tanya

A small trick for memory efficiency

Hi, in this part,

BRIO/modeling_bart.py

Lines 1863 to 1869 in 135f0e5

if self.is_scoring_mode:
    cand_num = decoder_input_ids.size(1)
    encoder_hidden_states = encoder_outputs[0]
    encoder_hidden_states = torch.repeat_interleave(encoder_hidden_states, cand_num, dim=0)
    attention_mask = torch.repeat_interleave(attention_mask, cand_num, dim=0)
    decoder_input_ids = decoder_input_ids.view(-1, decoder_input_ids.size(-1))
    decoder_attention_mask = decoder_attention_mask.view(-1, decoder_attention_mask.size(-1))

Since encoder_hidden_states and attention_mask are not modified in the decoder, a new view of them would be more memory-efficient than repeat_interleave, because the repeat operation in PyTorch copies the underlying data storage.
I think using index_select with a suitable index would be better, so a simple modification is:

if self.is_scoring_mode:
        batch_size,cand_num,_ = decoder_input_ids.shape
        encoder_hidden_states = encoder_outputs[0]
        expanded_return_idx = torch.arange(batch_size).view(-1,1).repeat(1,cand_num).view(-1).to(encoder_hidden_states.device)
        encoder_hidden_states = encoder_hidden_states.index_select(0,expanded_return_idx)
        attention_mask = attention_mask.index_select(0,expanded_return_idx)
        decoder_input_ids = decoder_input_ids.view(-1, decoder_input_ids.size(-1))
        decoder_attention_mask = decoder_attention_mask.view(-1, decoder_attention_mask.size(-1))
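
To make the index in the modification above concrete, the pure-Python stand-in below reproduces the pattern that arange(batch_size).view(-1, 1).repeat(1, cand_num).view(-1) produces and shows which encoder row each flattened candidate is paired with. It uses plain lists instead of tensors and hypothetical helper names, and it says nothing about actual memory use.

```python
def expanded_index(batch_size, cand_num):
    """Row index for each flattened candidate: every batch row repeated cand_num times."""
    return [b for b in range(batch_size) for _ in range(cand_num)]

def select_rows(rows, index):
    """List analogue of tensor.index_select(0, index)."""
    return [rows[i] for i in index]

encoder_rows = ["enc_0", "enc_1"]            # one encoder output per source document
idx = expanded_index(batch_size=2, cand_num=3)

print(idx)                                   # [0, 0, 0, 1, 1, 1]
print(select_rows(encoder_rows, idx))
```

The resulting order matches what repeat_interleave along dim 0 gives, so the flattened decoder inputs (batch-major, candidate-minor) stay aligned with their source.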

How to calculate BERTScore in the paper

from evaluate import load
bertscore = load("bertscore",cache_dir="../cache")
bertscore.compute(predictions=predictions, references=references, lang="en")
I want to ask why the BERTScore in the paper is only about 30, while this code gives me about 90.
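
One possible explanation, offered as an assumption rather than something the paper or this thread confirms: BERTScore can be reported with baseline rescaling, which maps raw scores (often clustered near 0.9) onto a wider range via (score - baseline) / (1 - baseline). The baseline value below is hypothetical and chosen only to illustrate the effect.

```python
def rescale_with_baseline(raw_score, baseline):
    """Linear rescaling so that the baseline maps to 0 and a perfect score to 1."""
    return (raw_score - baseline) / (1.0 - baseline)

raw = 0.90                       # raw F1 of the kind the snippet above returns
hypothetical_baseline = 0.85     # made-up value; real baselines depend on model and language

rescaled = rescale_with_baseline(raw, hypothetical_baseline)
print(round(100 * rescaled, 1))
```

With the evaluate wrapper this would mean passing rescale_with_baseline=True to compute, assuming the installed version supports that option; whether the paper's numbers were produced this way is not stated here.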

Is the gold_weight parameter always 0?

Hello, thanks for the great code.
I found that the gold_weight parameter is always 0 during training. What is the gold summary loss designed for? Can I disregard it during training?

Create raw_data structure in example folder from input

Hello, I want to use your SimCLS repo for a summarization task because of its strong performance. I understand that I should create a structure like the one in your example, including test.source, test.source.tokenized, test.out, ....
Following the instructions for generating candidates, I can generate the test.out file from my input, but I wonder how I can generate test.target from my input.
Thank you (I'm a newbie).

About cased cnn daily mail dataset

Thank you for sharing the great work. Are there any hyper-parameters that should be changed when using the cased dataset? I only changed the dataset and the pre-trained model, but got even poorer results than with the uncased model. By the way, I'm wondering why the article part in the cased data is still uncased, as in the screenshot below.
(Screenshot, 2022-11-01 10:11:13)

Question about training loss settings

Hi,
Thanks for your work.
I am wondering if I can train a BART model first by removing ranking loss.
Will simply setting the rank_weight to 0 work for that?

Dataset issue

Hello,

Thank you for the great & well-organized repository!
It seems that the test.source and test.target files in the provided zipped XSum directory are identical.

There is a bug in the Preprocessed cnndm dataset

When I downloaded your preprocessed cnndm data, I found that the article and the summary do not correspond.
Here is an example (cnndm_cased val 10002.json):
article:Trying to find a way to explain the birds and the bees to children can be a difficult task for any parent . So luckily for this father , a pair of raccoons took it upon themselves to make his job a little easier by giving a little demonstration in the garden . The hilarious footage captured in Seattle begins innocently enough with some excited children looking out of the window at two raccoons scaling their fence . The children watch on excitedly as the male raccoon chases the female from the fence and into the garden . As the youngsters speculate about whether they will jump from the fence -- before a little muffin man ' song interlude -- one of the raccoons descends into the garden closely pursued by the other . One child , sensing the tension , asks : Can raccoons fight ?
abstract: Carol Woodle is one of the most sought after celebrity look-a-likes in the business . Today Carol , 59 , appears as Oprah at 90th birthday parties , corporate events and women 's shelters . At one appearance she gave away iPads like Oprah 's famous car giveaways saying : ` You get an iPad and you get an iPad and YOU get an iPad . But after her first husband walked out on her and their three young boys she thought her life was over . I was so low I was even hospitalized for 30 days due to malnutrition and depression ' Being Oprah allowed her to put the boys ' through college .

Modified gen_candidate.py

Hi, I have a question about the gen_candidate.py code. When I run it on a single NVIDIA RTX 3090 GPU, a runtime error (CUDA out of memory) occurs. I therefore modified the code as shown below. Is this okay?

▶ Before (gen_candidate.py, lines 83-85)

with torch.no_grad():
    batch = tok.prepare_seq2seq_batch(src_texts=slines, return_tensors="pt").to(device)
    gen = model.generate(**batch, num_return_sequences=128, num_beam_groups=16, diversity_penalty=0.1, num_beams=128, length_penalty=0.6)
    dec: List[str] = tok.batch_decode(gen, skip_special_tokens=True)

▶ After (gen_candidate.py, lines 83-85)

max_length = 64
with torch.no_grad():
    batch = tok.prepare_seq2seq_batch(src_texts=slines, return_tensors="pt", max_length=max_length, pad_to_max_length=True, truncation=True).to(device)
    gen = model.generate(**batch, num_return_sequences=128, num_beam_groups=16, diversity_penalty=0.1, num_beams=128, length_penalty=0.6)
    dec: List[str] = tok.batch_decode(gen, skip_special_tokens=True)

Using BRIO for text summarization on another language

Good day Mr. Yixin, thank you so much for your work.
I'm currently trying to use a pretrained MBart model as the baseline, together with a custom dataset, in order to apply BRIO to Vietnamese. I replaced the crucial parts (using mbart_modeling.py, changing BartScorer to MBartScorer, using a suitable tokenizer, ...), but the ROUGE scores during training were quite poor.
I tried tuning the three most sensitive parameters, as you suggested in post #9, but to no avail. The average MLE loss is always around 6.0 at the beginning of training.
Can you give me some advice?

environment.yml needed

Hi, you have put an amazing effort into this. I want to reproduce it in an environment other than linux-64, so could you please include an environment.yml created with conda env export --from-history so that I can try it on other platforms? Thank you so much!!

Specified model filename pattern was: #ID#.ref

Hi, thanks for this fantastic work.
Here is my question:
When I run "Example: evaluating the model as a generator on CNNDM":

# calculate the ROUGE scores using ROUGE Perl Package

python cal_rouge.py --ref ./cnndm/test.target.tokenized --hyp ./result/cnndm/test.out.tokenized -l

it tells me:
Rouge155.py line 497, in __get_model_filenames_for_id
raise Exception(
Exception: Could not find any model summaries for the system summary with ID 10. Specified model filename pattern was: #ID#.ref

How can I solve this problem? Thank you!

RuntimeError: CUDA out of memory when running training command

Hi,
I installed your source code successfully and ran the evaluation command to produce evaluation scores. Now I want to train the BRIO model on the cnndm dataset with the following command:
python main.py --cuda --gpuid 0 --config cnndm -l
It logs the following to the terminal:
`
2022-07-20 07:28:34.137223: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Namespace(accumulate_step=1, adding=0, batch_size=1, config='cnndm', cuda=True, dataset='cnndm', datatype='diverse', do_generation=False, do_reranking=False, do_sample=True, epoch=100, eval_interval=1000, evaluate=False, gen_max_len=140, gen_min_len=55, gold_margin=0, gold_weight=0, gpuid=[0], grad_norm=0, is_pegasus=False, length_penalty=2.0, log=True, margin=0.001, max_len=120, max_lr=0.002, max_num=16, mle_weight=0.1, model_pt='', model_type='facebook/bart-large-cnn', no_gold=False, normalize=True, num_beams=4, port=12355, pretrained=None, rank_weight=10, report_freq=100, scale=1, score_mode='log', seed=970903, smooth=0.1, total_len=128, warmup_steps=10000)

BRIO(
(model): BartScorer(
(model): CustomBartModel(
(shared): Embedding(50264, 1024, padding_idx=1)
(encoder): BartEncoder(
(embed_tokens): Embedding(50264, 1024, padding_idx=1)
(embed_positions): BartLearnedPositionalEmbedding(1026, 1024)
(layers): ModuleList(
(0): BartEncoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(1): BartEncoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(2): BartEncoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(3): BartEncoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(4): BartEncoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(5): BartEncoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(6): BartEncoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(7): BartEncoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(8): BartEncoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(9): BartEncoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(10): BartEncoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(11): BartEncoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(activation_fn): GELUActivation()
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
(layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(decoder): BartDecoder(
(embed_tokens): Embedding(50264, 1024, padding_idx=1)
(embed_positions): BartLearnedPositionalEmbedding(1026, 1024)
(layers): ModuleList(
(0): BartDecoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(activation_fn): GELUActivation()
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(1): BartDecoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(activation_fn): GELUActivation()
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(2): BartDecoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(activation_fn): GELUActivation()
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(3): BartDecoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(activation_fn): GELUActivation()
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(4): BartDecoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(activation_fn): GELUActivation()
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(5): BartDecoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(activation_fn): GELUActivation()
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(6): BartDecoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(activation_fn): GELUActivation()
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(7): BartDecoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(activation_fn): GELUActivation()
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(8): BartDecoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(activation_fn): GELUActivation()
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(9): BartDecoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(activation_fn): GELUActivation()
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(10): BartDecoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(activation_fn): GELUActivation()
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
(11): BartDecoderLayer(
(self_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(activation_fn): GELUActivation()
(self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(encoder_attn): BartAttention(
(k_proj): Linear(in_features=1024, out_features=1024, bias=True)
(v_proj): Linear(in_features=1024, out_features=1024, bias=True)
(q_proj): Linear(in_features=1024, out_features=1024, bias=True)
(out_proj): Linear(in_features=1024, out_features=1024, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=1024, out_features=4096, bias=True)
(fc2): Linear(in_features=4096, out_features=1024, bias=True)
(final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
(layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
)
)
(lm_head): Linear(in_features=1024, out_features=50264, bias=False)
)
)

Traceback (most recent call last):
File "main.py", line 548, in <module>
main(args)
File "main.py", line 523, in main
run(0, args)
File "main.py", line 444, in run
loss.backward()
File "/usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 132, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 314.00 MiB (GPU 0; 10.92 GiB total capacity; 9.50 GiB already allocated; 69.31 MiB free; 10.02 GiB reserved in total by PyTorch)
Exception ignored in: <bound method Recorder.__del__ of <utils.Recorder object at 0x7f917d3a0e48>>
Traceback (most recent call last):
File "/tf/tiennv80/NewsSum/brio/utils.py", line 54, in __del__
File "/usr/local/lib/python3.6/dist-packages/torch/utils/tensorboard/writer.py", line 1034, in close
File "/usr/local/lib/python3.6/dist-packages/torch/utils/tensorboard/writer.py", line 139, in close
File "/usr/local/lib/python3.6/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 130, in close
File "/usr/local/lib/python3.6/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 185, in close
File "/usr/local/lib/python3.6/dist-packages/tensorboard/summary/writer/event_file_writer.py", line 213, in stop
File "/usr/lib/python3.6/queue.py", line 145, in put
File "/usr/lib/python3.6/threading.py", line 347, in notify
TypeError: 'NoneType' object is not callable
`
I ran it on a Linux machine with a single GTX 1080 Ti GPU (11 GB).
Could you give me some tips on how to resolve this error?
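
For anyone reading the numbers in that error message, here is a minimal sketch (not from the BRIO repo) that pulls the sizes out of a PyTorch CUDA out-of-memory message and shows how far the failed request was from fitting. The helper name `parse_oom` is hypothetical, and the regular expressions assume the message format shown in the traceback above, which may differ in other PyTorch versions.

```python
import re

# Conversion factors to MiB for the units that appear in the message.
_UNITS = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Return the sizes named in a PyTorch CUDA OOM message, in MiB.

    Hypothetical helper; assumes the wording of the message quoted above.
    Fields that cannot be found are returned as None.
    """
    num = r"([\d.]+) (KiB|MiB|GiB)"

    def find(pattern):
        m = re.search(pattern, message)
        if m is None:
            return None
        return float(m.group(1)) * _UNITS[m.group(2)]

    return {
        "requested": find(r"Tried to allocate " + num),
        "total": find(num + r" total capacity"),
        "allocated": find(num + r" already allocated"),
        "free": find(num + r" free"),
        "reserved": find(num + r" reserved"),
    }


msg = (
    "RuntimeError: CUDA out of memory. Tried to allocate 314.00 MiB "
    "(GPU 0; 10.92 GiB total capacity; 9.50 GiB already allocated; "
    "69.31 MiB free; 10.02 GiB reserved in total by PyTorch)"
)
info = parse_oom(msg)

# How much more memory the failed allocation needed than was free.
shortfall = info["requested"] - info["free"]
# Memory held by PyTorch's caching allocator but not assigned to tensors.
cached = info["reserved"] - info["allocated"]
print(f"shortfall: {shortfall:.2f} MiB, cached but unused: {cached:.2f} MiB")
```

Here the request missed by roughly 245 MiB on a card that was already almost fully allocated, so the training step as configured is simply too large for 11 GB. The usual levers (smaller batch size, shorter inputs, gradient accumulation) are general PyTorch practice and not something the author has confirmed for this setup.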

data preprocessing

Hi Yixin, thank you for this fantastic work.
I am reproducing the BRIO model and would like to understand the difference between the data and data.tokenized files, since there seems to be no code that distinguishes between them.
