initialbug / biset

BiSET: Bi-directional Selective Encoding with Template for Abstractive Summarization (ACL 2019)

License: MIT License

biset's Issues

Could you provide x.samples.index.json, x_scores.json, and x.template.index?

I didn't run the "Retrieve" module because I expect it to take a while. You provided the retrieved and reranked data under "Notice", so I downloaded it from "Google Disk"; it contains x.title.ext, x.template.txt, x.title.txt, and test.template.rerank-x. I would like to look at the file x.template.index and to run the second module, "FastRerank". Could you please provide x.samples.index.json, x_scores.json, and x.template.index?

Generated outputs

Hi,
Is it possible to provide either the generated outputs on the test set or a pretrained model checkpoint that I can use directly to decode the summaries? I only need the summaries for the test set.

Thanks!

What is the dataset for the Retrieve module?

Excuse me, what dataset does the Retrieve module search over?
Is it the same as the training set of source articles, English Gigaword?
I am not familiar with information retrieval, so please forgive me if this is a stupid question. Thank you!

Where does the score file in FastRerank come from?

In config.py, on lines 26-28,
it seems that the preprocessing step needs article.txt, title.txt, template, samples.index.json, and _score.json to be set up in config.py before the whole process can run.
But after running Retrieve, I only have train/test/dev.sample.index in addition to the original article and title files.
So how can I get the other data I need, such as sample.index.json and _score.json?
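
A quick way to see which inputs are still missing is to check the data directory before running FastRerank. The following is only a sketch: the suffix list is an assumption assembled from the file names mentioned in these issues (x.samples.index.json, x_scores.json, x.template.index, and so on), and `missing_inputs` is a hypothetical helper, not part of the repository. The names that config.py actually expects may differ.

```python
from pathlib import Path

# Assumed suffixes, collected from file names mentioned in these issues.
# The names that config.py really expects may differ.
EXPECTED_SUFFIXES = [
    ".article.txt",
    ".title.txt",
    ".template.index",
    ".samples.index.json",
    "_scores.json",
]


def missing_inputs(data_dir, split):
    """Return the expected file names for `split` that are absent from `data_dir`."""
    root = Path(data_dir)
    names = [split + suffix for suffix in EXPECTED_SUFFIXES]
    return [name for name in names if not (root / name).is_file()]
```

Calling `missing_inputs("path/to/data", "train")` returns the list of assumed file names that do not exist yet, which makes it easier to tell which earlier step has not produced its output.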

How do I generate a template from raw data?

I have read the article, but I did not find any description of how to generate a template. Could you please tell me how to generate one?

Error when using the -copy_attn option

I was trying to train my model with -copy_attn (the copy attention function).
Training fails both with and without a GPU.
I don't think the template setting conflicts with copy_attn (coverage_attn also works well).
My command line is python3 train.py -data path/to/data -copy_attn
and the error message with a GPU is:

Traceback (most recent call last):
  File "train.py", line 42, in <module>
    main(opt)
  File "train.py", line 28, in main
    single_main(opt)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/train_single.py", line 133, in main
    opt.valid_steps)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/trainer.py", line 172, in train
    report_stats)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/trainer.py", line 296, in _gradient_accumulation
    trunc_size, self.shard_size, normalization)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/utils/loss.py", line 145, in sharded_compute_loss
    loss, stats = self._compute_loss(batch, **shard)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/modules/copy_generator.py", line 193, in _compute_loss
    batch, self.tgt_vocab, batch.dataset.src_vocabs)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/inputters/text_dataset.py", line 119, in collapse_copy_scores
    print('index: %s'%index)
  File "/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/tensor.py", line 71, in __repr__
    return torch._tensor_str._str(self)
  File "/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/_tensor_str.py", line 286, in _str
    tensor_str = _tensor_str(self, indent)
  File "/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/_tensor_str.py", line 201, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/_tensor_str.py", line 83, in __init__
    value_str = '{}'.format(value)
  File "/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/tensor.py", line 386, in __format__
    return self.item().__format__(format_spec)
RuntimeError: CUDA error: device-side assert triggered

Without a GPU, the error message is:

/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torchtext/data/field.py:359: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  var = torch.tensor(arr, dtype=self.dtype, device=device)
/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/nn/functional.py:1386: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
/home/yoshow/venv/pytorch11/lib/python3.5/site-packages/torch/nn/functional.py:1374: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
  warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
Traceback (most recent call last):
  File "train.py", line 42, in <module>
    main(opt)
  File "train.py", line 28, in main
    single_main(opt)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/train_single.py", line 133, in main
    opt.valid_steps)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/trainer.py", line 172, in train
    report_stats)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/trainer.py", line 296, in _gradient_accumulation
    trunc_size, self.shard_size, normalization)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/utils/loss.py", line 145, in sharded_compute_loss
    loss, stats = self._compute_loss(batch, **shard)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/modules/copy_generator.py", line 189, in _compute_loss
    loss = self.criterion(scores, align, target)
  File "/media/yoshow/HDD/BiSET/Bi-selective Encoding/onmt/modules/copy_generator.py", line 125, in __call__
    out = scores.gather(1, align.view(-1, 1) + self.offset).view(-1)
RuntimeError: Invalid index in gather at /pytorch/aten/src/TH/generic/THTensorEvenMoreMath.cpp:459
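
The CPU traceback ends at `scores.gather(1, align.view(-1, 1) + self.offset)`. `gather` requires every index to be a valid position along dimension 1 of `scores`, so one possible cause (an assumption, not a confirmed diagnosis) is that some `align + offset` value is negative or not smaller than the number of score columns. CUDA device-side asserts are reported asynchronously, so the CPU traceback is usually the more precise of the two. The plain-Python sketch below mirrors that bounds condition on ordinary lists; `out_of_range_positions` is a hypothetical helper for illustration, not a function from the repository.

```python
def out_of_range_positions(align, offset, num_columns):
    """Return the positions i where align[i] + offset is not a valid column index.

    `align` is a list of integer alignment indices, `offset` the constant added
    before the lookup, and `num_columns` the width of the score matrix.
    """
    return [
        i
        for i, a in enumerate(align)
        if not 0 <= a + offset < num_columns
    ]
```

Dumping `align`, `self.offset`, and `scores.size(1)` to Python values just before the failing line and passing them through a check like this would show whether the indices exceed the width of `scores`.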

Error in translation using trained model

I trained the model successfully and ran the command $ python3 translate.py -model model_step_85000.pt -src path_to_data/test.article.txt -template path_to_data/test.template.rerank-5 for translation,
but the following error occurred:

Traceback (most recent call last):
  File "translate.py", line 36, in <module>
    main(opt)
  File "translate.py", line 25, in main
    attn_debug=opt.attn_debug)
  File "/home/yoshow/Summarization/BiSET/Bi-selective Encoding/onmt/translate/translator.py", line 198, in translate
    use_filter_pred=self.use_filter_pred)
  File "/home/yoshow/Summarization/BiSET/Bi-selective Encoding/onmt/inputters/inputter.py", line 248, in build_dataset
    use_filter_pred=use_filter_pred)
  File "/home/yoshow/Summarization/BiSET/Bi-selective Encoding/onmt/inputters/text_dataset.py", line 66, in __init__
    ex, examples_iter = self._peek(examples_iter)
  File "/home/yoshow/Summarization/BiSET/Bi-selective Encoding/onmt/inputters/dataset_base.py", line 107, in _peek
    first = next(seq)
  File "/home/yoshow/Summarization/BiSET/Bi-selective Encoding/onmt/inputters/text_dataset.py", line 303, in _dynamic_dict
    template = example["template"]
KeyError: 'template'

I am not sure where this error comes from.
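
The `KeyError` is raised at `template = example["template"]`, which means the first example reaching `_dynamic_dict` is a dict without a "template" entry. Why the entry is absent cannot be determined from the traceback alone. As a minimal debugging sketch, the hypothetical helper below reports the first example that lacks a required key; the key name "src" is an assumption, while "template" is taken from the traceback.

```python
# "template" comes from the traceback; "src" is an assumed key name.
REQUIRED_KEYS = ("src", "template")


def first_example_missing_keys(examples, required=REQUIRED_KEYS):
    """Return (index, missing_keys) for the first example lacking a key, else None."""
    for i, example in enumerate(examples):
        missing = [key for key in required if key not in example]
        if missing:
            return i, missing
    return None
```

Running a check like this over the examples before they are passed to `_dynamic_dict` would show whether the template file is being read into the examples at all.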

May I have your config file?

Hello, I am not sure that I have set all the parameters correctly. Could you share the parameters you used? Thanks.
