About
Disclaimer
The PreSumm model, presented in the EMNLP 2019 paper "Text Summarization with Pretrained Encoders" [original code], is not my work. Please credit the original authors for that model.
Purpose of this repository
- Use PreSumm as a baseline model for comparison on a custom dataset.
- Fine-tune PreSumm on the custom dataset, starting from the pre-trained BertExtAbs model.
- Additional notes are available, including a detailed description of my code modifications.
Contents
Requirements • How to Use • How to Cite
Requirements
Python 3.5.2, PyRouge [notes]
pip install -r requirements.txt
How to Use
- First run: the first time you run the code, use a single GPU so that it can download the pre-trained BERT model. Pass -visible_gpus -1; once the download finishes, you can kill the process and rerun the code with multiple GPUs (see the example after this list).
- Download the best performing PreSumm model: CNN/DM BertExtAbs
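A first run only needs to get far enough for the BERT weights to download. A minimal sketch, assuming the upstream PreSumm train.py CLI, placeholder data/model paths, and that the command is run from the src/ directory:

# first run on CPU (-visible_gpus -1) so the pre-trained BERT model is downloaded
python train.py -task abs -mode train -visible_gpus -1 \
  -bert_data_path BERT_DATA_PATH -model_path MODEL_PATH -log_file ../logs/first_run
# once the download completes, kill the process and rerun with e.g. -visible_gpus 0,1,2,3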
A. Evaluate BertSumExtAbs without fine-tuning
Edit the script with the directory where the BertSumExtAbs weights are saved, then run:
./src/load_custom_data_an_eval.sh
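The script is a thin wrapper; evaluating the downloaded checkpoint on the custom data would typically be a PreSumm test run along these lines. This is only a sketch with placeholder checkpoint and data paths, using flags from the upstream PreSumm README and run from the src/ directory:

# decode the custom test set with the downloaded CNN/DM BertExtAbs checkpoint and report ROUGE
python train.py -task abs -mode test \
  -test_from DOWNLOADED_BERTEXTABS_CHECKPOINT.pt -bert_data_path BERT_DATA_PATH \
  -batch_size 3000 -test_batch_size 500 -use_interval true -visible_gpus 0 \
  -max_pos 512 -max_length 200 -alpha 0.95 -min_length 50 \
  -result_path ../results/bertextabs_baseline -log_file ../logs/eval_baseline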
B. Fine-tune BertSumExtAbs for the AMI DialSum Meeting Corpus
B.1. Download Stanford CoreNLP and set the classpath:
export CLASSPATH=./stanford-corenlp-full-2018-10-05/stanford-corenlp-3.9.2.jar
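If the classpath is set correctly, the tokenizer check from the upstream PreSumm instructions should print the sentence one token per line:

echo "Please tokenize this text." | java edu.stanford.nlp.process.PTBTokenizer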
B.2. Prepare dataset
- Download the AMI DialSum Corpus [paper]
- Delete the <EOS> tags
- Convert to .story format with src/ami_dialsum_corpus_story.py
- Run ./src/prepare_amidialsum_data.sh (a sketch of the preprocessing steps it wraps is shown after this list)
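The prepare script presumably chains the standard PreSumm preprocessing steps over the generated .story files. A sketch with placeholder paths; flag names follow the upstream preprocess.py, and the commands are run from the src/ directory:

# 1) tokenize the .story files with Stanford CoreNLP
python preprocess.py -mode tokenize -raw_path RAW_STORIES_PATH -save_path TOKENIZED_PATH
# 2) group the tokenized files into simpler json train/valid/test shards
python preprocess.py -mode format_to_lines -raw_path TOKENIZED_PATH -save_path JSON_PATH \
  -map_path MAP_PATH -use_bert_basic_tokenizer false -n_cpus 1
# 3) convert the json shards into the BERT-ready .pt files used for training
python preprocess.py -mode format_to_bert -raw_path JSON_PATH -save_path BERT_DATA_PATH \
  -lower -n_cpus 1 -log_file ../logs/preprocess.log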
B.3. Fine-tune the model on the AMI DialSum dataset (with modified settings such as train_steps, lr_bert, lr_dec, warmup_steps_*, ...)
./src/fine_tuning.sh
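The fine-tuning script wraps PreSumm's train.py. Below is a hedged sketch of the kind of call involved, with placeholder paths and hyperparameter values (the actual settings live in the script), run from the src/ directory:

# fine-tune the downloaded CNN/DM BertExtAbs checkpoint on the AMI DialSum BERT data
python train.py -task abs -mode train \
  -bert_data_path BERT_DATA_PATH -model_path FINETUNED_MODEL_PATH \
  -train_from CNNDM_BERTEXTABS_CHECKPOINT.pt \
  -sep_optim true -lr_bert 0.002 -lr_dec 0.2 \
  -warmup_steps_bert 8000 -warmup_steps_dec 4000 -train_steps 20000 \
  -save_checkpoint_steps 2000 -batch_size 140 -accum_count 5 \
  -use_bert_emb true -use_interval true -max_pos 512 \
  -visible_gpus 0 -report_every 50 -log_file ../logs/finetune_amidialsum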
B.4. Evaluate
./src/eval.sh
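The evaluation script is again a thin wrapper. A sketch of a typical PreSumm validation pass over the saved checkpoints (the best one is then decoded and scored with ROUGE), with placeholder paths, run from the src/ directory:

# validate all saved checkpoints and report ROUGE for the best-performing one
python train.py -task abs -mode validate -test_all \
  -bert_data_path BERT_DATA_PATH -model_path FINETUNED_MODEL_PATH \
  -batch_size 3000 -test_batch_size 500 -sep_optim true -use_interval true \
  -max_pos 512 -max_length 200 -alpha 0.95 -min_length 50 \
  -visible_gpus 0 -result_path ../results/amidialsum -log_file ../logs/eval_amidialsum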