This is a Bert2Bert `EncoderDecoderModel` trained on the Liputan6 canonical dataset. The model is based on this documentation and this notebook.
Colab:

```shell
!pip install torch
!pip install transformers[torch]
!pip install evaluate
!pip install datasets
```

Cmd:

```shell
pip install torch
pip install transformers[torch]
pip install evaluate
pip install datasets
git clone https://github.com/zanuura/Bert2Bert_Summarization_Liputan6
```
```python
from transformers import EncoderDecoderModel, AutoTokenizer, pipeline
import datasets

model = EncoderDecoderModel.from_pretrained("Bert2Bert_Summarization_Liputan6/model/")  # insert the path
model.to("cuda")  # move the model to the GPU; the evaluation code below sends its inputs there
tokenizer = AutoTokenizer.from_pretrained("Bert2Bert_Summarization_Liputan6/model/")  # you can also change the tokenizer, e.g. to bert-base-uncased
```
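`from_pretrained` loads from a local directory produced by `save_pretrained`. Assuming the clone above, the model folder would typically contain the standard Transformers files (exact names can vary by Transformers version, e.g. `model.safetensors` instead of `pytorch_model.bin`):

```
Bert2Bert_Summarization_Liputan6/model/
├── config.json              # encoder-decoder architecture and generation settings
├── pytorch_model.bin        # model weights
├── tokenizer_config.json
├── special_tokens_map.json
└── vocab.txt                # BERT WordPiece vocabulary
```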
## Evaluation on the Liputan6 test set
```python
# Load ROUGE for validation (via the evaluate package installed above;
# datasets.load_metric is deprecated and removed in recent datasets releases)
import evaluate

rouge = evaluate.load("rouge")

def generate_summary(batch):
    # Tokenize the articles and move the tensors to the GPU
    inputs = tokenizer(batch["clean_article"], padding="max_length", truncation=True, max_length=512, return_tensors="pt")
    input_ids = inputs.input_ids.to("cuda")
    attention_mask = inputs.attention_mask.to("cuda")

    outputs = model.generate(input_ids, attention_mask=attention_mask)
    batch["pred"] = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return batch

# test_data is the Liputan6 canonical test split (loading is not shown here)
batch_size = 16  # adjust to fit your GPU memory
results = test_data.map(generate_summary, batched=True, batch_size=batch_size, remove_columns=["clean_article"])

pred_str = results["pred"]
label_str = results["clean_summary"]

rouge_output = rouge.compute(predictions=pred_str, references=label_str, rouge_types=["rouge2"])
print(rouge_output["rouge2"])
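ROUGE-2 scores a summary by how many bigrams it shares with the reference. As a rough illustration of what the metric measures (a simplified sketch with plain whitespace tokenization, not the stemmed tokenization the `rouge` metric actually applies), the F1 variant can be computed by hand:

```python
from collections import Counter

def rouge2_f1(prediction: str, reference: str) -> float:
    """Bigram-overlap F1 between two whitespace-tokenized strings."""
    def bigrams(text):
        tokens = text.lower().split()
        return Counter(zip(tokens, tokens[1:]))

    pred_bg, ref_bg = bigrams(prediction), bigrams(reference)
    overlap = sum((pred_bg & ref_bg).values())  # clipped count of shared bigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_bg.values())
    recall = overlap / sum(ref_bg.values())
    return 2 * precision * recall / (precision + recall)

print(rouge2_f1("the cat sat on the mat", "the cat sat on the mat"))  # identical -> 1.0
print(rouge2_f1("the cat sat", "the cat ran"))  # one of two bigrams matches -> 0.5
```

A score of 1.0 means every bigram matches; unrelated texts score 0.0.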
References:
- EncoderDecoderModel
- BertGeneration
- Bert2Bert CNN/DailyMail notebook
- Liputan6 dataset info
- Datasets
Hope you enjoy it!