
BERT Hierarchical Attention Network

A PyTorch implementation of the Hierarchical Attention Network for document classification, in which the traditional BiLSTM sentence encoder can be replaced with BERT.

This repository is an implementation of the article Hierarchical Attention Networks for Document Classification (Yang et al.) in which one can choose, via configuration, whether the sentence embedding for each sentence is produced by a traditional BiLSTM or by BERT. If BERT is chosen, the rest of the network architecture is the same as in the original paper: the sentence embeddings are fed into a BiLSTM encoder with attention to obtain a fixed-length document vector, which in turn is fed into a multi-layer perceptron with a softmax activation sized to the number of classes in the chosen data set.

(Figure: the original HAN architecture)
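
The following is a minimal, illustrative PyTorch sketch of the document-level part of this architecture (sentence embeddings → BiLSTM with attention → document vector → classifier). It is not this repository's code; all module and parameter names (SentenceAttention, HANDocumentClassifier, lstm_dim, etc.) are hypothetical, and it assumes each sentence has already been encoded into a fixed-size vector by either BERT or a word-level BiLSTM with attention.

import torch
import torch.nn as nn

class SentenceAttention(nn.Module):
    """Attention over sentence representations, in the spirit of Yang et al. (2016)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, hidden_dim)
        self.context = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, sent_hidden):                      # (batch, num_sents, hidden_dim)
        u = torch.tanh(self.proj(sent_hidden))
        weights = torch.softmax(self.context(u), dim=1)  # (batch, num_sents, 1)
        doc_vector = (weights * sent_hidden).sum(dim=1)  # (batch, hidden_dim)
        return doc_vector, weights.squeeze(-1)

class HANDocumentClassifier(nn.Module):
    def __init__(self, sent_emb_dim, lstm_dim, num_classes):
        super().__init__()
        # Sentence-level BiLSTM over the per-sentence embeddings.
        self.sentence_encoder = nn.LSTM(sent_emb_dim, lstm_dim,
                                        batch_first=True, bidirectional=True)
        self.attention = SentenceAttention(2 * lstm_dim)
        self.classifier = nn.Linear(2 * lstm_dim, num_classes)

    def forward(self, sentence_embeddings):              # (batch, num_sents, sent_emb_dim)
        hidden, _ = self.sentence_encoder(sentence_embeddings)
        doc_vector, sent_weights = self.attention(hidden)
        logits = self.classifier(doc_vector)             # softmax is applied by the loss
        return logits, sent_weights

# Example: 4 AG News classes, BERT-base sentence embeddings of size 768.
model = HANDocumentClassifier(sent_emb_dim=768, lstm_dim=128, num_classes=4)
logits, sent_weights = model(torch.randn(2, 5, 768))     # 2 documents, 5 sentences each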

Setup Instructions

Install pipenv with the following command:

$ pip install pipenv

Open pipenv environment in a new shell:

$ pipenv shell

Add the project to PYTHONPATH:

$ export PYTHONPATH=$PYTHONPATH:/path/to/han/src

Install dependencies:

$ pipenv sync

Usage

Step 1: Download data sets

Download the document classification data sets from my Google Drive folder and unpack them somewhere to create the following directory structure:

/path/to/data
├── ag_news_csv
│   ├── classes.txt
│   ├── readme.txt
│   ├── test.csv
│   ├── train.csv
├── yahoo_answers_csv
│   ├── classes.txt
│   ├── readme.txt
│   ├── test.csv
│   ├── train.csv
...
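
Before preprocessing, one can sanity-check a downloaded data set, e.g. with pandas. This sketch assumes the usual three-column layout of these CSV dumps (class index, title, description); verify against the readme.txt inside each folder:

import pandas as pd

data_dir = "/path/to/data/ag_news_csv"
classes = open(f"{data_dir}/classes.txt").read().splitlines()   # one class name per line
train = pd.read_csv(f"{data_dir}/train.csv", header=None,
                    names=["label", "title", "description"])    # assumed column order
print(classes)                           # e.g. ['World', 'Sports', 'Business', 'Sci/Tech']
print(train["label"].value_counts())     # labels are 1-based class indices in this format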

Step 2: Run the experiments

Every experiment has its own config file under experiments/. The pipeline for working with any model version or data set is:

python run.py preprocess experiment_config_file   # Step 2a: preprocess the data
python run.py train experiment_config_file        # Step 2b: train a model
python run.py infer experiment_config_file        # Step 2c: evaluate the results

Use the following experiment config files to reproduce results:

  • AG News, BiLSTM (GloVe embeddings) version: experiments/han-yahoo-glove-run.jsonnet
  • AG News, BERT (base) version: experiments/han-yahoo-bert-run.jsonnet
  • Yahoo Answers, BiLSTM (GloVe embeddings) version: experiments/han-yahoo-glove-run.jsonnet

One may add new configuration files for other data sets, or play with the hyper-parameters of the existing configurations.
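
For example, to run the full pipeline end to end with the Yahoo Answers / GloVe configuration listed above:

$ python run.py preprocess experiments/han-yahoo-glove-run.jsonnet
$ python run.py train experiments/han-yahoo-glove-run.jsonnet
$ python run.py infer experiments/han-yahoo-glove-run.jsonnet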

The infer step outputs a classification report against the test set of the chosen data set. For example, on the AG News data set, with the BiLSTM (GloVe embeddings) sentence encoder:

               precision    recall  f1-score   support

       World       0.94      0.93      0.93      1900
      Sports       0.98      0.99      0.98      1900
    Business       0.89      0.91      0.90      1899
    Sci/Tech       0.92      0.90      0.91      1900

    accuracy                           0.93      7599
   macro avg       0.93      0.93      0.93      7599
weighted avg       0.93      0.93      0.93      7599
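
The report above is in the format of scikit-learn's classification_report. As a minimal illustration (not this repository's code), such a report can be produced from gold labels and predictions like this:

from sklearn.metrics import classification_report

y_true = ["Sports", "World", "Business", "Sci/Tech"]   # gold labels (illustrative)
y_pred = ["Sports", "World", "Business", "Business"]   # model predictions (illustrative)
print(classification_report(y_true, y_pred, digits=2))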

Step 3: Visualize Predictions

One can visualize the sentence/word attention weights for each item in the test set, after running the infer command, using the notebook notebooks/Prediction Visualizer.ipynb.

Please note that one may need to change the value of PREDICTIONS_PATH when using this notebook.

For example, for the item at index 200, the 2nd sentence (out of 2) received the most attention, and the phrases broadband users and internet users got the highest word weights when determining the prediction of class Sci/Tech:

(Figure: sentence and word attention weights for test item 200)
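
The notebook's plotting code is not reproduced here, but the idea is simply to rank or shade each word by its attention weight. A minimal sketch (the words and weights below are illustrative placeholders, not actual model output):

import matplotlib.pyplot as plt

# Illustrative word attention weights for one sentence (not real model output).
words = ["broadband", "users", "overtake", "dial-up", "internet", "users"]
weights = [0.27, 0.22, 0.06, 0.05, 0.21, 0.19]

plt.figure(figsize=(8, 2))
plt.bar(range(len(words)), weights)
plt.xticks(range(len(words)), words, rotation=45, ha="right")
plt.ylabel("attention weight")
plt.tight_layout()
plt.show()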

References

[1] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, Eduard Hovy, Hierarchical Attention Networks for Document Classification

@inproceedings{yang-etal-2016-hierarchical,
    title = "Hierarchical Attention Networks for Document Classification",
    author = "Yang, Zichao  and
      Yang, Diyi  and
      Dyer, Chris  and
      He, Xiaodong  and
      Smola, Alex  and
      Hovy, Eduard",
    booktitle = "Proceedings of the 2016 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2016",
    address = "San Diego, California",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/N16-1174",
    doi = "10.18653/v1/N16-1174",
    pages = "1480--1489",
}

[2] Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, {BERT}: Pre-training of Deep Bidirectional Transformers for Language Understanding

@inproceedings{devlin-etal-2019-bert,
    title = "{BERT}: Pre-training of Deep Bidirectional Transformers for Language Understanding",
    author = "Devlin, Jacob  and
      Chang, Ming-Wei  and
      Lee, Kenton  and
      Toutanova, Kristina",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/N19-1423",
    doi = "10.18653/v1/N19-1423",
    pages = "4171--4186",
    abstract = "We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models (Peters et al., 2018a; Radford et al., 2018), BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5 (7.7 point absolute improvement), MultiNLI accuracy to 86.7{\%} (4.6{\%} absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).",
}
