pNLP-Mixer - Unofficial PyTorch Implementation

pNLP-Mixer: an Efficient all-MLP Architecture for Language

Implementation of pNLP-Mixer in PyTorch and PyTorch Lightning.

pNLP-Mixer is the first successful application of the MLP-Mixer architecture in NLP. With a novel embedding-free projection layer, pNLP-Mixer shows performance comparable to transformer-based models (e.g. mBERT, RoBERTa) with significantly smaller parameter count and no expensive pretraining procedures.
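The projection layer replaces a learned embedding table with MinHash fingerprints computed over each token's character n-grams. A toy sketch of the idea in Python (the hash function, n-gram size, and fingerprint length here are illustrative choices, not the paper's exact configuration):

```python
import hashlib

def ngrams(token, n=3):
    """Character n-grams of a token, with '#' as a simple boundary marker."""
    padded = f"#{token}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)] or [padded]

def minhash_fingerprint(token, num_hashes=8, n=3):
    """Embedding-free token representation: for each of `num_hashes` seeded
    hash functions, keep the minimum hash value over the token's n-grams."""
    fingerprint = []
    for seed in range(num_hashes):
        values = [
            int.from_bytes(hashlib.md5(f"{seed}:{g}".encode()).digest()[:8], "big")
            for g in ngrams(token, n)
        ]
        fingerprint.append(min(values))
    return fingerprint

fp = minhash_fingerprint("mixer")  # deterministic, no trained parameters involved
```

Because the fingerprint is a pure function of the token's surface form, no embedding matrix has to be trained or shipped with the model.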

Requirements

  • Python >= 3.6.10
  • PyTorch >= 1.8.0
  • PyTorch Lightning >= 1.4.3
  • All other requirements are listed in the requirements.txt file.

Configurations

Please check the configuration examples and the comments in the cfg directory.

Commands

Caching Vocab Hashes

python projection.py -v VOCAB_FILE -c CFG_PATH -g NGRAM_SIZE -o OUTPUT_FILE
  • VOCAB_FILE: path to the vocab file that contains the tokens to be hashed
  • CFG_PATH: path to the configurations file
  • NGRAM_SIZE: size of n-grams used during hashing
  • OUTPUT_FILE: path where the resulting .npy file will be stored
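The output is a standard .npy file, so it can be sanity-checked with NumPy. A minimal round-trip sketch; the (vocab_size, num_hashes) layout and the file name are assumptions for illustration only, so inspect the file projection.py actually produces for the real shape and dtype:

```python
import os
import tempfile

import numpy as np

# Stand-in for the cached vocab hashes; shape/dtype are illustrative.
fake_hashes = np.random.randint(0, 2**31, size=(100, 64), dtype=np.int64)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab_hashes.npy")
    np.save(path, fake_hashes)  # analogous to projection.py writing OUTPUT_FILE
    loaded = np.load(path)      # analogous to the training pipeline reading it
```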

Training / Testing

python run.py -c CFG_PATH -n MODEL_NAME -m MODE -p CKPT_PATH
  • CFG_PATH: path to the configurations file
  • MODEL_NAME: model name to be used for PyTorch Lightning logging
  • MODE: train or test (default: train)
  • CKPT_PATH: (optional) checkpoint path to resume training from or to use for testing
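The flag set above can be mirrored with a small argparse sketch; the long option names and the example config path are illustrative, not necessarily what run.py actually defines:

```python
import argparse

def build_parser():
    # Mirrors the documented CLI flags; long names are hypothetical.
    parser = argparse.ArgumentParser(description="pNLP-Mixer training/testing")
    parser.add_argument("-c", "--cfg", required=True,
                        help="path to the configurations file")
    parser.add_argument("-n", "--name", required=True,
                        help="model name for logging")
    parser.add_argument("-m", "--mode", choices=["train", "test"],
                        default="train")
    parser.add_argument("-p", "--ckpt", default=None,
                        help="optional checkpoint path")
    return parser

args = build_parser().parse_args(["-c", "cfg/example.yml", "-n", "demo", "-m", "test"])
```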

Results

The checkpoints used for evaluation are available here.

MTOP

| Model      | Size    | Reported | Ours  |
|------------|---------|----------|-------|
| pNLP-Mixer | X-Small | 76.9%    | 79.3% |
| pNLP-Mixer | Base    | 80.8%    | 79.4% |
| pNLP-Mixer | X-Large | 82.3%    | 82.1% |

MultiATIS

| Model      | Size    | Reported | Ours  |
|------------|---------|----------|-------|
| pNLP-Mixer | X-Small | 90.0%    | 91.3% |
| pNLP-Mixer | Base    | 92.1%    | 92.8% |
| pNLP-Mixer | X-Large | 91.3%    | 92.9% |

* Note that the paper reports performance on the MultiATIS dataset using an 8-bit quantized model, whereas our performance was measured using a 32-bit float model.

IMDB

| Model      | Size    | Reported | Ours  |
|------------|---------|----------|-------|
| pNLP-Mixer | X-Small | 81.9%    | 81.5% |
| pNLP-Mixer | Base    | 78.6%    | 82.2% |
| pNLP-Mixer | X-Large | 82.9%    | 82.9% |

Paper

@article{fusco2022pnlp,
  title={pNLP-Mixer: an Efficient all-MLP Architecture for Language},
  author={Fusco, Francesco and Pascual, Damian and Staar, Peter},
  journal={arXiv preprint arXiv:2202.04350},
  year={2022}
}

TODO

  • 8-bit quantization

Contributors

  • tonyswoo


Issues

Using a SentencePiece Vocabulary

Hi!

Do you have an example of how to use this with a SentencePiece vocab? I see that you have the sentencepiece_extractor.py file, but it's not clear how to go from its output to the vocab needed for the projection.

Thanks. :)

Questions Regarding the Model

  1. The XL model's intent accuracy is within ~1% of mBERT's overall, and in some of the language subcategories. Even though, on the surface, a hypothetical XXL model could reach parity with mBERT (extrapolating from the L and XL models), is there a possibility of diminishing returns?
  2. There are ways of demonstrating the interpretability of Transformer-based models (e.g. BERT and GPT-likes); are there similar mechanisms for the Mixer (since MLP-Mixer visualizations exist)? https://jalammar.github.io/illustrated-transformer/ https://medium.com/ml-summaries/mlp-mixer-an-all-mlp-architecture-for-vision-paper-summary-e50fa915e04d
  3. On a speculative note, when could the model be scaled to reach parity with GPT-Neo or its commercial counterparts?
