
bahasa-nlp-tensorflow's Introduction


MIT License


Bahasa-NLP-Tensorflow gathers TensorFlow deep learning models for Bahasa Malaysia NLP problems. All code is simplified and kept entirely inside Jupyter Notebooks, datasets included.

Table of contents

  1. word2vec Malaya

Trained on Tatoeba dataset.

  1. Fast-text Ngrams, test accuracy 88%

Trained on Bahasa subjectivity dataset.

  1. RNN LSTM + Bahdanau Attention, test accuracy 84%
  2. RNN LSTM + Luong Attention, test accuracy 82%
  3. Transfer-learning Multilanguage BERT, test accuracy 94.88%

More than 70 additional models are available here.

Trained on Bahasa fakenews dataset.

  1. Dilated CNN, test accuracy 74%
  2. Wavenet, test accuracy 68%
  3. BERT Multilanguage, test accuracy 85%
  4. BERT-Bahasa Base, test accuracy 88%

Trained on Bahasa dependency parsing dataset. 80% to train, 20% to test.

Accuracy is reported as arc, type, and root accuracy after only 10 epochs.

  1. Bidirectional RNN + CRF + Biaffine, arc accuracy 60.64%, types accuracy 58.68%, root accuracy 89.03%
  2. Bidirectional RNN + Bahdanau + CRF + Biaffine, arc accuracy 60.51%, types accuracy 59.01%, root accuracy 88.99%
  3. Bidirectional RNN + Luong + CRF + Biaffine, arc accuracy 60.60%, types accuracy 59.06%, root accuracy 89.76%
  4. BERT Base + CRF + Biaffine, arc accuracy 58.55%, types accuracy 58.12%, root accuracy 88.87%
  5. Bidirectional RNN + Biaffine Attention + Cross Entropy, arc accuracy 69.53%, types accuracy 65.38%, root accuracy 90.71%
  6. BERT Base + Biaffine Attention + Cross Entropy, arc accuracy 77.03%, types accuracy 66.73%, root accuracy 88.38%
  7. XLNET Base + Biaffine Attention + Cross Entropy, arc accuracy 93.50%, types accuracy 92.48%, root accuracy 94.46%
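The three figures reported for each parser above can be computed along the following lines. This is a minimal sketch, not the notebooks' own evaluation code: the function name `parsing_accuracies`, the convention that head index 0 marks the root, and the choice to score root accuracy per sentence are all assumptions made for illustration.

```python
def parsing_accuracies(gold_heads, pred_heads, gold_types, pred_types):
    """Return (arc, type, root) accuracy over a list of sentences.

    Each argument holds one list per sentence with one entry per token.
    Assumption: a head index of 0 marks the root token.
    """
    arc_hits = type_hits = tokens = 0
    root_hits = sentences = 0
    for g_heads, p_heads, g_types, p_types in zip(
        gold_heads, pred_heads, gold_types, pred_types
    ):
        for gh, ph, gt, pt in zip(g_heads, p_heads, g_types, p_types):
            tokens += 1
            arc_hits += gh == ph    # predicted head matches gold head
            type_hits += gt == pt   # predicted dependency label matches gold
        sentences += 1
        # Root counts as correct when the same token positions are marked as root.
        gold_root = [i for i, h in enumerate(g_heads) if h == 0]
        pred_root = [i for i, h in enumerate(p_heads) if h == 0]
        root_hits += gold_root == pred_root
    return arc_hits / tokens, type_hits / tokens, root_hits / sentences


# Toy data, invented for this example only.
gold_heads = [[2, 0, 2], [0, 1]]
pred_heads = [[2, 0, 1], [0, 1]]
gold_types = [["nsubj", "root", "obj"], ["root", "obj"]]
pred_types = [["nsubj", "root", "obj"], ["root", "nmod"]]

arc, typ, root = parsing_accuracies(gold_heads, pred_heads, gold_types, pred_types)
print(f"arc {arc:.2%}, types {typ:.2%}, root {root:.2%}")
```

Arc and type accuracy are scored per token here, while root accuracy is scored per sentence, which may explain why root accuracy sits well above the other two in most rows.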

Trained on a 100k English-Malay dataset.

  1. Attention is All you need, train accuracy 19.09%, test accuracy 20.38%
  2. BiRNN Seq2Seq Luong Attention, Beam decoder, train accuracy 45.2%, test accuracy 37.26%
  3. Convolution Encoder Decoder, train accuracy 35.89%, test accuracy 30.65%
  4. Dilated Convolution Encoder Decoder, train accuracy 82.3%, test accuracy 56.72%
  5. Dilated Convolution Encoder Decoder Self-Attention, train accuracy 60.76%, test accuracy 36.59%

Trained on Bahasa entity dataset.

  1. Bidirectional LSTM + CRF, test accuracy 95.10%
  2. Bidirectional LSTM + CRF + Bahdanau, test accuracy 94.34%
  3. Bidirectional LSTM + CRF + Luong, test accuracy 94.84%
  4. BERT Multilanguage, test accuracy 96.43%
  5. BERT-Bahasa Base, test accuracy 98.11%
  6. BERT-Bahasa Small, test accuracy 98.47%
  7. XLNET-Bahasa Base, test accuracy 98.008%

Trained on Bahasa entity dataset.

  1. Bidirectional LSTM + CRF
  2. Bidirectional LSTM + CRF + Bahdanau
  3. Bidirectional LSTM + CRF + Luong
  4. BERT-Bahasa-Base + CRF, test accuracy 95.17%
  5. XLNET-Bahasa-Base + CRF, test accuracy 95.58%

Trained on Malaysia news dataset.

Accuracy is measured with ROUGE-2 after only 20 epochs.

  1. Dilated Seq2Seq, test accuracy 23.926%
  2. Pointer Generator + Bahdanau Attention, test accuracy 15.839%
  3. Pointer Generator + Luong Attention, test accuracy 26.23%
  4. Dilated Seq2Seq + Pointer Generator, test accuracy 20.382%
  5. BERT Multilanguage + Dilated CNN Seq2seq + Pointer Generator, test accuracy 23.7134%

Trained on Malaysia news dataset.

  1. Skip-thought
  2. Residual Network + Bahdanau Attention

Trained on an OCR Jawi-to-Malay dataset.

  1. CNN + LSTM RNN, test accuracy 63.86%
  2. Im2Latex, test accuracy 89.38%

Trained on Bahasa QA dataset.

  1. End-to-End + GRU, test accuracy 89.17%
  2. Dynamic Memory + GRU, test accuracy 98.86%

Trained on Translated Duplicated Quora question dataset.

  1. LSTM Bahdanau + Contrastive loss, test accuracy 79%
  2. Dilated CNN + Contrastive loss, test accuracy 77%
  3. Self-Attention + Contrastive loss, test accuracy 77%
  4. BERT + Cross entropy, test accuracy 83%
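Three of the models above are trained with a contrastive loss on pairs of questions. As a rough illustration only, here is one common formulation written in plain Python rather than TensorFlow: duplicate pairs are pulled together and non-duplicate pairs are pushed apart until they are at least `margin` away. The function names, the Euclidean distance, and the margin of 1.0 are assumptions, and the notebooks may use a different variant.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, is_duplicate, margin=1.0):
    """Contrastive loss for one pair of sentence embeddings.

    Assumption: duplicates are penalised by squared distance, non-duplicates
    by the squared shortfall from the margin (zero once they are far enough).
    """
    d = euclidean(u, v)
    if is_duplicate:
        return d ** 2
    return max(margin - d, 0.0) ** 2


# Hypothetical 2-d embeddings for illustration.
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], is_duplicate=True))
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], is_duplicate=False))
```

At prediction time, a pair would typically be labelled a duplicate when its embedding distance falls below a chosen threshold.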

Trained on Kamus speech dataset.

  1. BiRNN + LSTM + CTC Greedy, test accuracy 72.03%
  2. Wavenet, test accuracy 10.21%
  3. Deep speech 2, test accuracy 56.51%
  4. Dilated-CNN, test accuracy 59.31%
  5. Im2Latex, test accuracy 58.59%

  1. Tacotron
  2. Seq2Seq + Bahdanau Attention
  3. Deep CNN + Monotonic Attention + Dilated CNN vocoder

Trained on stemming dataset.

  1. Seq2seq + Beam decoder
  2. Seq2seq + Bahdanau Attention + Beam decoder
  3. Seq2seq + Luong Attention + Beam decoder

Trained on Malaysia news dataset.

  1. TAT-LSTM, test accuracy 32.89%
  2. TAV-LSTM, test accuracy 40.69%
  3. MTA-LSTM, test accuracy 32.96%

  1. Lda2Vec

  1. word2vec
  2. ELMO
  3. Fast-text

bahasa-nlp-tensorflow's People

Contributors

huseinzol05


bahasa-nlp-tensorflow's Issues

generate.py

After I install malaya, this error comes up when I import malaya:

AttributeError: type object 'Path' has no attribute 'home'
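One possible cause, which the issue does not confirm, is the interpreter or `pathlib` module in use: `Path.home()` was added to the standard library's `pathlib` in Python 3.5, so an older interpreter or an old third-party `pathlib` backport shadowing the standard module would raise this error. The sketch below is a way to check the environment, with `os.path.expanduser` as a fallback; `home_directory` is a hypothetical helper, not part of malaya.

```python
import os
import sys
from pathlib import Path


def home_directory():
    """Return the user's home directory as a string.

    Falls back to os.path.expanduser when Path has no home() classmethod
    (assumption: that is what triggers the AttributeError above).
    """
    if hasattr(Path, "home"):
        return str(Path.home())
    return os.path.expanduser("~")


# Report the interpreter version and whether Path.home is available.
print("Python", ".".join(map(str, sys.version_info[:3])))
print("Path.home available:", hasattr(Path, "home"))
print("home directory:", home_directory())
```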
