Malaya




Malaya is a natural-language toolkit library for Bahasa Malaysia, powered by deep learning with TensorFlow.

Documentation

Full documentation is available at https://malaya.readthedocs.io/

Installing from PyPI

CPU version

$ pip install malaya

GPU version

$ pip install malaya-gpu

Only Python 3.6 or later and TensorFlow 1.x are supported.
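
Because only Python 3.6 or later and TensorFlow 1.x are supported, it can help to check the environment before installing. The sketch below is a hypothetical helper, not part of Malaya; it assumes TensorFlow is importable as `tensorflow` and reports its version through `tf.__version__`.

```python
import importlib.util
import sys


def python_supported(version_info=sys.version_info):
    """True if the given (major, minor, ...) version is Python 3.6 or newer."""
    return tuple(version_info[:2]) >= (3, 6)


def tensorflow_supported(version_string):
    """True if a TensorFlow version string such as '1.15.0' is from the 1.x series."""
    return version_string.split(".")[0] == "1"


def check_environment():
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if not python_supported():
        problems.append("Python 3.6 or newer is required")
    if importlib.util.find_spec("tensorflow") is None:
        problems.append("TensorFlow 1.x is not installed")
    else:
        import tensorflow as tf  # imported lazily so the check works without TensorFlow
        if not tensorflow_supported(tf.__version__):
            problems.append(
                "TensorFlow 1.x is required, found {}".format(tf.__version__))
    return problems
```

Calling `check_environment()` before `pip install malaya` (or `malaya-gpu`) lists anything that does not match the stated requirements. `str.format` is used instead of an f-string so the file still parses on interpreters older than 3.6.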

Features

  • Emotion Analysis

    Deep emotion analysis models, built from fine-tuned BERT, Attention-Recurrent and Self-Attention architectures.

  • Entities Recognition

    State-of-the-art CRF deep learning and BERT models for Named Entity Recognition.

  • Language Detection

    Uses Multinomial, SGD, XGB and Fast-text N-gram deep learning models to distinguish Malay, English and Indonesian.

  • Normalizer

    Uses local Malaysian NLP research to normalize any Bahasa text.

  • Num2Word

    Converts numbers to their cardinal or ordinal word representation.

  • Part-of-Speech Recognition

    State-of-the-art CRF deep learning models for Part-of-Speech Recognition.

  • Dependency Parsing

    State-of-the-art CRF deep learning models that analyze the grammatical structure of a sentence and establish relationships between words.

  • ELMO (biLM)

    Provides ELMO models pretrained on Bahasa Wikipedia and Bahasa news, with an easy interface and visualization.

  • Relevancy Analysis

    Deep relevancy analysis models, built from fine-tuned BERT, Dilated Convolutional Neural Network and Self-Attention architectures.

  • Sentiment Analysis

    Deep sentiment analysis models, built from fine-tuned BERT, Attention-Recurrent and Self-Attention architectures.

  • Spell Correction

    Uses local Malaysian NLP research to auto-correct any Bahasa word.

  • Stemmer

    Uses a state-of-the-art character-level LSTM Seq2Seq model with attention for Bahasa stemming.

  • Subjectivity Analysis

    Deep subjectivity analysis models, built from fine-tuned BERT, Attention-Recurrent and Self-Attention architectures.

  • Similarity

    Uses deep encoders, Doc2Vec and BERT to build deep semantic similarity models.

  • Summarization

    Uses BERT, XLNET, skip-thought, LDA, LSA and Doc2Vec for unsupervised summarization, with TextRank as the scoring algorithm.

  • Topic Modelling

    Provides Attention, LDA2Vec, LDA, NMF and LSA interfaces for easy topic modelling, with topic visualization.

  • Toxicity Analysis

    Deep toxicity analysis models, built from fine-tuned BERT, Attention-Recurrent and Self-Attention architectures.

  • Word2Vec

    Provides Word2Vec embeddings pretrained on Bahasa Wikipedia and Bahasa news, with an easy interface and visualization.

  • Fast-text

    Provides Fast-text embeddings pretrained on Bahasa Wikipedia, with an easy interface and visualization.

  • BERT and XLNET

    Provides an easy interface to load Bahasa BERT and XLNET models.
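
To illustrate the kind of output the Num2Word feature produces, here is a minimal, self-contained sketch of Malay number-to-word conversion. It is not Malaya's implementation or API: `cardinal` and `ordinal` are hypothetical names, and the sketch assumes standard Malay number words (`kosong` for zero, the `se-` prefix for a single ten, hundred, thousand or million, `pertama` for first and the `ke-` prefix for other ordinals).

```python
# Hypothetical sketch of Malay number-to-word conversion; not Malaya's API.
UNITS = ["kosong", "satu", "dua", "tiga", "empat", "lima",
         "enam", "tujuh", "lapan", "sembilan"]

# Scale words, largest first; a count of one takes the "se-" prefix.
SCALES = [(1_000_000, "juta"), (1_000, "ribu"), (100, "ratus"), (10, "puluh")]


def _scaled(count, word):
    """Render `count` multiples of a scale word, e.g. 1 -> 'seratus', 2 -> 'dua ratus'."""
    if count == 1:
        return "se" + word
    return cardinal(count) + " " + word


def cardinal(n):
    """Cardinal Malay words for an integer 0 <= n < 1,000,000,000."""
    if not 0 <= n < 1_000_000_000:
        raise ValueError("only 0 <= n < 1,000,000,000 is supported")
    if n < 10:
        return UNITS[n]
    if 11 <= n <= 19:
        # 11 is "sebelas"; 12-19 are "<unit> belas"
        return "sebelas" if n == 11 else UNITS[n - 10] + " belas"
    for value, word in SCALES:
        if n >= value:
            head, rest = divmod(n, value)
            text = _scaled(head, word)
            return text if rest == 0 else text + " " + cardinal(rest)


def ordinal(n):
    """Ordinal Malay words: 'pertama' for 1, otherwise the 'ke-' prefix."""
    if n < 1:
        raise ValueError("ordinals start at 1")
    if n == 1:
        return "pertama"
    return "ke" + cardinal(n)
```

For example, `cardinal(1999)` returns `'seribu sembilan ratus sembilan puluh sembilan'` and `ordinal(21)` returns `'kedua puluh satu'`. The recursion on the quotient keeps compound scales such as `lima ratus ribu` (500,000) correct without a separate rule.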

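The Language Detection feature mentions Multinomial models over N-grams. As a rough illustration of that idea only, the sketch below trains a multinomial Naive Bayes classifier on character trigram counts with add-one smoothing. It is a generic textbook technique, not Malaya's model or API; `char_ngrams`, `NgramNaiveBayes` and the six training sentences are hypothetical and purely illustrative.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-grams of a lowercased text, padded with a space on each side."""
    padded = " " + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts, with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}       # label -> Counter of n-grams seen for that label
        self.totals = {}       # label -> total number of n-grams for that label
        self.docs = Counter()  # label -> number of training texts
        self.vocab = set()     # every n-gram seen during training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        self.totals = {label: sum(c.values()) for label, c in self.counts.items()}
        return self

    def predict(self, text):
        grams = char_ngrams(text, self.n)
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, counter in self.counts.items():
            # log prior plus smoothed log likelihoods; logs avoid float underflow
            score = math.log(self.docs[label] / n_docs)
            for gram in grams:
                score += math.log((counter[gram] + 1) / (self.totals[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, illustrative training data only; a real detector needs a large corpus.
TRAIN = [
    ("the weather is very nice today", "en"),
    ("i want to buy a new car", "en"),
    ("saya mahu membeli kereta baharu", "ms"),
    ("cuaca sangat baik hari ini", "ms"),
    ("saya mau membeli mobil baru", "id"),
    ("cuaca sangat bagus hari ini", "id"),
]
model = NgramNaiveBayes().fit([t for t, _ in TRAIN], [l for _, l in TRAIN])
```

With this toy data, `model.predict("kereta baharu")` returns `"ms"` and `model.predict("mobil baru")` returns `"id"`. Because Malay and Indonesian share much of their vocabulary, these predictions hinge on a few distinctive words such as kereta/mobil and say nothing about real-world accuracy.
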
References

If you use our software for research, please cite:

@misc{Malaya,
  note = {Natural-Language-Toolkit library for bahasa Malaysia, powered by Deep Learning Tensorflow},
  author = {Husein, Zolkepli},
  title = {Malaya},
  year = {2018},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huseinzol05/malaya}}
}

Acknowledgement

Thanks to Im Big, LigBlou, Mesolitica and KeyReply for sponsoring the AWS, Google and private cloud resources used to train Malaya models.


Contributing

Thank you for contributing to this library; it really helps a lot. Feel free to contact me with any suggestions or to contribute in other forms. We accept everything, not just code!


License

Malaya is released under the MIT License.
