BNFLAIR

A Flair-based Bengali collection that provides Bengali Flair embeddings and Flair-trained Bengali models for NER, POS tagging, and text classification.

Installation

pip install -r requirements.txt

Embeddings

Bengali Wiki Flair embeddings

Here we trained a Flair character-based language model on the Bengali Wiki dataset.

  • Forward LM

    • Total Wikipedia articles: 110449
    • Training epochs: 5
    • Validation loss: 1.5366
    • Validation perplexity: 4.6490
  • Backward LM

    • Total Wikipedia articles: 110449
    • Training epochs: 5
    • Validation loss: 1.4717
    • Validation perplexity: 4.3566
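The reported perplexities look consistent with the usual relation perplexity = exp(loss). The sketch below checks this, assuming the validation loss is the mean per-character cross-entropy in nats; the source does not state how the values were computed, so treat that as an assumption. The helper name `perplexity_from_loss` is illustrative.

```python
import math

# Validation figures reported above for the two language models.
reported = {
    "forward": {"loss": 1.5366, "perplexity": 4.6490},
    "backward": {"loss": 1.4717, "perplexity": 4.3566},
}


def perplexity_from_loss(loss: float) -> float:
    """Perplexity, assuming `loss` is a cross-entropy measured in nats."""
    return math.exp(loss)


for name, stats in reported.items():
    computed = perplexity_from_loss(stats["loss"])
    print(f"{name}: computed={computed:.4f} reported={stats['perplexity']:.4f}")
```

Under that assumption, both computed values agree with the reported ones to within about 0.001, a gap that rounding of the loss would explain.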

Bengali NER Model

Wikiann Model

Here we trained a Bengali NER model on the wikiann Bengali NER dataset.

  • Total wikiann train data: 1000
  • Total wikiann validation data: 100
  • Total wikiann test data: 100
  • Training epochs: 70
  • Scores on the test data
    • F-score (micro): 0.7751
    • F-score (macro): 0.775
    • Accuracy: 0.7364
  • For the detailed training log, check here
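For readers unfamiliar with the two F-score variants listed above, here is a minimal sketch of how micro and macro averaging differ. The per-class counts are made up for illustration and are not taken from this model's evaluation; the helper `f1` and the name `toy_counts` are likewise hypothetical.

```python
from typing import Dict, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical per-class counts (tp, fp, fn); NOT results from the model above.
toy_counts: Dict[str, Tuple[int, int, int]] = {
    "PER": (8, 2, 2),
    "LOC": (3, 1, 3),
    "ORG": (5, 5, 1),
}

# Macro: average the per-class F1 scores, so every class weighs the same.
macro_f1 = sum(f1(*c) for c in toy_counts.values()) / len(toy_counts)

# Micro: pool the counts over all classes first, then compute a single F1.
total_tp, total_fp, total_fn = (sum(col) for col in zip(*toy_counts.values()))
micro_f1 = f1(total_tp, total_fp, total_fn)

print(f"macro F1 = {macro_f1:.4f}, micro F1 = {micro_f1:.4f}")
```

When the two numbers are close, as in the scores reported above, performance is probably fairly even across entity classes.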

Usage

Embeddings

  • To generate Flair embeddings for any Bengali text
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

sentence = Sentence('রামপ্রসাদ সেন জন্মগ্রহণ করেছিলেন গাঙ্গেয় পশ্চিমবঙ্গের এক তান্ত্রিক বৈদ্যব্রাহ্মণ পরিবারে।')

# init embeddings from your trained LM
char_lm_embeddings = FlairEmbeddings('models/embeddings/wikipedia/bnwiki_forward.pt')

# embed sentence
char_lm_embeddings.embed(sentence)
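The snippet above stops once the sentence is embedded. As a rough sketch of how the resulting vectors might be inspected, the helper below (the name `token_embedding_sizes` is hypothetical) relies only on the `text` and `embedding` attributes that Flair tokens expose, so it does not import Flair itself.

```python
from typing import List, Tuple


def token_embedding_sizes(sentence) -> List[Tuple[str, int]]:
    """Return (token text, embedding length) for each token.

    `sentence` is expected to be a flair.data.Sentence that has already been
    passed to `embed()`; only `token.text` and `token.embedding` are used.
    """
    return [(token.text, len(token.embedding)) for token in sentence]
```

After `char_lm_embeddings.embed(sentence)`, calling `token_embedding_sizes(sentence)` should list each Bengali token together with the length of its vector.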
  • To use the embeddings for training a Flair-based NER, POS, or text classification model
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

embedding_types = [
    FlairEmbeddings('models/embeddings/wikipedia/bnwiki_forward.pt'),
    FlairEmbeddings('models/embeddings/wikipedia/bnwiki_backward.pt')
]

embeddings = StackedEmbeddings(embeddings=embedding_types)
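A hedged sketch of how the stacked embeddings might be plugged into NER training is shown below. It is not the author's training script. The file names, column layout, hidden size, learning rate, and batch size are all assumptions, and the calls follow recent Flair releases (older versions use `make_tag_dictionary` instead of `make_label_dictionary`).

```python
def train_ner(data_folder: str, output_dir: str, max_epochs: int = 70):
    """Train a Flair sequence tagger on a column-format corpus (sketch)."""
    # Imported lazily so that defining this function does not require Flair.
    from flair.datasets import ColumnCorpus
    from flair.embeddings import FlairEmbeddings, StackedEmbeddings
    from flair.models import SequenceTagger
    from flair.trainers import ModelTrainer

    # Assumed layout: one "token tag" pair per line, blank line between sentences.
    corpus = ColumnCorpus(
        data_folder,
        {0: "text", 1: "ner"},
        train_file="train.txt",  # hypothetical file names
        dev_file="dev.txt",
        test_file="test.txt",
    )
    label_dict = corpus.make_label_dictionary(label_type="ner")

    embeddings = StackedEmbeddings(embeddings=[
        FlairEmbeddings("models/embeddings/wikipedia/bnwiki_forward.pt"),
        FlairEmbeddings("models/embeddings/wikipedia/bnwiki_backward.pt"),
    ])

    tagger = SequenceTagger(
        hidden_size=256,  # illustrative value, not from the source
        embeddings=embeddings,
        tag_dictionary=label_dict,
        tag_type="ner",
        use_crf=True,
    )

    trainer = ModelTrainer(tagger, corpus)
    trainer.train(
        output_dir,
        learning_rate=0.1,   # illustrative value
        mini_batch_size=32,  # illustrative value
        max_epochs=max_epochs,
    )
    return tagger
```

The default of 70 epochs mirrors the figure reported for the wikiann model; the other hyperparameters are placeholders to tune.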

NER

  • To use the NER model
from flair.data import Sentence
from flair.models import SequenceTagger

text = "কবিরঞ্জন রামপ্রসাদ সেন (১৭১৮ বা ১৭২৩ – ১৭৭৫) ছিলেন অষ্টাদশ শতাব্দীর এক বিশিষ্ট বাঙালি শাক্ত কবি ও সাধক।"
ner_model_path = "models/ner/wikiann.pt"

ner_model = SequenceTagger.load(ner_model_path)

sentence = Sentence(text)
ner_model.predict(sentence)
entities = sentence.get_spans('ner')

for entity in entities:
    print(entity)

# output: Span[0:3]: "কবিরঞ্জন রামপ্রসাদ সেন" → PER (0.5903)
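To work with the predictions programmatically rather than printing them, the spans can be converted into plain dictionaries. The sketch below assumes a recent Flair release in which each span offers `text` and `get_label(label_type)` returning an object with `value` and `score`; older releases expose these differently. The helper name `spans_to_records` is hypothetical.

```python
from typing import Dict, List, Union


def spans_to_records(spans, label_type: str = "ner") -> List[Dict[str, Union[str, float]]]:
    """Convert predicted spans into dictionaries with text, label and score."""
    records = []
    for span in spans:
        label = span.get_label(label_type)
        records.append({
            "text": span.text,
            "label": label.value,
            "score": round(label.score, 4),
        })
    return records
```

With the example above, `spans_to_records(sentence.get_spans('ner'))` would yield one record per detected entity.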

Contributors

sagorbrur
