
N-gram in Natural Language Processing

Bigram-Language-Model-from-Scratch

In this Python program, a bigram language model is built from scratch and trained on the training corpus, both without smoothing and with add-one smoothing. A detailed explanation of how the code works is documented in the program.
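For reference, these are the standard textbook bigram estimates for the two modes. Here C(·) is a count in the training corpus and V is the vocabulary size (the number of unique words). Details such as sentence-boundary markers in the original program are not stated and may differ.

```latex
% Unsmoothed (maximum-likelihood) bigram estimate
P_{\text{MLE}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}

% Add-one (Laplace) smoothing: add 1 to every bigram count, V to every denominator
P_{\text{add-1}}(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i) + 1}{C(w_{i-1}) + V}

% Probability of a sentence w_1 \ldots w_n under the bigram model (no boundary markers)
P(w_1 \ldots w_n) \approx \prod_{i=2}^{n} P(w_i \mid w_{i-1})
```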

Training Corpus

The corpus contains 10059 sentences, 218619 words, and 17139 unique words.
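Statistics like these can be computed with a short pass over the corpus. This is a minimal sketch assuming one sentence per line and whitespace tokenization; the helper name `corpus_stats` is hypothetical, and the original program may count differently (for example, if it splits sentences another way).

```python
from collections import Counter


def corpus_stats(lines):
    """Return (number of sentences, number of unique words, total words).

    Assumption: one sentence per line, tokens separated by whitespace,
    blank lines ignored.
    """
    sentences = [line.split() for line in lines if line.strip()]
    word_counts = Counter(word for sentence in sentences for word in sentence)
    return len(sentences), len(word_counts), sum(word_counts.values())


sample = ["upon this the captain started", "", "the captain smiled"]
print(corpus_stats(sample))  # (2, 6, 8)
```

On the real corpus this would be called as `corpus_stats(open("train_corpus.txt", encoding="utf-8"))`.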

Test Sentences

We test the model on two sentences::

  1. thus , because no man can follow another into these halls
  2. upon this the captain started , and eagerly desired to know more

These sentences are entered as a list in the main program.

Results

To test the model's performance on the two sentences above, the bigram counts, the bigram probabilities, and the probability of each test sentence under the trained model are printed to the text files results_no_smoothing (results without smoothing) and resutls_add_one_smoothing (results with add-one smoothing).
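The quantities written to those files can be sketched as follows. This is an illustrative reimplementation, not the code in ngrams.py: the function names are hypothetical, bigrams are counted only within a sentence with no start/end markers (an assumption), and the unigram count of the previous word is used as the denominator.

```python
from collections import Counter


def train_bigram(sentences):
    """Count unigrams and within-sentence bigrams in tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # consecutive word pairs
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, add_one=False):
    """P(word | prev), unsmoothed (MLE) or with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)  # V = number of unique training words
    if add_one:
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    if unigrams[prev] == 0:
        return 0.0  # unseen history: MLE is undefined, treated as zero here
    return bigrams[(prev, word)] / unigrams[prev]


def sentence_prob(tokens, unigrams, bigrams, add_one=False):
    """Multiply the bigram probabilities of all consecutive word pairs."""
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, word, unigrams, bigrams, add_one)
    return prob


corpus = [s.split() for s in ["the captain started", "the captain smiled", "the man started"]]
unigrams, bigrams = train_bigram(corpus)
test = "the captain started".split()
print(round(sentence_prob(test, unigrams, bigrams), 4))                # 0.3333
print(round(sentence_prob(test, unigrams, bigrams, add_one=True), 4))  # 0.1071
```

Without smoothing, any unseen bigram drives the whole sentence probability to zero; add-one smoothing avoids that at the cost of shifting probability mass to unseen pairs. For long sentences, summing log-probabilities is a common way to avoid floating-point underflow.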

How to run the ngrams.py file

Enter 0 for no smoothing and 1 for add-one smoothing.

Type the following command to read the training corpus and write the results to an output text file:

No smoothing::

python -u ngrams.py 0 train_corpus.txt > results_no_smoothing.txt

Add-one smoothing::

python -u ngrams.py 1 train_corpus.txt > resutls_add_one_smoothing.txt

The structure of the command is ::

python -u <python-file-name.py> <smoothing(0 or 1)> <input-txt-data.txt> > <output-txt-file.txt>
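A script following this command structure might read its two positional arguments from `sys.argv` as in the sketch below. The helper names `parse_args` and `main` are hypothetical and only mirror the documented usage; the `-u` flag makes Python's output unbuffered, and the shell's `>` redirects standard output into the results file.

```python
import sys


def parse_args(argv):
    """Parse `<smoothing(0 or 1)> <input-txt-data.txt>` from argv.

    Returns (add_one, corpus_path); exits with a usage message otherwise.
    """
    if len(argv) != 3 or argv[1] not in ("0", "1"):
        sys.exit("usage: python -u ngrams.py <smoothing(0 or 1)> <input-txt-data.txt>")
    return argv[1] == "1", argv[2]


def main(argv):
    add_one, corpus_path = parse_args(argv)
    with open(corpus_path, encoding="utf-8") as f:
        sentences = [line.split() for line in f if line.strip()]
    # Printed to stdout; `> output.txt` on the command line captures it.
    print(f"smoothing={'add-one' if add_one else 'none'} sentences={len(sentences)}")
```

In a script, this would be wired up with `if __name__ == "__main__": main(sys.argv)`.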

Note: There is also a bigram_model.ipynb file, which can be opened directly in Jupyter Notebook; make sure the training corpus is in the same folder. It also contains a detailed explanation of the program.


Contributors

prigarg
