Giter VIP home page Giter VIP logo

nlp-parsing-with-pcfgs-cky-algorithm-'s Introduction

Parsing with Probabilistic Context Free Grammars (CKY algorithm)

Reading the grammar: verify_grammar()

In grammar.py, read_rules takes each line in the grammar file and populates the instance variables rhs_to_rules, lhs_to_rules, and start symbol. Using parse_rule, it separates the lhs, rhs, and probability in each rule in the PCFG and then populates rhs_to_rules[lhs] and lhs_to_rules[rhs] which are both lists.

First, I wrote the method for verify_grammar in grammary.py to determine if any given grammar is in Chomsky Normal Form (CNF). That is, any nonterminal points to two nonterminals or a single terminal.

NT --> NT NT NT --> T

I used a smaller test grammar in sample.pcfg to ensure this method worked correctly.

Command line command to run: python3 grammar.py sample.pcfg

Membership Checking with CKY: is_in_language(self, tokens)

I used the CKY algorithm to determine if a given sentence is in the grammar. The CYK algorithm uses dynamic program to iterate over a table (a dictionary in this code). It first initialized the diagonal of the table with the nonterminals that point to the terminals in the grammar. Some terminals have multiple non-terminals associated with them. For this reason, each cell can have multiple non-terminals.

We then go through the rest of the table by looking at combinations of cells that could produce another non-terminal. If in the end, we have the start symbol in the upper right corner of the table (in position (0, len(sentence))), then the sentence is in the grammar.

Parsing with backpointers: parse_with_backpointers(self, tokens)

Here, I enhanced the previous CYK algorithm to keep track of backpointers and probabilities. This allows us to traceback our parsetree and find which once is the most probable. This is necessary because a given sentence in the grammar may have multiple parse trees.

Retrieving a parse tree: get_tree(chart, i,j, nt)

get_tree() formats the parse tree by recursively iterating through the probability table and backpointer table produced by parse_with_backpointers.

nlp-parsing-with-pcfgs-cky-algorithm-'s People

Contributors

ayshajamjam avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.