
spoddutur / syntaxnet


Syntaxnet Parsey McParseface wrapper for POS tagging and dependency parsing


syntaxnet's Introduction

SyntaxNet Parsey McParseface Python Wrapper for Dependency Parsing

Note: This syntaxnet build contains The Great Models Move change.

1. Introduction

When Google announced that SyntaxNet, billed as "The World's Most Accurate Parser", was going open source, it drew widespread attention from machine-learning developers and researchers interested in core NLU applications such as automatic information extraction and translation. The following GIF shows how SyntaxNet internally builds the dependency tree:

2. Troubles of the world's best parser SyntaxNet

Broadly, there are two common ways to use SyntaxNet:

  1. Run the demo.sh script that ships with SyntaxNet.
  2. Invoke the same script from Python as a subprocess, as shown below. This approach is inefficient, does not scale, and is overkill, because demo.sh itself calls other Python scripts.
import os
import subprocess

# demo.sh has to be run from the SyntaxNet source directory.
os.chdir(r"../models/syntaxnet")
# With shell=True the command is a single string, so the shell handles the pipe.
subprocess.call(
    "echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh",
    shell=True,
)

I wanted a proper, scalable Python application where one can do `import syntaxnet` and use it as shown below:
import syntaxnet
from syntaxnet import gen_parser_ops...

I managed to get this working and am sharing the project here. The sections below describe how.

2.1 The other pain point - SyntaxNet is a RESEARCH MODEL:

  • After The Great Models Move, TensorFlow categorized SyntaxNet as a RESEARCH MODEL.
  • As mentioned here, the TensorFlow team no longer provides guaranteed support for SyntaxNet and encourages individual researchers to support research models.

2.2 Salt on the wound:


On top of a difficult installation and a steep learning curve, the lack of official support and clear documentation has left forums full of SyntaxNet issues without proper solutions. Some of them are as basic as:

  • A lot of trouble understanding documentation around both syntaxnet and related tools
  • How to use Parsey McParseface model in python application
  • Confusing I/O handling in SyntaxNet because of the uncommon .conll file format it uses for input and output.
  • How to use/export the output (ascii tree or conll ) in a format that is easy to parse
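
For the last two points, a small sketch of reading CoNLL text in plain Python may help. It assumes the ten tab-separated columns of the CoNLL-X format (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) with a blank line between sentences. The `Token` tuple, the `parse_conll` function, and the sample rows are illustrative names and hand-written data, not part of SyntaxNet or of this wrapper.

```python
from collections import namedtuple

# The ten CoNLL-X columns, in file order.
Token = namedtuple(
    "Token",
    "id form lemma cpostag postag feats head deprel phead pdeprel",
)


def parse_conll(text):
    """Split CoNLL text into sentences, each a list of Token tuples."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError("expected 10 columns, got %d" % len(cols))
        # ID and HEAD are integers; HEAD 0 marks the root token.
        cols[0] = int(cols[0])
        cols[6] = int(cols[6])
        current.append(Token(*cols))
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample rows for illustration, not captured parser output.
SAMPLE = "\n".join([
    "1\tBob\t_\tNOUN\tNNP\t_\t2\tnsubj\t_\t_",
    "2\tbrought\t_\tVERB\tVBD\t_\t0\tROOT\t_\t_",
    "3\tthe\t_\tDET\tDT\t_\t4\tdet\t_\t_",
    "4\tpizza\t_\tNOUN\tNN\t_\t2\tdobj\t_\t_",
])

for token in parse_conll(SAMPLE)[0]:
    print("%d %s %s head=%d %s" % (
        token.id, token.form, token.postag, token.head, token.deprel))
```

Once the rows are tuples, exporting them to JSON, CSV, or a database table is an ordinary data-handling task.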

3. What does this project do?

This endeavour aims to make life easier for SyntaxNet enthusiasts. It primarily saves the hours needed to get Google's SyntaxNet Parsey McParseface up and running the way it should be. To that end, I am providing two things as part of this project:

  1. One line (~5mins) SyntaxNet 0.2 installation
  2. Syntaxnet Parsey McParseface wrapper for POS tagging and dependency parsing

3.1 One line (~5mins) SyntaxNet 0.2 installation

I am sharing the OS X SyntaxNet package distribution, i.e., the syntaxnet-0.2-cp27-cp27m-macosx_10_6_intel.whl file, in this git repo. I built it with the Bazel build tool, with all tests passing, after pulling the latest code from the SyntaxNet git repository. It sets up SyntaxNet 0.2 with a simple command in barely 5 minutes, as shown below:

git clone https://github.com/spoddutur/syntaxnet.git
cd <CLONED_SYNTAXNET_PROJ_DIR>
sudo pip install syntaxnet-0.2-cp27-cp27m-macosx_10_6_intel.whl
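
The wheel's filename already says where it can be installed: the `cp27`, `cp27m`, and `macosx_10_6_intel` parts are the Python, ABI, and platform tags of the standard wheel naming scheme. As a rough sketch, the tags can be read as below. The helper names `wheel_tags` and `fits_macos_python` are made up for illustration, and pip's own compatibility check is more thorough than this one.

```python
import sys


def wheel_tags(filename):
    """Split a wheel filename into its name components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    # name-version[-build]-python-abi-platform gives 5 or 6 components.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %s" % filename)
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def fits_macos_python(filename, version_info=None, platform=None):
    """Rough check that a macOS CPython wheel matches an interpreter."""
    version_info = version_info or sys.version_info
    platform = platform or sys.platform
    tags = wheel_tags(filename)
    wanted = "cp%d%d" % (version_info[0], version_info[1])
    on_mac = tags["platform"].startswith("macosx") and platform == "darwin"
    return tags["python"] == wanted and on_mac


WHEEL = "syntaxnet-0.2-cp27-cp27m-macosx_10_6_intel.whl"
print(wheel_tags(WHEEL))
print(fits_macos_python(WHEEL))
```

On anything other than CPython 2.7 on macOS the check returns False, which matches the fact that this particular wheel was built for that combination only.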
Tech Stack:

3.2 Syntaxnet Parsey McParseface wrapper for POS tagging and Dependency parsing

Here comes the most interesting (a.k.a. challenging) part: how to use SyntaxNet in a Python application. It should no longer be any trouble after this point :)

my_parser_eval.py contains the Python wrapper I implemented around SyntaxNet. The APIs it exposes are listed below:

1. API to initialise the parser: 
`tagger = my_parser_eval.SyntaxNetProcess("brain_tagger")`
("brain_tagger" initialises the POS tagger. Change it to "brain_parser" for dependency parsing.)

2. API to input data to the parser: 
`my_parser_eval._write_input("<YOUR_ENGLISH_SENTENCE_INPUT>")`

3. API to invoke the parser: 
`tagger.eval()`

4. API to read the parser's output in CoNLL format:
`my_parser_eval._read_output()`

5. API to pretty print the parser's output as a tree: 
`my_parser_eval.pretty_print()`
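
The calls listed above can be chained into one helper. This is a minimal sketch, not the code in main.py: the function name `run_sentence` is hypothetical, it assumes the calls are made in the order they are listed, and it assumes `_read_output()` returns the CoNLL output rather than printing it. The wrapper module is passed in as an argument so the sketch itself does not import SyntaxNet.

```python
def run_sentence(wrapper, sentence, task="brain_tagger"):
    """Run one sentence through the wrapper and return its output.

    `wrapper` is the imported my_parser_eval module. `task` is
    "brain_tagger" for POS tagging or "brain_parser" for dependency parsing.
    """
    process = wrapper.SyntaxNetProcess(task)  # 1. initialise the tagger or parser
    wrapper._write_input(sentence)            # 2. hand over the input sentence
    process.eval()                            # 3. invoke the parser
    # 4. Assumption: _read_output() returns the CoNLL output to the caller.
    return wrapper._read_output()


# Usage, with SyntaxNet installed and my_parser_eval.py on the path:
#   import my_parser_eval
#   print(run_sentence(my_parser_eval, "Bob brought the pizza to Alice."))
```

Keeping the four steps behind one function means callers never touch the underscore-prefixed helpers directly.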

4. Demo

  • I wrote main.py, a sample Python script, to demo this wrapper. It runs SyntaxNet's dependency parser.
  • Input to main.py: English sentence text
  • Output from main.py: Dependency graph tree
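
To show what a dependency tree output involves, here is a sketch that turns parsed tokens into an indented ASCII tree in plain Python. It is not the wrapper's `pretty_print()` implementation: the `render_tree` function, the `(id, form, tag, head, deprel)` tuple layout, and the hand-written sample tokens are all assumptions made for illustration.

```python
def render_tree(tokens):
    """Render (id, form, tag, head, deprel) tuples as an ASCII tree."""
    # Group tokens by the id of their head; head 0 means "root".
    children = {}
    for token in tokens:
        children.setdefault(token[3], []).append(token)

    lines = []

    def walk(token, prefix, is_root, is_last):
        label = "%s %s %s" % (token[1], token[2], token[4])
        if is_root:
            lines.append(label)
            child_prefix = ""
        else:
            lines.append(prefix + "+-- " + label)
            # Keep a vertical bar while later siblings still follow.
            child_prefix = prefix + ("    " if is_last else "|   ")
        kids = children.get(token[0], [])
        for index, kid in enumerate(kids):
            walk(kid, child_prefix, False, index == len(kids) - 1)

    for root in children.get(0, []):
        walk(root, "", True, True)
    return "\n".join(lines)


# Hand-written sample tokens for illustration, not captured parser output.
SAMPLE_TOKENS = [
    (1, "Bob", "NNP", 2, "nsubj"),
    (2, "brought", "VBD", 0, "ROOT"),
    (3, "the", "DT", 4, "det"),
    (4, "pizza", "NN", 2, "dobj"),
    (5, "to", "IN", 2, "prep"),
    (6, "Alice", "NNP", 5, "pobj"),
    (7, ".", ".", 2, "punct"),
]

print(render_tree(SAMPLE_TOKENS))
```

The recursion is fine for sentence-sized input; each token is visited once because every token has exactly one head.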

5. How to run the parser:

1. git clone https://github.com/spoddutur/syntaxnet.git
2. cd <syntaxnet-git-clone-directory>
3. python main.py 
4. That's it!! It prints the SyntaxNet dependency parser output for the given English input sentence.

5.1 Sample output for “Bob brought the pizza to Alice” input

6. Project Structure:

  • /models: Originally cloned from the SyntaxNet git repository https://github.com/tensorflow/models . This folder additionally contains the Bazel build "bazel-bin" folder with the needed runfiles.
  • custom_context.pbtxt: Custom context file used to set the context for the parser.
  • my_parser_eval.py: Python wrapper for the "brain-tagger" POS tagger and the "brain-parser" dependency parser. This file is heavily inspired by the original parser_eval.py that SyntaxNet provides, with quite a few modifications and enhancements.
  • main.py: Demo sample usage.
  • /data: Folder where the parser's intermediate inputs and outputs are dumped.
  • .whl: OS X package distribution of the final successful SyntaxNet build, with which you can set up SyntaxNet 0.2 in barely 5 minutes.



syntaxnet's Issues

Not working on Windows 10

When I first ran main.py with python 2.7, I got an error stating that asciitree was missing. I installed it with pip.

Then I got an error that tensorflow is missing. According to their website, it only works with python 3 on Windows, yet this project claims to use python 2.7?

When I tried python 3.7, I got errors for the print statements in the wrapper not using parentheses... after fixing that, some new error emerged.

Needless to say, it's been far more than 5 minutes, and I have not gotten this working.

Which tensorflow/models and tensorflow/tensorflow commits did you use to create the .whl

I managed to get your wrapper to work on my macbook and it does exactly what I need, but I used the .whl file you provide to setup my syntaxnet/tensorflow installation. Now I'm trying to get this to work in a debian docker container and would like to know exactly which versions/commits you used to create .whl so that I can replicate them manually in my Dockerfile. Thanks!

print all tree not just final tree

I want to print all the trees, not just the final tree. Which file do I need to edit? SyntaxNet creates lots of trees and elements one by one, and I want all of those trees.

database setting

I would like to know what the column names (or structure) would be if I set up a database for the syntax parser to read and write data.
