donaurelio / text-analyzer

This project is dedicated to a university assignment about Natural Language Processing with FreeLing and Python.

License: MIT License


TextAnalyzer

This project is dedicated to a university assignment related to Natural Language Processing. The application was built with Python 2.7 and Django 1.9 and is composed of:

  • A Tokenization and Morphological Analysis module (called morfo) using FreeLing and Python 2.7. This app takes raw text and performs the corresponding morphological analysis.
  • A second module (textparser) that covers Syntactic Analysis. Given raw text, it generates syntactic trees using probabilistic models (Stanford and Bikel).
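
To make the idea of a syntactic tree concrete, here is a minimal sketch that reads a Penn Treebank style bracketed string into nested tuples and recovers the words. It is not code from this project and does not call the Stanford or Bikel parsers; the function names `parse_bracketed` and `leaves` are hypothetical, and the sample sentence is made up for illustration.

```python
import re


def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    stack = [[]]  # the bottom list collects finished top-level trees
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            if not stack or not node:
                raise ValueError("unbalanced or empty brackets")
            if isinstance(node[0], str):
                label, children = node[0], node[1:]
            else:
                # Wrapper brackets without a label, as in "( (S ...))".
                label, children = "", node
            stack[-1].append((label, children))
        else:
            stack[-1].append(tok)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one tree")
    return stack[0][0]


def leaves(tree):
    """Return the words of a parsed tree in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


tree = parse_bracketed("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")
words = leaves(tree)
```

In the project itself, nltk handles tree objects; this sketch only shows the shape of the data that a probabilistic parser is expected to produce.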

Running this project

To get this project working, we need to set up the morfo and textparser modules. The configuration assumes the following directory layout:

TextAnalyser
│   README.md
│   requirements.txt    
│
└───tkmorfo
        applications
        |
        └───morfo
        |           
        └───textparser
                tools
                |
                └───helpers
                        00-raw
                        00
                        dbparser
                        parseval
                        stanford-parserfull-2015-12-09
                        stanford-postagger-2015-12-09
                        utils.py
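
Before starting the application, it can help to confirm that the helper resources are in place. The sketch below is an assumption about how such a check might look, not part of the project: `missing_entries` is a hypothetical helper, and the caller supplies the path to the helpers directory, since the exact location depends on where the repository is checked out.

```python
import os

# Entries listed under "helpers" in the tree above.
EXPECTED_HELPERS = [
    "00-raw",
    "00",
    "dbparser",
    "parseval",
    "stanford-parserfull-2015-12-09",
    "stanford-postagger-2015-12-09",
    "utils.py",
]


def missing_entries(helpers_dir, expected=EXPECTED_HELPERS):
    """Return the expected names that do not exist inside helpers_dir."""
    return [
        name
        for name in expected
        if not os.path.exists(os.path.join(helpers_dir, name))
    ]
```

An empty list means every expected file and directory was found; anything else names what still needs to be downloaded or unpacked.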

Setting Up the Docker Container

This project was designed to run in a container. The first module, Tokenization and Morphological Analysis, depends on FreeLing and Python 2.7. Both packages come installed on this Docker image.

The second module, Syntactic Analysis, depends on the following libraries:

  • Dan Bikel’s Parsing Engine: dbparser.tar.gz

  • Penn Treebank based training set: wsj-02-21.mrg.tar.gz

  • Evaluation of the model's accuracy: parseval.tar.gz

  • Test set: 00-raw.tar.gz
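
Once downloaded, the four archives need to be unpacked. A minimal sketch using only the standard library follows; `extract_archives` and its `download_dir` and `target_dir` parameters are hypothetical names, and the sketch assumes the archives come from a trusted source, because `extractall` writes whatever paths the archive contains.

```python
import os
import tarfile

# Archive names taken from the list above.
ARCHIVES = [
    "dbparser.tar.gz",
    "wsj-02-21.mrg.tar.gz",
    "parseval.tar.gz",
    "00-raw.tar.gz",
]


def extract_archives(download_dir, target_dir, names=ARCHIVES):
    """Unpack each archive found in download_dir into target_dir.

    Returns the list of archive names that were not found.
    """
    missing = []
    for name in names:
        path = os.path.join(download_dir, name)
        if not os.path.isfile(path):
            missing.append(name)
            continue
        with tarfile.open(path, "r:gz") as archive:
            archive.extractall(target_dir)
    return missing
```

Returning the missing names instead of raising lets the caller decide whether a partial setup is acceptable.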

The archives listed above can be found at this link. Other needed files are:

Running Graphical Applications in a Container

To run the Syntactic Analysis module, the container needs to be able to "show" or "create" graphical UIs. This allows the app to create the parse tree images generated with nltk.

apt-get update
apt-get install python-tk
apt-get install xvfb
apt-get install imagemagick

Then run the following commands every time the container starts.

Xvfb :1 -screen 0 1024x768x16 &> xvfb.log  &
DISPLAY=:1.0
export DISPLAY
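
The same startup step can also be driven from Python, for example from a launcher script that runs before the Django app. This is a sketch under assumptions, not the project's own code: `xvfb_command` and `start_xvfb` are hypothetical helpers, and `Xvfb` is assumed to be on the PATH after the installation step above.

```python
import os
import subprocess


def xvfb_command(display=":1", geometry="1024x768x16"):
    """Build the Xvfb command line used in the shell snippet above."""
    return ["Xvfb", display, "-screen", "0", geometry]


def start_xvfb(display=":1", geometry="1024x768x16", log_path="xvfb.log"):
    """Start Xvfb in the background and point this process at it."""
    with open(log_path, "ab") as log:
        # The child keeps its own copy of the log handle after we close ours.
        process = subprocess.Popen(
            xvfb_command(display, geometry),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    # Equivalent to DISPLAY=:1.0 followed by export DISPLAY.
    os.environ["DISPLAY"] = display + ".0"
    return process
```

Setting `DISPLAY` through `os.environ` affects the current process and any children it starts afterwards, which is what code drawing through Tk needs.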

Installing Java for the nltk Stanford POS Tagger and Parser in the Container

echo deb http://http.debian.net/debian jessie-backports main >> /etc/apt/sources.list
apt-get update && apt-get install openjdk-8-jdk
update-alternatives --config java
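
After choosing the Java alternative, a quick check can confirm which major version is active. The sketch below is illustrative only: `parse_java_major` and `installed_java_major` are hypothetical helpers, and the parsing assumes the usual `version "..."` line that `java -version` prints.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        # Legacy numbering: "1.8.0_292" means Java 8.
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` and return the major version as an int."""
    # `java -version` writes to stderr, so merge it into stdout.
    output = subprocess.check_output(
        ["java", "-version"], stderr=subprocess.STDOUT
    )
    return parse_java_major(output.decode("utf-8", "replace"))
```

With the packages installed above, `installed_java_major()` would be expected to report Java 8; a different number suggests that `update-alternatives` still points at another JDK.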


Contributors

bryantabarez, donaurelio
