

JInsect

The JInsect toolkit is a Java-based toolkit and library that supports and demonstrates the use of n-gram graphs in Natural Language Processing applications, ranging from summarization and summary evaluation to text classification and indexing.
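To make the idea concrete, here is a minimal sketch of a character n-gram graph in Java. It is not JInsect's API: the class `CharNGramGraph`, its `build` method and the `n`/`window` parameters are hypothetical. It assumes one common formulation, in which each vertex is a character n-gram, each n-gram is linked to the n-grams starting up to `window` positions before it, and the edge weight counts how often that pair co-occurs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal character n-gram graph; NOT the JInsect API.
class CharNGramGraph {
    // Edge key "source->target" mapped to its co-occurrence count (edge weight).
    // Because all n-grams have the same length n, the key is unambiguous.
    final Map<String, Double> edges = new HashMap<>();

    // Builds a graph from text using n-grams of length n and a co-occurrence window.
    static CharNGramGraph build(String text, int n, int window) {
        CharNGramGraph g = new CharNGramGraph();
        int count = text.length() - n + 1; // number of n-grams (<= 0 if text is too short)
        for (int i = 0; i < count; i++) {
            String current = text.substring(i, i + n);
            // Link each n-gram starting up to `window` positions earlier to the current one.
            for (int j = Math.max(0, i - window); j < i; j++) {
                String previous = text.substring(j, j + n);
                g.edges.merge(previous + "->" + current, 1.0, Double::sum);
            }
        }
        return g;
    }
}
```

For example, `CharNGramGraph.build("aaaa", 1, 1)` yields a single edge `a->a` with weight 3, since the n-gram `a` follows itself three times.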

Main concepts

Versions

V1.0

The first version of the JInsect library was born through a strenuous PhD effort, which is why many small side projects became attached to the code. As a result, version 1.0 includes:

  • The n-gram graph (NGG) representations. See Chapter 3 of my thesis for more info.
  • The NGG operators update/merge, intersect, allNotIn, etc. See Chapter 4 of my thesis for more info.
  • The AutoSummENG summary evaluation family of methods.
  • The INSECTDB storage abstraction for object serialization.
  • A very rich (and useful!) utils class, which one must consult before trying to work with the graphs.
  • Tools for estimating the optimal parameters of n-gram graphs.
  • Support for the DOT language representation of NGGs.
  • ...and many, many hidden side projects, including a chunker based on something similar to a language model, a semantic index that builds on string subsumption to determine meaning, and many others. Sadly, most of these are not documented or published.
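The NGG operators and DOT support listed above can be sketched over a plain edge-weight map. This is a hedged illustration, not JInsect's code: the class `GraphOps`, the `Edge` record and all method signatures are hypothetical, and the weight rules are assumptions based on the n-gram graph literature (update moves each weight toward the other graph's weight by a learning factor `l`; intersect averages shared weights; allNotIn keeps the edges of the first graph that are missing from the second). `valueSimilarity` implements value similarity, a graph similarity measure associated, as we understand it, with AutoSummENG, and `toDot` emits a Graphviz DOT digraph. The record requires Java 16 or later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of NGG operators over an edge-weight map; NOT JInsect's API.
class GraphOps {
    // A directed edge between two n-grams (records provide equals/hashCode).
    record Edge(String from, String to) {}

    // update/merge: move each weight toward g2's weight by learning factor l.
    // Assumption: edges absent from g1 start at weight 0.
    static Map<Edge, Double> update(Map<Edge, Double> g1, Map<Edge, Double> g2, double l) {
        Map<Edge, Double> result = new HashMap<>(g1);
        for (Map.Entry<Edge, Double> e : g2.entrySet()) {
            double old = result.getOrDefault(e.getKey(), 0.0);
            result.put(e.getKey(), old + (e.getValue() - old) * l);
        }
        return result;
    }

    // intersect: keep only shared edges; assumption: the new weight is the average.
    static Map<Edge, Double> intersect(Map<Edge, Double> g1, Map<Edge, Double> g2) {
        Map<Edge, Double> result = new HashMap<>();
        for (Map.Entry<Edge, Double> e : g1.entrySet()) {
            Double other = g2.get(e.getKey());
            if (other != null) {
                result.put(e.getKey(), (e.getValue() + other) / 2.0);
            }
        }
        return result;
    }

    // allNotIn: the edges of g1 that do not appear in g2, with g1's weights.
    static Map<Edge, Double> allNotIn(Map<Edge, Double> g1, Map<Edge, Double> g2) {
        Map<Edge, Double> result = new HashMap<>(g1);
        result.keySet().removeAll(g2.keySet());
        return result;
    }

    // Value similarity: sum of min/max weight ratios over shared edges,
    // divided by the edge count of the larger graph.
    static double valueSimilarity(Map<Edge, Double> g1, Map<Edge, Double> g2) {
        int denom = Math.max(g1.size(), g2.size());
        if (denom == 0) return 0.0;
        double sum = 0.0;
        for (Map.Entry<Edge, Double> e : g1.entrySet()) {
            Double other = g2.get(e.getKey());
            if (other != null) {
                double max = Math.max(e.getValue(), other);
                if (max > 0) sum += Math.min(e.getValue(), other) / max;
            }
        }
        return sum / denom;
    }

    // DOT export: one directed edge per line, labelled with its weight.
    static String toDot(Map<Edge, Double> g) {
        StringBuilder sb = new StringBuilder("digraph NGG {\n");
        for (Map.Entry<Edge, Double> e : g.entrySet()) {
            sb.append("  \"").append(escape(e.getKey().from())).append("\" -> \"")
              .append(escape(e.getKey().to())).append("\" [label=\"")
              .append(e.getValue()).append("\"];\n");
        }
        return sb.append("}\n").toString();
    }

    // Escape backslashes and double quotes so n-grams are valid DOT string IDs.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

With a toy graph `a` holding edges `ab->bc` (weight 2) and `bc->cd` (weight 1) and a graph `b` holding only `ab->bc` (weight 4), `update(a, b, 0.5)` gives `ab->bc` weight 3, `intersect(a, b)` keeps one edge with weight 3, `allNotIn(a, b)` keeps only `bc->cd`, and `valueSimilarity(a, b)` is (2/4)/2 = 0.25.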

I should stress that V1.0:

  • supports efficient multi-threaded execution
  • contains example applications for classification
  • contains example applications for clustering
  • contains a command-line application for language-neutral summarization
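The README only notes that classification examples are included; it does not describe them. One common way to use n-gram graphs for classification, shown here as a hedged sketch and not as JInsect's bundled example, is to merge each class's training documents into a class graph and assign a new document to the class whose graph is most similar to the document's graph. The class `NggClassifier`, its methods, the running-update merge with learning factor 1/k and the fixed `N` and `WINDOW` values are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical nearest-class-graph classifier; NOT JInsect's bundled example.
class NggClassifier {
    private static final int N = 3;      // n-gram length (assumed)
    private static final int WINDOW = 3; // co-occurrence window (assumed)

    private final Map<String, Map<String, Double>> classGraphs = new HashMap<>();
    private final Map<String, Integer> classCounts = new HashMap<>();

    // Character n-gram graph; the edge key is source + target, which is
    // unambiguous because every n-gram has exactly N characters.
    static Map<String, Double> graph(String text) {
        Map<String, Double> edges = new HashMap<>();
        for (int i = 0; i + N <= text.length(); i++) {
            String cur = text.substring(i, i + N);
            for (int j = Math.max(0, i - WINDOW); j < i; j++) {
                edges.merge(text.substring(j, j + N) + cur, 1.0, Double::sum);
            }
        }
        return edges;
    }

    // Merge a training document into its class graph with learning factor 1/k,
    // where k is the number of documents seen so far for that class.
    void train(String label, String text) {
        int k = classCounts.merge(label, 1, Integer::sum);
        double l = 1.0 / k;
        Map<String, Double> classGraph = classGraphs.computeIfAbsent(label, x -> new HashMap<>());
        for (Map.Entry<String, Double> e : graph(text).entrySet()) {
            double old = classGraph.getOrDefault(e.getKey(), 0.0);
            classGraph.put(e.getKey(), old + (e.getValue() - old) * l);
        }
    }

    // Value similarity: min/max weight ratios over shared edges, over the larger graph size.
    static double valueSimilarity(Map<String, Double> g1, Map<String, Double> g2) {
        int denom = Math.max(g1.size(), g2.size());
        if (denom == 0) return 0.0;
        double sum = 0.0;
        for (Map.Entry<String, Double> e : g1.entrySet()) {
            Double other = g2.get(e.getKey());
            if (other != null) {
                double max = Math.max(e.getValue(), other);
                if (max > 0) sum += Math.min(e.getValue(), other) / max;
            }
        }
        return sum / denom;
    }

    // Return the label whose class graph is most similar to the document graph
    // (null if no class has been trained yet).
    String classify(String text) {
        Map<String, Double> doc = graph(text);
        String best = null;
        double bestScore = -1.0;
        for (Map.Entry<String, Map<String, Double>> c : classGraphs.entrySet()) {
            double score = valueSimilarity(doc, c.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = c.getKey();
            }
        }
        return best;
    }
}
```

Training on a few sentences per label with `train(label, text)` and then calling `classify(text)` returns the label of the most similar class graph. The fixed `N = 3` and `WINDOW = 3` values are placeholders; the README mentions tools for estimating optimal n-gram graph parameters, which this sketch does not reproduce.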

TODO for V1.0:

  • Clean up problematic classes that depend on Web services.

V2.0

Work on a second version of the n-gram graph library is planned. The aim is to remove the problematic dependencies introduced by subprojects and keep the clean, core part of the project. I also aim to convert it into a Maven project to ease integration with current solutions.

License

JInsect is released under the LGPL license.
