Sherlok

Distributed RESTful text mining.

Join the chat at https://gitter.im/sherlok/sherlok

Sherlok is a flexible, open-source, distributed, real-time text-mining engine. It works as a RESTful annotation server based on Apache UIMA. For example, Sherlok can:

  • highlight persons and locations in text (using DKPro OpenNLP),
  • identify proteins and brain regions in biomedical texts (using Bluima),
  • perform deep-learning sentiment analysis (using Stanford Sentiment),
  • analyze the syntax of tweets (using TweetNLP),
  • analyze clinical text and perform knowledge extraction (using Apache cTAKES).

Getting Started

  • Download and unzip the latest Sherlok release
  • Install a Java runtime
  • Run bin/sherlok (Unix), or bin/sherlok.bat (Windows)

Annotate neuron mentions from Python:

pip install --upgrade sherlok

>>> from sherlok import Sherlok
>>> print(list(Sherlok().annotate('neuroner', 'layer 4 neuron')))

[(0, 14, 'layer 4 neuron', u'Neuron', {}),
 (8, 14, 'neuron',  u'Neuron', {}),
 (8, 14, 'neuron',  u'NeuronTrigger', {}),
 (0, 7,  'layer 4', u'Layer', {u'ontologyId': u'HBP_LAYER:0000004'})]
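Each tuple in the output above appears to hold a begin offset, an end offset, the covered text, the annotation type, and a dictionary of attributes. Under that reading, a minimal sketch for grouping results by type could look like the following. The helper name `group_by_type` is hypothetical and not part of the sherlok package, and the annotations are copied by hand from the output above so the snippet runs without a Sherlok server.

```python
from collections import defaultdict

# Annotation tuples copied from the example output above.
# Assumed layout: (begin, end, covered_text, type, attributes)
annotations = [
    (0, 14, 'layer 4 neuron', 'Neuron', {}),
    (8, 14, 'neuron', 'Neuron', {}),
    (8, 14, 'neuron', 'NeuronTrigger', {}),
    (0, 7, 'layer 4', 'Layer', {'ontologyId': 'HBP_LAYER:0000004'}),
]


def group_by_type(annotations):
    """Hypothetical helper: map each annotation type to its covered texts."""
    grouped = defaultdict(list)
    for begin, end, covered, ann_type, attributes in annotations:
        grouped[ann_type].append(covered)
    return dict(grouped)


grouped = group_by_type(annotations)
print(grouped)
```

Grouping by type keeps overlapping annotations (here, the two `Neuron` spans) side by side rather than discarding either one.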

Tag persons and locations with JavaScript:

// Assumes the 'sherlok' module exports the client object used below.
var sherlok = require('sherlok');
var text = 'Jack Burton (born April 29, 1954 in El Paso), also known as Jake Burton, is an American snowboarder and founder of Burton Snowboards.';
sherlok.annotate('opennlp.ners.en', text, function(annotation){
      console.log(annotation);
});

{ begin=0, end=11,  value="person"}
{ begin=36, end=43, value="location"}
{ begin=60, end=71, value="person"}
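The `begin` and `end` values in the output above look like character offsets into the input text, with `end` exclusive. Assuming that reading, the short Python sketch below recovers the text each annotation covers. The annotations are transcribed by hand into Python dictionaries, and `covered_spans` is a hypothetical helper, not part of any Sherlok client.

```python
# Input text from the example above.
text = ('Jack Burton (born April 29, 1954 in El Paso), also known as '
        'Jake Burton, is an American snowboarder and founder of '
        'Burton Snowboards.')

# Annotations transcribed from the example output (assumed: end is exclusive).
annotations = [
    {'begin': 0, 'end': 11, 'value': 'person'},
    {'begin': 36, 'end': 43, 'value': 'location'},
    {'begin': 60, 'end': 71, 'value': 'person'},
]


def covered_spans(text, annotations):
    """Hypothetical helper: pair each annotation value with the text it covers."""
    return [(a['value'], text[a['begin']:a['end']]) for a in annotations]


spans = covered_spans(text, annotations)
for value, covered in spans:
    print(value, '->', covered)
```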

More Built-in Text mining pipelines

Further Documentation

sherlok's Projects

sherlastic

Semantic enrichment for Elasticsearch using Sherlok
