NER

Build

cd src
make

Run

./preprocess pages-articles.xml names links # output to names and links
./relatedness links rel # output to rel
./ner names links rel 4242 # listen on port 4242 after printing "ready"
curl localhost:4242 -d ""

How it works

preprocess parses pages-articles.xml to extract links between pages and a mapping from text to entities. The confidence in the mapping $a\mapsto b$ is $$conf(a\mapsto b)=\frac{count(a\mapsto b) - 1}{\sum_{c\in Pages}{count(a\mapsto c)}}$$
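As a rough illustration of the confidence formula, here is a minimal Python sketch. It is not the project's implementation: the input layout (a dictionary from anchor text to per-page link counts), the function name `mapping_confidence`, and the sample counts are all assumptions made up for this example.

```python
from collections import Counter

def mapping_confidence(anchor_counts):
    """Compute conf(a -> b) = (count(a -> b) - 1) / sum_c count(a -> c).

    anchor_counts is a hypothetical structure: text -> Counter(page -> count).
    """
    conf = {}
    for text, targets in anchor_counts.items():
        total = sum(targets.values())  # denominator: all links using this text
        for page, count in targets.items():
            conf[(text, page)] = (count - 1) / total
    return conf

# Made-up counts: "paris" links to Paris 8 times and to Paris_Hilton twice.
counts = {"paris": Counter({"Paris": 8, "Paris_Hilton": 2})}
conf = mapping_confidence(counts)
print(conf[("paris", "Paris")])         # (8 - 1) / 10
print(conf[("paris", "Paris_Hilton")])  # (2 - 1) / 10
```

One consequence of the $-1$ in the numerator is that a mapping observed only once gets confidence 0.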

relatedness computes for each page $p$ the number $paths2(p, p)$ of paths of length 2 from $p$ to $p$ and $\sum_{q\in Pages}{rel(p,q)}$ with $$rel(p,q)=\frac{paths2(p, q) + paths2(q, p)}{paths2(p, p) + paths2(q, q)}$$
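The relatedness formula can be sketched in Python on a toy link graph. This is only an illustration under stated assumptions: the graph is a dictionary from a page to the set of pages it links to (so repeated links count once), the names `paths2` and `rel` simply mirror the formula, and returning 0 when the denominator is 0 is a choice made here, not something the source specifies.

```python
def paths2(links, p, q):
    """Number of paths of length 2 from p to q: pages r with p -> r and r -> q."""
    return sum(1 for r in links.get(p, ()) if q in links.get(r, ()))

def rel(links, p, q):
    """rel(p, q) = (paths2(p, q) + paths2(q, p)) / (paths2(p, p) + paths2(q, q))."""
    denom = paths2(links, p, p) + paths2(links, q, q)
    if denom == 0:
        return 0.0  # assumption: treat the undefined case as unrelated
    return (paths2(links, p, q) + paths2(links, q, p)) / denom

# Hypothetical three-page graph.
links = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A"}}
print(rel(links, "A", "B"))  # (0 + 1) / (2 + 1)
print(rel(links, "A", "A"))  # a page with a 2-cycle is fully related to itself
```

Note that the formula is symmetric in $p$ and $q$, and that $rel(p,p)=1$ whenever $paths2(p,p)>0$.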

ner extracts entities from text. All candidate entities are extracted (longest match of text) and sorted by confidence. Entities are selected greedily, starting with the worst candidate: if the worst candidate is the only remaining candidate for a piece of text, it is selected; otherwise it is pruned and the confidence of all other candidates is updated.

The confidence that text $t_i$ refers to candidate $c_i$ is $$conf(c_i, t_i)=conf(t_i\mapsto c_i)\sum_{t_j\cap t_i = \emptyset}{rel(c_i,c_j)}$$
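The greedy pruning loop described above could look roughly like the following Python sketch. It is one reading of the description, not the project's code, and it rests on several assumptions: text spans are keyed by their string and never overlap (so $t_j\cap t_i=\emptyset$ reduces to "a different span"), already selected candidates keep contributing to the sum, every score is recomputed on each pass instead of being updated incrementally, and the function name `disambiguate`, the prior values, and the relatedness table are invented for the example.

```python
def disambiguate(candidates, prior, rel):
    """Greedy pruning: repeatedly take the lowest-confidence candidate.

    candidates: span -> list of candidate entities (hypothetical layout)
    prior:      (span, entity) -> conf(t -> c)
    rel:        function (entity, entity) -> relatedness
    """
    # Spans without any candidate are ignored.
    remaining = {s: list(ents) for s, ents in candidates.items() if ents}
    selected = {}

    def score(span, ent):
        # conf(c_i, t_i) = conf(t_i -> c_i) * sum of rel(c_i, c_j) over other spans
        support = sum(rel(ent, other)
                      for s, ents in remaining.items() if s != span
                      for other in ents)
        return prior[(span, ent)] * support

    while len(selected) < len(remaining):
        # Worst candidate among spans that are not resolved yet.
        span, ent = min(((s, e) for s, ents in remaining.items()
                         if s not in selected for e in ents),
                        key=lambda se: score(*se))
        if len(remaining[span]) == 1:
            selected[span] = ent          # last candidate standing: keep it
        else:
            remaining[span].remove(ent)   # prune; scores change on the next pass
    return selected

# Made-up priors and relatedness values.
table = {frozenset(("Paris", "France")): 0.5,
         frozenset(("Paris_Hilton", "France")): 0.01}

def toy_rel(a, b):
    return table.get(frozenset((a, b)), 0.0)

candidates = {"paris": ["Paris", "Paris_Hilton"], "france": ["France"]}
prior = {("paris", "Paris"): 0.7, ("paris", "Paris_Hilton"): 0.1,
         ("france", "France"): 0.9}
result = disambiguate(candidates, prior, toy_rel)
print(result)
```

In this toy input, Paris_Hilton has the lowest score and shares its span with another candidate, so it is pruned first; the two surviving candidates are then each the only option for their span and are selected.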