night_ridir

Containerized tools for event data dictionary development, including:

  • noun and verb extraction, using the /get_phrases endpoint
  • automated word2vec "synonym" discovery with the /get_synonyms endpoint
  • checking OEDA actor dictionaries for near-matches

Docker Setup

From inside this directory, you can build the image by running

sudo docker build -t nr_tools .

(Note the trailing period.)

Then start the container in the background, published on port 5000 (add sudo if you needed it for the build), with

docker run -d -p 5000:5000 nr_tools

Endpoints

A GET request to each endpoint returns usage information.

  • /get_phrases : Given text and CoreNLP parses, return noun and verb phrases that are candidate dictionary entries
  • /get_synonyms : Given an English word or words, find possible synonyms using word2vec
  • /get_synonyms_ar : Same as above, but using an Arabic word2vec model trained on Wikipedia
  • /lookup_dictionary : Given a list of synonyms, check if any match existing entries in the actor dictionaries.
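
The curl calls in the Examples section can also be made from a script. Below is a minimal Python sketch using only the standard library, assuming the container is reachable at http://localhost:5000 as in the Docker setup. The names BASE_URL, build_request, and call_endpoint are illustrative and not part of night_ridir.

```python
import json
import urllib.request

# Assumption: the container was started with "-p 5000:5000" on this machine.
BASE_URL = "http://localhost:5000"


def build_request(endpoint, payload, base_url=BASE_URL):
    """Build a JSON POST request for one endpoint. Nothing is sent yet."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{endpoint.lstrip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_endpoint(endpoint, payload, base_url=BASE_URL, timeout=30):
    """Send the request and decode the JSON reply. Needs a running container."""
    request = build_request(endpoint, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, call_endpoint("get_synonyms", {"text": ["artillery"]}) would send the same request as the curl synonym example.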

Examples

Phrase extraction

Pass a sentence and its CoreNLP parse to get back phrases and codings (if applicable):

curl -XPOST -H "Content-Type: application/json" --data '{"text" : "Hundreds of Egyptian journalists rallied in Cairo on Wednesday in an escalating standoff with police, threatening a possible strike by media workers if demands including the dismissal of the interior minister are not met.", "parse" : "(ROOT (S (NP (NP (NNS Hundreds)) (PP (IN of) (NP (JJ Egyptian) (NNS journalists)))) (VP (VBD rallied) (PP (IN in) (NP (NNP Cairo))) (PP (IN on) (NP (NNP Wednesday))) (PP (IN in) (NP (NP (DT an) (VBG escalating) (NN standoff)) (PP (IN with) (NP (NN police))))) (, ,) (S (VP (VBG threatening) (NP (DT a) (JJ possible) (NN strike)) (PP (IN by) (NP (NNS media) (NNS workers))) (SBAR (IN if) (S (NP (NP (NNS demands)) (PP (VBG including) (NP (NP (DT the) (NN dismissal)) (PP (IN of) (NP (DT the) (JJ interior) (NN minister)))))) (VP (VBP are) (RB not) (VP (VBN met)))))))) (. .)))"}' 'http://localhost:5000/get_phrases'

should return

{"verbs": "rallied", 
"noun_coding": [["EGYMED"], ["EGY"], ["~COP"], ["~COP"], ["~MEDLAB"], ["~GOV"], ["~GOV"], ["~GOV"]], 
"nouns": [[" EGYPTIAN", " JOURNALISTS"], [" CAIRO"], [" POLICE"], [" POLICE"], [" MEDIA", " WORKERS"], [" INTERIOR MINISTER"], [" INTERIOR MINISTER"], [" INTERIOR MINISTER"]], 
"verb_coding": "150"}

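In the response above, "nouns" and "noun_coding" have the same length, and repeated phrases appear more than once. The sketch below pairs each noun phrase with its coding and drops the repeats. It assumes the two lists are parallel, which is inferred from this sample rather than documented, and the helper name pair_nouns_with_codes is hypothetical.

```python
def pair_nouns_with_codes(response):
    """Return unique (phrase, codes) pairs from a /get_phrases response.

    Assumption: response["nouns"][i] is coded by response["noun_coding"][i].
    """
    pairs = []
    seen = set()
    for words, codes in zip(response["nouns"], response["noun_coding"]):
        # Tokens in the sample carry a leading space, so strip before joining.
        phrase = " ".join(word.strip() for word in words)
        key = (phrase, tuple(codes))
        if key not in seen:
            seen.add(key)
            pairs.append((phrase, list(codes)))
    return pairs


# The sample response from the example above.
sample_response = {
    "verbs": "rallied",
    "noun_coding": [["EGYMED"], ["EGY"], ["~COP"], ["~COP"], ["~MEDLAB"],
                    ["~GOV"], ["~GOV"], ["~GOV"]],
    "nouns": [[" EGYPTIAN", " JOURNALISTS"], [" CAIRO"], [" POLICE"],
              [" POLICE"], [" MEDIA", " WORKERS"], [" INTERIOR MINISTER"],
              [" INTERIOR MINISTER"], [" INTERIOR MINISTER"]],
    "verb_coding": "150",
}

for phrase, codes in pair_nouns_with_codes(sample_response):
    print(phrase, codes)
```
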
Synonyms

Pass a list of word(s) to get their word2vec synonyms:

curl -XPOST -H "Content-Type: application/json" --data '{"text" : ["artillery"]}' 'http://localhost:5000/get_synonyms'

Expect a list of "synonyms" back:

["BRITISH_TROOPS", "FORCES_IN", "WAR_WITH", "ANTIAIRCRAFT",
"ARTILLERY_BOMBARDMENT", "MACHINEGUN_FIRE", "MORTAR_PLATOON",
"CARPATHIAN_FOREST", "TARGETED_BY", "AIR_STRIKES", "WARPLANES", "ANTITANK",
"DRAGOON_GUARDS", "ARTILLERY_BARRAGES", "LEGION_OF", "ORDNANCE", "IN_KASHMIR",
"SPECIAL_OPERATIONS", "MM_HOWITZER", "SHELLFIRE", "ARTILLERY",
"ARTILLERY_SHELLS", "CANNONEERS", "BLAST_KILLS", "ARMY_AND", "HOWITZERS",
"##MM_CANNON", "ARTILLERY_REGIMENT", "FIELD_ARTILLERY", "CAVALRY",
"SWEDISH_METALLERS", "INFANTRY", "ARTILLERY_MORTARS", "ARTILLERIES",
"FRONTMAN", "REGIMENT", "ARTILLERY_BARRAGE", "PARACHUTE_REGIMENT",
"ARMOURED_REGIMENT", "RIFLES", "MORTARS", "ARMY_ORDNANCE",
"ARTILLERY_SHELLING", "ARTILLERY_BRIGADE", "MISSILES", "###MM_HOWITZERS",
"BC_AS_GEN", "MORTAR", "TANKS_ARTILLERY", "ARTILLERY_BATTALION", "KHYBER",
"CAR_BOMB", "DRONE", "BOMBARDMENT"]

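The list above is noisy: it includes the query term itself ("ARTILLERY") and tokens such as "##MM_CANNON". A coder may want to trim it before reviewing. The sketch below is one possible filter, not something night_ridir provides. It assumes that "#" marks masked digits and that returned tokens are uppercase with underscores, as in the sample. The name clean_synonyms is hypothetical.

```python
def clean_synonyms(synonyms, query_terms=()):
    """Drop the query terms, '#' tokens, and repeats, keeping the original order."""
    # Assumption: responses are uppercase with "_" between words, as in the sample.
    queries = {term.upper().replace(" ", "_") for term in query_terms}
    cleaned = []
    seen = set()
    for token in synonyms:
        if "#" in token:  # assumption: '#' stands in for a digit
            continue
        if token in queries or token in seen:
            continue
        seen.add(token)
        cleaned.append(token)
    return cleaned


candidates = clean_synonyms(
    ["ARTILLERY", "##MM_CANNON", "HOWITZERS", "HOWITZERS", "MORTARS"],
    query_terms=["artillery"],
)
print(candidates)
```

The filtered list can then be sent as the "syns" value in a /lookup_dictionary request.
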
Dictionary duplicate finder

When updating dictionaries, coders often come across an alternative form of an actor that is already in the dictionaries (e.g., "Obama" instead of "Barack Obama"). night_ridir can help match the new form of a name to an existing dictionary entry. First, get possible synonyms of the new term using /get_synonyms. Then pass that list of synonyms to /lookup_dictionary, which checks whether any of them match the existing dictionaries and returns the corresponding entries. Example usage:

curl -XPOST -H "Content-Type: application/json" --data '{"syns": ["BARACK_OBAMA", "OBAMA"]}' 'http://localhost:5000/lookup_dictionary'

returns

[[["USAELI", ["780101", "000101"]], ["USAGOV", [">090120"]], ["BARACK", "OBAMA"]], [["USAELI", ["780101", "000101"]], ["USAGOV", [">090120"]], ["BARACK", "OBAMA"]]]

The response is a list of lists, where each sub-list contains the role entries for one matched actor, with the canonical form of the name as the final element.
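
The nested response can be collapsed into one record per actor. The sketch below does this for the sample above, where the same actor appears twice. It assumes each role entry is a [code, dates] pair and leaves the date strings uninterpreted. The name summarize_matches is hypothetical.

```python
def summarize_matches(response):
    """Map each canonical name to its unique role entries.

    Assumption: every sub-list is [role, role, ..., name_parts], where each
    role is [code, dates] and name_parts is the canonical name split into words.
    """
    matches = {}
    for entry in response:
        if not entry:  # defensive: skip empty sub-lists if any are returned
            continue
        *roles, name_parts = entry
        name = " ".join(name_parts)
        known_roles = matches.setdefault(name, [])
        for code, dates in roles:
            role = (code, tuple(dates))
            if role not in known_roles:
                known_roles.append(role)
    return matches


# The sample response from the example above.
sample_lookup = [
    [["USAELI", ["780101", "000101"]], ["USAGOV", [">090120"]], ["BARACK", "OBAMA"]],
    [["USAELI", ["780101", "000101"]], ["USAGOV", [">090120"]], ["BARACK", "OBAMA"]],
]

for name, roles in summarize_matches(sample_lookup).items():
    print(name, roles)
```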

Contributors

ahalterman, yanliang1102
