
nlp-ner's Introduction

NLP-NER

Analyze the organization name recognition system and dataset annotations.

Tools used:

  • Python 3.7.3: the programming language used
  • spaCy 2.2.4: used to evaluate the NER models
  • en_core_web_sm model: used to run the analysis on the given datasets
  • JupyterLab 2.1.2: used to test and run scripts in notebooks

Results:

I was able to read and run analysis on the Ontonotes.json dataset using spaCy’s en_core_web_sm model.

I was not able to evaluate the Enronsentences.json dataset. I was able to read and print the file, and I tried all of spaCy’s models I could find, but the script broke when trying to iterate through the JSON objects using:

    f = open('/Users/l_parau/Enronsentences.json', "r")
    TEST_DATA = json.loads(f.read())
    test_sentences = [x[0] for x in TEST_DATA[0:5]]

I was getting this error:

    ----> 9 test_sentences = [x[0] for x in TEST_DATA[0:5]]
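
One possible cause, offered only as a guess since the full traceback is not shown: if each record in the file is a dict rather than a list, `x[0]` looks up the key `0` and fails. The sketch below reads records in either shape. The function names `parse_records`, `load_records` and `get_text` are made up for illustration, and the fallback to one JSON object per line is an assumption about how the file might be laid out.

```python
import json


def parse_records(raw):
    """Parse a JSON array, or fall back to one JSON object per line."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: the text may hold one JSON object per line (JSONL).
        data = [json.loads(line) for line in raw.splitlines() if line.strip()]
    # A single top-level object is wrapped so callers always get a list.
    return data if isinstance(data, list) else [data]


def load_records(path):
    """Read a file and parse it with parse_records."""
    with open(path, "r", encoding="utf-8") as f:
        return parse_records(f.read())


def get_text(record):
    """Return the sentence text from a dict-style or list-style record."""
    if isinstance(record, dict):
        return record["text"]  # dict-style record with a "text" key
    return record[0]           # list-style record: text is the first item


# Works for both shapes of record:
sample = '[["Enron filed a report.", {"entities": []}], {"text": "Second one."}]'
test_sentences = [get_text(x) for x in parse_records(sample)[0:5]]
```

With this in place, `load_records(path)` followed by `get_text` on each record would replace the `x[0]` indexing in the snippet above.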

One thing I noticed is that the Enronsentences.json objects are formatted differently, having an extra { “text”: at the beginning. After some research I found that this is consistent with the format of Prodigy, the annotation tool from spaCy’s makers, which does not have a free version. I tried to find ways to either read the JSON differently or to convert it into spaCy’s format, but I did not succeed.

I was unable to analyse more datasets from either of the two generic target domains, the OntoNotes 5.0 dataset or https://www.cs.cmu.edu/~enron. The OntoNotes5.0.zip would not extract (I don’t know if it was the size of the zip, ~1.5 GB), while for https://www.cs.cmu.edu/~enron, either my account did not have sufficient credentials or I was doing something wrong, because I was not able to download additional datasets.

For the NER annotation analysis on the Ontonotes.json dataset:

  • OntoNotesMetrics.ipynb runs a script to calculate the Precision, Recall and F1-score for the model
  • OntoNotesScorer.ipynb uses spaCy’s scorer method to display the scores for entities in Ontonotes.json
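
The format difference noted above could, in principle, be bridged with a small conversion step. This is a minimal sketch that assumes each Enron record is a dict with a "text" key plus a "spans" list whose items carry "start", "end" and "label" keys. Only the "text" key is confirmed by the notes above; the rest is an assumption about the file. The helper name `prodigy_to_spacy` is hypothetical. The output is the `(text, {"entities": [...]})` tuple that spaCy 2.x uses for annotated examples.

```python
def prodigy_to_spacy(record):
    """Convert one dict-style record into a spaCy 2.x style tuple.

    Assumed input shape (not verified against the actual file):
        {"text": "...", "spans": [{"start": 0, "end": 5, "label": "ORG"}]}
    """
    text = record["text"]
    entities = [
        (span["start"], span["end"], span["label"])
        for span in record.get("spans", [])  # records without spans get []
    ]
    return (text, {"entities": entities})


# Illustrative record, made up for this example:
example = {
    "text": "Enron hired new staff.",
    "spans": [{"start": 0, "end": 5, "label": "ORG"}],
}
converted = prodigy_to_spacy(example)
```

If the real records use different key names, only the three lookups inside the list comprehension would need to change.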
For the ORG entity analysis I was unsuccessful in getting conclusive results, even after trying to adapt scripts found on various platforms, such as Git repositories and Stack Overflow.
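
For reference, a per-label score can be computed without any library support once gold and predicted entities are available as `(start, end, label)` tuples. The sketch below is not the method used in the notebooks. It assumes exact span matching (a prediction counts only if start, end and label all agree), and the function name `score_label` is made up for illustration.

```python
def score_label(gold, pred, label="ORG"):
    """Precision, recall and F1 for one entity label, exact span match.

    gold and pred are parallel lists: one list of (start, end, label)
    entities per sentence.
    """
    tp = fp = fn = 0
    for gold_ents, pred_ents in zip(gold, pred):
        g = {tuple(e) for e in gold_ents if e[2] == label}
        p = {tuple(e) for e in pred_ents if e[2] == label}
        tp += len(g & p)  # predicted and correct
        fp += len(p - g)  # predicted but not in gold
        fn += len(g - p)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up annotations for two sentences:
gold = [[(0, 5, "ORG"), (10, 15, "PERSON")], [(3, 8, "ORG")]]
pred = [[(0, 5, "ORG")], [(3, 9, "ORG"), (20, 25, "ORG")]]
precision, recall, f1 = score_label(gold, pred)
```

Exact matching is strict: in the example, the prediction `(3, 9, "ORG")` overlaps the gold span `(3, 8, "ORG")` but still counts as both a false positive and a false negative.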

nlp-ner's People

Contributors

paraulaurean
