silentflame / named-entity-recognition

Corpus and a baseline neural network system for Named Entity Recognition in Hindi-English Code-Mixed social media text.

License: GNU General Public License v3.0

Python 100.00%
nlp-machine-learning research-paper ner social-media python acl-news2018 preprocessing neural-network decision-trees crfsuite

named-entity-recognition's Introduction

Named-Entity-Recognition

We have created a dataset of Hindi-English code-mixed social media text (tweets) for the task of Named Entity Recognition. The tweets are pre-processed and annotated with six NER tags plus a seventh Other tag.

NER-Tags

  • B-Per Indicates the beginning of a person's name.
  • I-Per Indicates a continuation token inside a person's name.
  • B-Org Indicates the beginning of an organization's name.
  • I-Org Indicates a continuation token inside an organization's name.
  • B-Loc Indicates the beginning of a location's name.
  • I-Loc Indicates a continuation token inside a location's name.
  • Other Indicates any word that does not fall under the six tags above.

Example:

#Word #Tag
Bharat B-Loc
ke Other
2016 Other
ke Other
Demonetization Other
mein Other
kitna Other
kala Other
dhan Other
real Other
mein Other
aaya Other
??? Other
Accha Other
hua Other
ye Other
prashna Other
Miss B-Per
Word I-Per
Chillar I-Per
ko Other
nahi Other
puccha Other
gaya Other
0 Other
#misschillar B-Per
#missworld Other
#Demonetisation Other
#notebandi Other
#modi B-Per
#bjp B-Org
#gujrat B-Loc
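
The two-column layout above can be read back into entity spans by grouping each B- tag with the I- tags that follow it. The snippet below is a minimal sketch of that grouping, assuming whitespace-separated "word tag" lines like the example; the function names `parse_tagged_lines` and `extract_entities` are hypothetical and are not taken from the repository's scripts.

```python
def parse_tagged_lines(text):
    """Parse 'word tag' lines into (word, tag) pairs.

    Blank lines, the '#Word #Tag' header, and lines without two
    whitespace-separated fields are skipped.
    """
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#Word"):
            continue
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        pairs.append((parts[0], parts[1]))
    return pairs


def extract_entities(pairs):
    """Group B-/I- tagged tokens into (entity_type, text) spans."""
    entities = []
    current_type, current_words = None, []
    for word, tag in pairs:
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current_words:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_words.append(word)
        else:
            # 'Other' closes any open entity; an I- tag that does not
            # continue an open entity of the same type is dropped.
            if current_words:
                entities.append((current_type, " ".join(current_words)))
            current_type, current_words = None, []
    if current_words:
        entities.append((current_type, " ".join(current_words)))
    return entities


# A few lines copied from the example above.
SAMPLE = """#Word #Tag
Bharat B-Loc
ke Other
Miss B-Per
Word I-Per
Chillar I-Per
ko Other
#modi B-Per
#bjp B-Org
#gujrat B-Loc
"""

entities = extract_entities(parse_tagged_lines(SAMPLE))
for entity_type, text in entities:
    print(entity_type, text)
```

Hashtags such as #modi carry a B- tag on their own, so each one becomes a single-token entity.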

Contents

  • The TwitterData folder contains the IDs of the scraped tweets inside the Scrapped folder, and the processed and annotated data in files named accordingly.
  • The three Models.py files implement the three ML classification models we used for our research paper.
  • The preprocessing and vector creation scripts are named accordingly.
  • This dataset is still in development; in the future we will extend it with more tweets to make it a more reliable dataset for this task and others.

Outputs

  • The DecisionTree and CRF models have direct score calls that give all the required stats.
  • Keras does not provide an equivalent call for displaying score stats for the LSTM model, so we built a custom call that computes all the measure values and took the average over all the iterations (here 5).
  • All the models performed well on the given data.
  • Decision Tree model: f1-score of 0.94.
  • Conditional Random Field (CRF) model: f1-score of 0.95.
  • LSTM model: f1-score of 0.95.
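
As an illustration of the custom scoring described for the LSTM model, the sketch below computes token-level precision, recall, and f1 for each run and averages them over the runs. It is not the repository's own metric code: micro-averaging and the exclusion of the Other tag are assumptions, and the names `precision_recall_f1` and `average_over_runs` are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, ignore_label="Other"):
    """Micro-averaged token-level scores over all labels except ignore_label.

    Assumption: 'Other' is not counted as an entity class.
    """
    tp = fp = fn = 0
    for gold, pred in zip(y_true, y_pred):
        if pred != ignore_label:
            if pred == gold:
                tp += 1
            else:
                fp += 1
        if gold != ignore_label and pred != gold:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def average_over_runs(runs):
    """Average (precision, recall, f1) across runs.

    Each run is a (y_true, y_pred) pair of equal-length tag lists.
    """
    if not runs:
        raise ValueError("at least one run is required")
    scores = [precision_recall_f1(y_true, y_pred) for y_true, y_pred in runs]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Two toy runs standing in for the 5 iterations mentioned above.
runs = [
    (["B-Per", "I-Per", "Other", "B-Loc"], ["B-Per", "Other", "Other", "B-Loc"]),
    (["B-Org", "Other"], ["B-Org", "Other"]),
]
avg_precision, avg_recall, avg_f1 = average_over_runs(runs)
print(avg_precision, avg_recall, avg_f1)
```

Whether Other is counted, and whether scores are micro-, macro-, or weighted-averaged, can change the reported numbers noticeably, so the f1-scores listed above should not be expected to match this sketch exactly.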

Authors
  • Vinay Singh
  • Deepanshu Vijay
  • Syed A. Sarfaraz
  • Manish Srivastava

LTRC IIIT-Hyderabad


Citation

Named Entity Recognition for Hindi-English Code-Mixed Social Media Text

Proceedings of the Seventh Named Entities Workshop, 2018, pp. 27-35.

named-entity-recognition's Issues

data preparation

Please help me understand how the data labelling was done.
I need some help: I have large documents, and labelling them will take a lot of time.
Please guide me on how you did the labelling.

Tagged data not created

Sir, I am getting an error in feature tag.py:
tagged data not created
Could you please share this file: Tweet_token_tags - Sheet1.c
sv

Regarding data preprocesing for hinglish

Could you please tell me how you are extracting Hinglish sentences from the tweets? I have separated out all the URLs, numbers, and other languages except English.
I humbly request a reply.

thank you

Error in decisionTreeModel.py

The DecisionTreeClassifier does not support a list of dictionaries as a class_weight. How did the code run then?
