Better Modeling of Incomplete Annotations for Named Entity Recognition

This repository implements an LSTM-CRF model for named entity recognition. The model is the same as that of Lample et al. (2016), except that we omit the final tanh layer after the BiLSTM. The code accompanies the paper "Better Modeling of Incomplete Annotations for Named Entity Recognition", published at the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
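
To make the CRF part of an LSTM-CRF concrete, here is a minimal pure-Python sketch of Viterbi decoding over per-token label scores. It is an illustration, not the code in this repo: the function name `viterbi_decode` and the plain-list inputs are assumptions, and in the real model the emission scores would come from the BiLSTM output passed through a linear projection (with no tanh in between).

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label index sequence of a linear-chain CRF.

    emissions[t][j]   : score of label j at position t (hypothetical input;
                        in an LSTM-CRF these come from the encoder).
    transitions[i][j] : score of moving from label i to label j.
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    # Best score of any path ending in each label at the current position.
    scores = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_scores = []
        pointers = []
        for j in range(num_labels):
            # Pick the previous label that maximizes the path score into j.
            best_prev = max(range(num_labels),
                            key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_prev] + transitions[best_prev][j]
                              + emissions[t][j])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Follow the backpointers from the best final label to recover the path.
    path = [max(range(num_labels), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With all-zero transitions the decoder simply picks the best label at each position; a strongly negative transition score between two labels makes it prefer staying on one label even where the emission score favors the other.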

NOTE: To support more general use cases, this repo contains a PyTorch implementation. The previous DyNet implementation can be found in the first release here. So far, only the "hard" approach from the paper is implemented; the "soft" approach is coming soon.

Our codebase is built on the pytorch LSTM-CRF repo.

Requirements

  • PyTorch >= 1.1
  • Python 3

Put your dataset under the data folder. You can obtain the conll2003 and conll2002 datasets from other sources. Our collected industry datasets, ecommerce and youku, are already included under the data directory.
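
If you are preparing your own dataset, the sketch below shows one way to read CoNLL-style files. It assumes the common layout (one token per line, the label in the last whitespace-separated column, blank lines between sentences, optional `-DOCSTART-` lines); the reader in this repo may expect something different, and the function name `read_conll` is hypothetical.

```python
from typing import Iterable, List, Tuple

Sentence = Tuple[List[str], List[str]]  # (words, labels)


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into (words, labels) sentences."""
    sentences: List[Sentence] = []
    words: List[str] = []
    labels: List[str] = []
    for raw in lines:
        line = raw.strip()
        # A blank line or a document marker ends the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if words:
                sentences.append((words, labels))
                words, labels = [], []
            continue
        columns = line.split()
        words.append(columns[0])    # first column: the token
        labels.append(columns[-1])  # last column: the NER label
    if words:
        sentences.append((words, labels))
    return sentences
```

Because it accepts any iterable of strings, it can be called on an open file handle, for example `read_conll(open(path, encoding="utf-8"))`, where `path` points at a file under the data folder.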

Also put your embedding file under the data directory, and specify its path when running (see the --embedding_file option below).
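
As a rough guide to what an embedding file usually looks like, here is a small sketch of a loader for text-format embeddings. It assumes a GloVe-style layout (a word followed by space-separated floats on each line, optionally preceded by a word2vec-style "count dim" header); whether this repo expects exactly that format is an assumption, and `load_embeddings` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Optional


def load_embeddings(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse text-format word embeddings into a word -> vector mapping."""
    vectors: Dict[str, List[float]] = {}
    dim: Optional[int] = None
    for raw in lines:
        parts = raw.rstrip().split(" ")
        # Skip blank lines and a two-field "count dim" header line.
        # (This also skips 1-dimensional vectors, which are not expected.)
        if len(parts) < 3:
            continue
        word = parts[0]
        values = [float(x) for x in parts[1:]]
        if dim is None:
            dim = len(values)
        elif len(values) != dim:
            raise ValueError(
                f"inconsistent dimension for {word!r}: {len(values)} != {dim}"
            )
        vectors[word] = values
    return vectors
```

Checking that every vector has the same dimension catches truncated or mixed files early, before they cause shape errors inside the model.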

Running our approaches

python3 main.py --embedding_file ${PATH_TO_EMBEDDING} --dataset conll2003 --variant hard

Change hard to soft to run our soft variant once it is implemented. (This version also supports contextual representations, but that feature is still being tested.)
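
For readers wiring these options into their own scripts, the following sketch shows how the three flags from the command above could be declared with argparse. Only the flag names come from the command; the defaults, help strings, and the `build_parser` function are assumptions, and the actual main.py may define more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the three flags used in the example command (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Train an LSTM-CRF on incompletely annotated NER data."
    )
    parser.add_argument("--embedding_file", type=str, required=True,
                        help="path to the pretrained embedding file")
    # The default dataset here is an assumption, not taken from the repo.
    parser.add_argument("--dataset", type=str, default="conll2003",
                        help="name of a dataset folder under data/")
    parser.add_argument("--variant", type=str, default="hard",
                        choices=["hard", "soft"],
                        help="which approach from the paper to run")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.embedding_file, args.dataset, args.variant)
```

Restricting `--variant` with `choices` makes a typo such as `--variant sotf` fail immediately with a usage message instead of silently falling through.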

Future Work

  • Add the soft approach.
  • Add other baselines.

Citation

The implementation used in our paper was written with DyNet; check out our previous release.

If you use this software for research, please cite our paper as follows:

@inproceedings{jie2019better,
  title={Better Modeling of Incomplete Annotations for Named Entity Recognition},
  author={Jie, Zhanming and Xie, Pengjun and Lu, Wei and Ding, Ruixue and Li, Linlin},
  booktitle={Proceedings of NAACL},
  year={2019}
}
