docl-ner

Introduction

Code for our papers "Uncertainty-Aware Sequence Labeling" and "Leveraging Document-Level Label Consistency for Named Entity Recognition".

Requirements

Python 3.6 or higher.
PyTorch 1.0 or higher.

Setup

Download the GloVe embeddings from here.

Input format:

We use the standard CoNLL format: one token per line, with the token and its label separated by whitespace. The "BMES" tag scheme is preferred. Make sure to use -DOCSTART- to mark the beginning of each document.
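If your data is tagged with a BIO-style scheme rather than BMES, a conversion along the following lines may help. This is a minimal sketch and not part of this repository: the function name `bio_to_bmes` is hypothetical, and it assumes IOB2 input, where every entity starts with a `B-` tag.

```python
from typing import List


def bio_to_bmes(tags: List[str]) -> List[str]:
    """Convert one sentence of IOB2 tags (B-X, I-X, O) to BMES tags.

    Hypothetical helper for illustration; assumes well-formed IOB2 input.
    """
    converted = []
    for i, tag in enumerate(tags):
        if tag == "O":
            converted.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        # The entity continues only if the next tag is I- with the same label.
        continues = next_tag == "I-" + label
        if prefix == "B":
            # Start of an entity: B- if more tokens follow, S- if it stands alone.
            converted.append(("B-" if continues else "S-") + label)
        else:
            # Inside an entity: M- in the middle, E- on the last token.
            converted.append(("M-" if continues else "E-") + label)
    return converted
```

Single-token entities become `S-` tags and the last token of a longer entity becomes an `E-` tag, which matches the tags shown in the CoNLL2003 example.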

An example from CoNLL2003 (the additional POS/chunk features are not used in our experiments):

-DOCSTART- -X- -X- O

EU NNP B-NP S-ORG
rejects VBZ B-VP O
German JJ B-NP S-MISC
call NN I-NP O
to TO B-VP O
boycott VB I-VP O
British JJ B-NP S-MISC
lamb NN I-NP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP E-PER
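To check that a file follows this layout, it can be parsed into documents and sentences. The sketch below is illustrative only and is not the loader used by main.py: the name `read_documents` is hypothetical, and it assumes the token is in the first column, the label is in the last column, sentences are separated by blank lines, and each document starts with a -DOCSTART- line.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, label) pairs
Document = List[Sentence]


def read_documents(lines: Iterable[str]) -> List[Document]:
    """Group CoNLL-style lines into documents, each a list of sentences.

    Hypothetical helper for illustration; not the repository's own loader.
    """
    documents: List[Document] = []
    sentence: Sentence = []

    def flush() -> None:
        # Attach the pending sentence (if any) to the current document.
        nonlocal sentence
        if sentence:
            if not documents:
                documents.append([])  # file without a leading -DOCSTART-
            documents[-1].append(sentence)
            sentence = []

    for raw in lines:
        line = raw.strip()
        if line.startswith("-DOCSTART-"):
            flush()
            documents.append([])  # a new document begins here
        elif not line:
            flush()  # a blank line ends the current sentence
        else:
            columns = line.split()
            # First column is the token, last column is the label.
            sentence.append((columns[0], columns[-1]))
    flush()
    return documents
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, for example `read_documents(open('data/conll2003/train.txt'))`.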

Usage

training

Run without document-level memory:

CUDA_VISIBLE_DEVICES=0 python main.py --train_dir 'data/conll2003/train.txt' --dev_dir 'data/conll2003/dev.txt' --test_dir 'data/conll2003/test.txt' --model_dir 'outs' --word_emb_dir 'data/glove.6B.100d.txt' --use_memory False

Run with document-level memory:

CUDA_VISIBLE_DEVICES=0 python main.py --train_dir 'data/conll2003/train.txt' --dev_dir 'data/conll2003/dev.txt' --test_dir 'data/conll2003/test.txt'  --model_dir 'outs' --word_emb_dir 'data/glove.6B.100d.txt'

decoding

Run without document-level memory:

CUDA_VISIBLE_DEVICES=0 python main.py --status decode --model_dir <model dir> --raw_dir <file to be predicted> --use_memory False

Run with document-level memory:

CUDA_VISIBLE_DEVICES=0 python main.py --status decode --model_dir <model dir> --raw_dir <file to be predicted>

models

A model trained on the CoNLL2003 dataset is available here.

Cite

If you use our code, please cite our paper as follows:

@inproceedings{gui2020leveraging,
  title={Leveraging Document-Level Label Consistency for Named Entity Recognition},
  author={Gui, Tao and Ye, Jiacheng and Zhang, Qi and Zhou, Yaqian and Gong, Yeyun and Huang, Xuanjing},
  publisher={International Joint Conferences on Artificial Intelligence Organization},
  year={2020}
}
@article{gui2021uncertainty,
  title={Uncertainty-Aware Sequence Labeling},
  author={Gui, Tao and Ye, Jiacheng and Zhou, Xiang and Zheng, Xiaoqing and Zhang, Qi},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year={2021},
  publisher={IEEE}
}

docl-ner's Issues

I found a small problem

You say "We use standard CoNLL format with each character and its label split by a whitespace in a line. The "BMES" tag scheme is prefered. Make sure to use -DOCSTART- to indicate the begining of a document." But I found that the CoNLL tag scheme is "BIES".
