AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding

Source code and data for EMNLP'16 paper AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding.

Given a text corpus with entity mentions detected and heuristically labeled by distant supervision, this code trains a model with a rank-based loss over the distantly supervised labels and predicts the fine-grained entity types of each test entity mention. For example, check out AFET's output on WSJ news articles.

An end-to-end tool (corpus to typed entities) is under development. Please keep track of our updates.

Performance

Performance of fine-grained entity type classification on the Wiki (Ling & Weld, 2012) dataset.

Method                                     Accuracy  Macro-F1  Micro-F1
HYENA (Yosef et al., 2012)                 0.288     0.528     0.506
FIGER (Ling & Weld, 2012)                  0.474     0.692     0.655
FIGER + All Filter (Gillick et al., 2014)  0.453     0.648     0.582
HNM (Dong et al., 2015)                    0.237     0.409     0.417
WSABIE (Yogatama et al., 2015)             0.480     0.679     0.657
AFET (Ren et al., 2016)                    0.533     0.693     0.664

System Output

The output on the BBN dataset can be found here. Each line is a sentence from the BBN test data, with entity mentions and their fine-grained entity types identified.

Dependency

  • python 2.7, g++
  • Python library dependencies
$ pip install pexpect unidecode six requests protobuf
$ cd DataProcessor/
$ git clone git@github.com:stanfordnlp/stanza.git
$ cd stanza
$ pip install -e .
$ wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
$ unzip stanford-corenlp-full-2016-10-31.zip
$ rm stanford-corenlp-full-2016-10-31.zip

Data

We pre-processed three public datasets (train/test sets) into our JSON format. We ran Stanford NER on the training set to detect entity mentions, and performed distant supervision using DBpediaSpotlight to assign type labels:

  • Wiki (Ling & Weld, 2012): 1.5M sentences sampled from 780k Wikipedia articles. 434 news sentences are manually annotated for evaluation. 113 entity types are organized into a 2-level hierarchy. (download JSON)
  • OntoNotes (Weischedel et al., 2011): 13k news articles, 77 of which are manually labeled for evaluation. 89 entity types are organized into a 3-level hierarchy. (download JSON)
  • BBN (Weischedel et al., 2005): 2,311 WSJ articles that are manually annotated using 93 types in a 2-level hierarchy. (download JSON)
  • Type hierarchies for each dataset are included.
  • Please put the data files in the corresponding subdirectories under AFET/Data/.
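
To make the notion of a multi-level type hierarchy concrete, here is a minimal sketch that expands a fine-grained type into itself plus its ancestors. It assumes type labels are written as slash-delimited paths such as `/person/artist`; that notation, the example labels, and the helper names `ancestors` and `expand` are illustrative assumptions, not taken from the AFET data files.

```python
def ancestors(type_path):
    """Return the type and all of its ancestors, coarsest first.

    Assumes a slash-delimited path such as "/person/artist" (an
    illustrative convention, not necessarily the AFET file format).
    """
    parts = [p for p in type_path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def expand(types):
    """Close a set of types under the parent relation."""
    closed = set()
    for t in types:
        closed.update(ancestors(t))
    return sorted(closed)


if __name__ == "__main__":
    print(ancestors("/person/artist"))
    print(expand(["/person/artist", "/location"]))
```

In a 2-level hierarchy, `ancestors` returns at most two entries per type; in a 3-level hierarchy, at most three.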

Makefile

$ cd AFET/Model; make

Default Run

Run AFET for fine-grained entity typing on the BBN dataset:

$ java -mx4g -cp "DataProcessor/stanford-corenlp-full-2016-10-31/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
$ ./run.sh  
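
The first command starts the CoreNLP server, which keeps running in the foreground, so `./run.sh` is normally launched from a second terminal once the server is accepting connections. A minimal sketch for checking that is shown below. It assumes the server listens on CoreNLP's default port 9000 on localhost (the command above passes no `-port` option); `port_is_open` is a hypothetical helper and not part of AFET.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return True
    except (socket.error, socket.timeout):
        # Connection refused, unreachable host, or timeout.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    # Assumption: CoreNLP server on its default port 9000.
    if port_is_open("localhost", 9000):
        print("CoreNLP server is reachable; ./run.sh can be started.")
    else:
        print("CoreNLP server is not reachable yet.")
```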

Parameters - run.sh

Dataset to run on.

Data="BBN"
  • Concrete parameters for running each dataset can be found in the README in the corresponding data folder under AFET/Data/

Evaluation

Evaluate prediction results (by classifier trained on de-noised data) over test data

python Evaluation/emb_prediction.py $Data pl_warp bipartite maximum cosine 0.25
python Evaluation/evaluation.py $Data pl_warp bipartite
  • python Evaluation/evaluation.py -DATA(BBN/ontonotes/FIGER) -METHOD(hple/...) -EMB_MODE(hete_feature)
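
The three scores reported in the Performance table are commonly defined in fine-grained entity typing as strict accuracy, loose macro-F1 and loose micro-F1 (following Ling & Weld, 2012). The sketch below illustrates those definitions on toy data. It is not the code in Evaluation/evaluation.py; the function names and the example type labels are illustrative assumptions.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2.0 * precision * recall / total if total > 0 else 0.0


def strict_accuracy(gold, pred):
    """Fraction of mentions whose predicted type set equals the gold set."""
    hits = sum(1 for g, p in zip(gold, pred) if set(g) == set(p))
    return hits / float(len(gold))


def loose_macro_f1(gold, pred):
    """Average per-mention precision and recall, then combine into F1."""
    pairs = [(set(g), set(p)) for g, p in zip(gold, pred)]
    precisions = [len(g & p) / float(len(p)) for g, p in pairs if p]
    recalls = [len(g & p) / float(len(g)) for g, p in pairs if g]
    precision = sum(precisions) / len(precisions) if precisions else 0.0
    recall = sum(recalls) / len(recalls) if recalls else 0.0
    return f1(precision, recall)


def loose_micro_f1(gold, pred):
    """Pool type counts over all mentions, then compute F1."""
    pairs = [(set(g), set(p)) for g, p in zip(gold, pred)]
    overlap = sum(len(g & p) for g, p in pairs)
    n_pred = sum(len(p) for _, p in pairs)
    n_gold = sum(len(g) for g, _ in pairs)
    precision = overlap / float(n_pred) if n_pred else 0.0
    recall = overlap / float(n_gold) if n_gold else 0.0
    return f1(precision, recall)


if __name__ == "__main__":
    # Toy data: one type set per entity mention.
    gold = [{"/person"}, {"/organization", "/organization/company"}, {"/location"}]
    pred = [{"/person"}, {"/organization"}, {"/person"}]
    print(strict_accuracy(gold, pred))
    print(loose_macro_f1(gold, pred))
    print(loose_micro_f1(gold, pred))
```

Strict accuracy only rewards exact matches of the whole type set, which is why it is usually much lower than the two F1 scores that give partial credit for overlapping types.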

Publication

Please cite the following paper if you find the code and datasets helpful:

@inproceedings{Ren2016AFETAF,
  title={AFET: Automatic Fine-Grained Entity Typing by Hierarchical Partial-Label Embedding},
  author={Xiang Ren and Wenqi He and Meng Qu and Lifu Huang and Heng Ji and Jiawei Han},
  booktitle={EMNLP},
  year={2016}
}
