
neural-el's Introduction

Neural Entity Linking

Code for paper "Entity Linking via Joint Encoding of Types, Descriptions, and Context", EMNLP '17

https://raw.githubusercontent.com/nitishgupta/neural-el/master/overview.png

Abstract

For accurate entity linking, we need to capture the various information aspects of an entity, such as its description in a KB, contexts in which it is mentioned, and structured knowledge. Further, a linking system should work on texts from different domains without requiring domain-specific training data or hand-engineered features. In this work we present a neural, modular entity linking system that learns a unified dense representation for each entity using multiple sources of information, such as its description, contexts around its mentions, and fine-grained types. We show that the resulting entity linking system is effective at combining these sources and performs competitively, sometimes outperforming current state-of-the-art systems across datasets, without requiring any domain-specific training data or hand-engineered features. We also show that our model can effectively "embed" entities that are new to the KB and link their mentions accurately.

Requirements

How to run inference

  1. Clone the code repository
  2. Download the resources folder.
  3. In configs/config.ini, set the correct path to the resources folder you just downloaded
  4. Run using:
python3 neuralel.py --config=configs/config.ini --model_path=PATH_TO_MODEL_IN_RESOURCES --mode=inference

The file sampletest.txt in the resources folder contains the text to be entity-linked. Currently only a single document can be linked per run. Make sure the text in sampletest.txt is a single document on a single line.
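The single-line requirement can be met with a small preprocessing step. The following is a minimal sketch, assuming the input is plain text; the function name to_single_line is illustrative and not part of the repository:

```python
import re


def to_single_line(text: str) -> str:
    """Collapse every run of whitespace (including newlines) into one space."""
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    # Example document spread over several lines.
    raw = "Rudolf Senti was a\nLiechtenstein sports shooter.\n"
    print(to_single_line(raw))
```

The result can then be written to sampletest.txt in the resources folder before running inference.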

Installing cogcomp-nlpy

CogComp-NLPy is needed to detect named-entity mentions using NER. To install:

pip install cython
pip install ccg_nlpy

Installing Tensorflow (CPU Version)

To install TensorFlow 0.12 (the wheel below targets CPython 3.4 on 64-bit Linux):

export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.1-cp34-cp34m-linux_x86_64.whl
(Regular) pip install --upgrade $TF_BINARY_URL
(Conda) pip install --ignore-installed --upgrade $TF_BINARY_URL

neural-el's People

Contributors

chaseduncan, nitishgupta


neural-el's Issues

Resources

The resources link is down. Is there any way we can get it?

Is the API still working?

Hi, thanks for your work and for uploading the code.

I ran the code, but it seems that the API no longer works.

INFO:ccg_nlpy.pipeline_config:Using pipeline web server with API: http://austen.cs.illinois.edu:5800
INFO:ccg_nlpy.remote_pipeline:pipeline has been set up
ERROR:ccg_nlpy.remote_pipeline:Fail to connect to server.

I would appreciate it if you know how to fix it.
Thanks!

How can I get the dataset?

When I click "Download the [resources folder]", it says I do not have permission. What is going on? Could you provide the dataset? Many thanks!

Local variable 'wid_idxs' referenced before assignment

First of all, thanks a lot for making your code public!

I am trying to run inference over custom documents. Let's take as an example the following single-sentence document:

Rudolf Senti was a Liechtenstein sports shooter.

However, I run into the following error:

Traceback (most recent call last):
  File "neuralel.py", line 240, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 43, in run
    sys.exit(main(sys.argv[:1] + flags_passthrough))
  File "neuralel.py", line 154, in main
    pred_TypeSetsList) = model.inference(ckptpath=FLAGS.model_path)
  File "/home/models/figer_model/el_model.py", line 467, in inference
    r = self.inference_run()
  File "/home/models/figer_model/el_model.py", line 487, in inference_run
    wid_idxs_batch, wid_cprobs_batch) = self.reader.next_test_batch()
  File "/home/readers/inference_reader.py", line 351, in next_test_batch
    return self._next_padded_batch()
  File "/home/readers/inference_reader.py", line 332, in _next_padded_batch
    wid_idxs_batch, wid_cprobs_batch) = self._next_batch()
  File "/home/readers/inference_reader.py", line 258, in _next_batch
    (wid_idxs, wid_cprobs) = self.make_candidates_cprobs(m)
  File "/home/readers/inference_reader.py", line 286, in make_candidates_cprobs
    assert len(wid_idxs) == len(wid_cprobs)
UnboundLocalError: local variable 'wid_idxs' referenced before assignment

The problem seems to be that a surface form, e.g. rudolfsenti, is not in the crosswikis dictionary, so wid_idxs is never assigned before the assert.
My fix is to initialize wid_idxs and wid_cprobs with

wid_idxs = [0]
wid_cprobs = [0.0]

Is this a proper way of dealing with the error? If so, maybe you can include it in your code. I imagine this must be a rather common scenario.
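The failure mode and the proposed fallback can be illustrated in isolation. The sketch below is not the repository's make_candidates_cprobs; lookup_candidates, the dictionary layout, and the sample indices are hypothetical stand-ins that only show why binding both lists before the lookup avoids the UnboundLocalError:

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the crosswikis dictionary:
# surface form -> (candidate indices, prior probabilities).
Crosswikis = Dict[str, Tuple[List[int], List[float]]]


def lookup_candidates(surface: str, crosswikis: Crosswikis) -> Tuple[List[int], List[float]]:
    # Defaults are assigned first, so both names are bound even on a miss.
    # Index 0 is assumed here to stand for "unknown entity".
    wid_idxs: List[int] = [0]
    wid_cprobs: List[float] = [0.0]
    if surface in crosswikis:
        wid_idxs, wid_cprobs = crosswikis[surface]
    assert len(wid_idxs) == len(wid_cprobs)
    return wid_idxs, wid_cprobs


toy = {"liechtenstein": ([12, 57], [0.9, 0.1])}
print(lookup_candidates("liechtenstein", toy))  # surface form present
print(lookup_candidates("rudolfsenti", toy))    # surface form missing
```

Whether index 0 with probability 0.0 is the right placeholder depends on how the model treats that index, which the traceback alone does not show.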

Concerning the entity representation

Hi, thank you for sharing the code.
I am interested in the vector representations of entities. How can I generate the entity embeddings described in the paper?
Many thanks,
Weixin.

Train a new model

How would it be possible to train a new model instead of using your pretrained models?

How can I get the dataset?

When I click "Download the [resources folder]", it says I do not have permission. What is going on? Could you provide the dataset? Thank you very much!

AttributeError: 'NoneType' object has no attribute 'get_tokens'

WARNING:ccg_nlpy.pipeline_config:Models not found. To use pipeline locally, please refer the documentation for downloading models.
INFO:ccg_nlpy.pipeline_config:Using pipeline web server with API: http://austen.cs.illinois.edu:5800
INFO:ccg_nlpy.remote_pipeline:pipeline has been set up
Loading Glove Word Vocabulary
Loading Type Label Vocabulary
Loading Known Entity Vocabulary ...
Loading wid2Wikititle
Loading Coherence Strings Dicts ...
Loading Crosswikis dict. (takes ~2 mins to load)
Crosswikis loaded. Size: 37650380
Loading Glove Word Vectors
[#] Glove Vectors loaded!
[#] Test Mentions File : neural-el_resources/sampletest.txt
[#] Loading test file and preprocessing ...
WARNING:ccg_nlpy.remote_pipeline:Unexpected status code 503, please open an issue on GitHub for further investigation.
Traceback (most recent call last):
  File "neuralel.py", line 240, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "neuralel.py", line 89, in main
    coherence=FLAGS.coherence)
  File "/home/mldl/ub16_prj/neural-el/readers/inference_reader.py", line 65, in __init__
    self.processTestDoc(test_mens_file)
  File "/home/mldl/ub16_prj/neural-el/readers/inference_reader.py", line 102, in processTestDoc
    self.doc_tokens = self.ccgdoc.get_tokens
AttributeError: 'NoneType' object has no attribute 'get_tokens'
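
The traceback indicates that self.ccgdoc was None when get_tokens was accessed, which is consistent with the 503 warning logged just before it. The following is a minimal sketch of failing early with a clearer message; require_doc is a hypothetical helper, not part of ccg_nlpy or this repository:

```python
def require_doc(doc, source="remote pipeline"):
    """Return doc unchanged, or raise a descriptive error if it is None."""
    if doc is None:
        raise RuntimeError(
            "No annotated document was returned by the %s; "
            "check that the server is reachable or use local models." % source
        )
    return doc


# Demonstrates the error raised when the pipeline yields nothing.
try:
    require_doc(None)
except RuntimeError as err:
    print(err)
```

A check like this would not restore the service, but it would replace the AttributeError with a message that points at the actual cause.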
