
JAPE

Source code and datasets for the ISWC 2017 research paper "Cross-Lingual Entity Alignment via Joint Attribute-Preserving Embedding" (JAPE).

Code

The correspondence between the Python files and our JAPE variants is as follows:

  • se_pos.py == SE w/o neg
  • se_pos_neg.py == SE
  • cse_pos_neg.py == SE + AE

To run SE, please use the following command, where the last argument selects the dataset split, i.e., the proportion of entity links used for training (see Datasets below):
python3 se_pos.py ../data/dbp15k/zh_en/ 0.3

To learn attribute embeddings, please use:
python3 attr2vec.py ../data/dbp15k/zh_en/ ../data/dbp15k/zh_en/0_3/ ../data/dbp15k/zh_en/all_attrs_range ../data/dbp15k/en_all_attrs_range

To calculate entity similarities, please use:
python3 ent2vec_sparse.py ../data/dbp15k/zh_en/ 0.3 0.95 0.95 0.9
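As a rough illustration of how the output of ent2vec_sparse.py can be inspected, below is a minimal sketch (not part of this repository) that loads a similarity matrix in Matrix Market format and computes Hits@1. It assumes an output file such as ents_sim.mtx in which row i and column i correspond to the i-th aligned test entity pair; treat the file name and layout as assumptions.

from scipy.io import mmread

# Hypothetical output file of ent2vec_sparse.py; the name and the row/column
# correspondence (row i <-> column i for the i-th test pair) are assumptions.
sim = mmread("ents_sim.mtx").tocsr()

hits1 = 0
for i in range(sim.shape[0]):
    row = sim.getrow(i).toarray().ravel()
    if row.argmax() == i:  # the correct counterpart is ranked first
        hits1 += 1
print("Hits@1 = {:.3f}".format(hits1 / sim.shape[0]))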

Dependencies

  • Python 3
  • TensorFlow 1.2
  • SciPy
  • NumPy

Datasets

In our experiments, we do not use all the triples in the datasets. For relationship triples, we select a portion whose head and tail entities are popular. For attribute triples, we discard their values, since literal values are highly diverse and language-dependent, which makes them hard to compare across KGs.
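As a rough sketch of this selection (the actual preprocessing scripts are not described here, so the input file names, format, and popularity threshold below are assumptions), one might proceed as follows:

from collections import Counter

def load_triples(path):
    # Assumption: one tab-separated triple per line.
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f if line.strip()]

rel_triples = load_triples("rel_triples.tsv")     # hypothetical input file
freq = Counter()
for h, r, t in rel_triples:
    freq[h] += 1
    freq[t] += 1

popular = {e for e, c in freq.items() if c >= 5}  # threshold 5 is an assumption
selected = [(h, r, t) for h, r, t in rel_triples
            if h in popular and t in popular]

attr_triples = load_triples("attr_triples.tsv")   # hypothetical input file
entity_attrs = {(e, a) for e, a, _value in attr_triples}  # values are discarded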

The full datasets can be found on the project website or on Google Drive.

Directory structure

Take DBP15K (ZH-EN) as an example; the folder "zh_en" contains:

  • all_attrs_range: the range code of attributes in source KG (ZH);
  • ent_ILLs: all entity links (15K);
  • rel_ILLs: all relationship links (with the same URI or localname);
  • s_labels: cross-lingual entity labels of source KG (ZH);
  • s_triples: relationship triples of source KG (ZH);
  • sup_attr_pairs: all attribute links (with the same URI or localname);
  • t_labels: cross-lingual entity labels of target KG (EN);
  • t_triples: relationship triples of target KG (EN);
  • training_attrs_1: entity attributes in source KG (ZH);
  • training_attrs_2: entity attributes in target KG (EN);

On top of this, we built five datasets (0_1, 0_2, 0_3, 0_4, 0_5) for embedding-based entity alignment models. "0_x" means that the dataset uses x0% of the entity links as training data and the rest for testing. The two entities of each entity link in the training data share the same id. In our main experiments, we used the dataset "0_3", which takes 30% of the entity links as training data.
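To make the "0_x" convention concrete, here is a minimal sketch of such a split with a 0.3 training ratio. It assumes ent_ILLs stores one entity link per line; it is not the script that produced the released splits.

import random

ratio = 0.3  # corresponds to the "0_3" split (30% of links for training)

# Assumption: ent_ILLs stores one entity link per line.
with open("../data/dbp15k/zh_en/ent_ILLs", encoding="utf-8") as f:
    links = [line.rstrip("\n") for line in f if line.strip()]

random.seed(1)            # the seed is arbitrary, for illustration only
random.shuffle(links)
n_train = int(ratio * len(links))
train_links, test_links = links[:n_train], links[n_train:]
print(len(train_links), "training links,", len(test_links), "test links")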

The folder "mtranse" contains the corresponding 5 datasets for MTransE. The difference lies in that the two entities of each entity link in training data have different ids.

Dataset files

Take the dataset "0_3" of DBP15K (ZH-EN) as an example; the folder "0_3" contains the following files (a minimal loading sketch follows the list):

  • ent_ids_1: ids for entities in source KG (ZH);
  • ent_ids_2: ids for entities in target KG (EN);
  • ref_ent_ids: entity links encoded by ids for testing;
  • ref_ents: URIs of entity links for testing;
  • rel_ids_1: ids for relationships in source KG (ZH);
  • rel_ids_2: ids for relationships in target KG (EN);
  • sup_ent_ids: entity links encoded by ids for training;
  • sup_rel_ids: relationship links encoded by ids for training;
  • triples_1: relationship triples encoded by ids in source KG (ZH);
  • triples_2: relationship triples encoded by ids in target KG (EN);
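The sketch below shows how these files could be read, under the assumption that each file stores one tab- or whitespace-separated record per line (id files as "<id> <URI>", link files as "<id1> <id2>", triple files as "<head_id> <relation_id> <tail_id>"); the exact column order is an assumption.

def read_ids(path):
    # Assumed format of ent_ids_* / rel_ids_*: "<id>\t<URI>" per line.
    ids = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            idx, uri = line.rstrip("\n").split("\t")
            ids[int(idx)] = uri
    return ids

def read_id_tuples(path):
    # Works for sup_ent_ids / ref_ent_ids ("<id1>\t<id2>") and for
    # triples_1 / triples_2 ("<head_id>\t<relation_id>\t<tail_id>").
    with open(path, encoding="utf-8") as f:
        return [tuple(map(int, line.split())) for line in f if line.strip()]

ents_zh = read_ids("../data/dbp15k/zh_en/0_3/ent_ids_1")
train_links = read_id_tuples("../data/dbp15k/zh_en/0_3/sup_ent_ids")
triples_zh = read_id_tuples("../data/dbp15k/zh_en/0_3/triples_1")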

Running and parameters

Due to the instability of embedding-based methods, it is normal for the results to fluctuate slightly (±1%) across repeated runs.

If you have any difficulty or questions about running the code or reproducing the experiment results, please email [email protected] and [email protected].

Citation

If you use this model or code, please cite it as follows:

@inproceedings{JAPE,
  author    = {Zequn Sun and Wei Hu and Chengkai Li},
  title     = {Cross-Lingual Entity Alignment via Joint Attribute-Preserving Embedding},
  booktitle = {ISWC},
  pages     = {628--644},
  year      = {2017}
}

Links

The following links point to some recent work that uses our datasets:

Contributors

sunzequn, whu2015


Issues

Interpretation of output

Hi!
I tried to use this model and ran se_pos.py, attr2vec.py and ent2vec_sparse.py. I now have the output files attr_embeddings, attrs_meta, attrs_vec.npy, ents_embeddings_1, ents_embeddings_2, ents_sim.mtx, ent_vec_1.npy, ent_vec_2.npy, kb1_ents_sim.mtx, kb2_ents_sim.mtx.
Can you help me understand their meaning and explain how you analyze the results?

Best wishes, Vika

Question about the datasets

Hello, when I run python3 se_pos.py ../data/dbp15k/zh_en/ 0.3, I get the error No such file or directory: '../data/dbp15k/zh_en/0_3/triples_1'.
Where can I find the 0_x datasets? Or is there a script that needs to be run first to build the 0_x datasets?

attrs_meta files location

In order to learn more about entity similarities, I would like to run ent2vec_sparse.py, but I cannot find the attrs_meta file within the dataset. Could you please advise?

A question

Hi, I would like to ask how to process the full DBP15K dataset into the input required by the JAPE model. Is there code for this step?
Thanks a lot!

The ensemble method of translation model and JAPE

Your JAPE paper reports an ensemble experiment combining Google Translate and the JAPE model, as described in the highlighted text below:
[screenshot of the highlighted passage from the paper]
But I cannot quite understand your ensemble method. What is meant by

for each latent aligned entities, we considered the lower rank of the two results as the combined rank

Besides, I cannot find the corresponding code in this repository. Could you please help me out here?
