
ukge's Introduction

Embedding Uncertain Knowledge Graphs

This repository includes the code of UKGE and data used in the experiments.

Install

Make sure your local environment has the following installed:

Python3
tensorflow >= 1.5.0
scikit-learn

Install the dependencies using:

pip install -r requirements.txt

Run the experiments

To run the experiments, use:

python ./run/run.py

or

python ./run/run.py --data ppi5k --model rect --batch_size 1024 --dim 128 --epoch 100 --reg_scale 5e-4

You can use --model logi to switch to the UKGE(logi) model.

Data is available at: https://drive.google.com/file/d/1UJQ8hnqPGv1O9pYglfNF5lY_sgDQkleS/view?usp=sharing

Reference

Please refer to our paper: Xuelu Chen, Muhao Chen, Weijia Shi, Yizhou Sun, Carlo Zaniolo. Embedding Uncertain Knowledge Graphs. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), 2019.

@inproceedings{chen2019ucgraph,
    title={Embedding Uncertain Knowledge Graphs},
    author={Chen, Xuelu and Chen, Muhao and Shi, Weijia and Sun, Yizhou and Zaniolo, Carlo},
    booktitle={Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI)},
    year={2019}
}

ukge's People

Contributors

stasl0217

ukge's Issues

python run.py error

Thank you for sharing your code and data. When I run 'python run.py', it gives me a 'No train.csv' error. The Google Drive link provided in the README does not contain those data files. Where can I get them? Thank you.

Hyperparams for Table 4

Hi, can you please share the hyperparameters that were used to obtain the results of the UKGE(logi) and UKGE(rect) models in Table 4 of the paper?

Some problems in data

More than 2,000 triples appear in both train.tsv and test.tsv with different confidence scores.
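Such overlaps can be detected with a small script along these lines (a sketch, not part of the repository: it assumes tab-separated files with head, relation, tail, and confidence columns, and the helpers `load_triples` and `conflicting_triples` are hypothetical names):

```python
import csv

def load_triples(path):
    """Read a tab-separated file into a dict: (head, rel, tail) -> confidence."""
    triples = {}
    with open(path) as f:
        for head, rel, tail, conf in csv.reader(f, delimiter="\t"):
            triples[(head, rel, tail)] = float(conf)
    return triples

def conflicting_triples(train, test):
    """Triples present in both splits but with differing confidence scores."""
    return {t for t in train.keys() & test.keys() if train[t] != test[t]}
```

Running `conflicting_triples(load_triples("train.tsv"), load_triples("test.tsv"))` would list exactly the triples the issue describes.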

psl batch size

Hi!

soft_h_index, soft_r_index, soft_t_index, soft_w_index = self.batchloader.gen_psl_samples() # length: param.n_psl

In every epoch of training, for each batch, a PSL batch of random triple(s) is generated, and its default size is 1. What is the exact PSL batch size used for the experiments in the paper? And why is it randomly generated?

Negative sampling size?

What are the negative sampling sizes for the best hyper-parameter combinations given in the paper?

Also, the parameter n_neg is documented as "Number of negative samples per (h,r,t)" in the code, in run.py.
However, in the implementation both the head and the tail are corrupted separately for each triple, which yields twice that many, i.e. 2 * n_neg negative samples per (h,r,t).
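The doubling can be seen in a minimal standalone sketch (this is my own illustration, not the repository's code; the function name `corrupt` and the use of NumPy's `Generator` are assumptions):

```python
import numpy as np

def corrupt(h, r, t, n_neg, num_entities, rng):
    """Corrupt head and tail separately: yields 2 * n_neg negatives per positive."""
    neg_heads = rng.integers(0, num_entities, size=n_neg)  # replacements for h
    neg_tails = rng.integers(0, num_entities, size=n_neg)  # replacements for t
    return ([(int(h2), r, t) for h2 in neg_heads]     # head-corrupted (h', r, t)
            + [(h, r, int(t2)) for t2 in neg_tails])  # tail-corrupted (h, r, t')

rng = np.random.default_rng(0)
negatives = corrupt(3, 1, 7, n_neg=10, num_entities=100, rng=rng)
print(len(negatives))  # 20: one positive produces 2 * n_neg negatives
```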

A question about softlogic.tsv

Hi, thank you for sharing your code and data!
I would like to know how you obtained the data in softlogic.tsv. Did you use the PSL implementation from LINQS to compute the values, or did you write your own logic? Since the LINQS implementation of PSL is fairly complex, and the method in your paper does not use many of its features, I would like to know whether you have a simpler way to implement this restricted form of PSL.
Looking forward to your reply!

Confusion about the computation of nDCG

def ndcg(self, h, r, tw_truth):

In the calculation of iDCG, the optimal ranking is assumed to be the natural-number sequence 1, 2, 3, 4, ...
When the code computes the actual rank of a tail entity, it counts how many entities score higher than the target entity. However, if several tail entities of a query (a head entity plus a relation) share the same score, their ranks tie, the optimal ranking is no longer the sequence 1, 2, 3, 4, ..., and the resulting nDCG may exceed 1.

I'm not sure if this problem exists in the code, thanks.
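The tie effect can be reproduced with a short sketch that mirrors the ranking rule described above (rank = 1 + number of strictly higher scores) against an iDCG built from the sequence 1, 2, 3, ...; the function `ndcg_with_ties` is my own illustration, not the repository's code:

```python
import numpy as np

def ndcg_with_ties(scores, gains):
    """Rank each gold tail as 1 + #(strictly higher scores); ties share a rank."""
    ranks = np.array([1 + np.sum(scores > s) for s in scores])
    dcg = np.sum(gains / np.log2(ranks + 1))
    # iDCG assumes the ideal ranking is the natural-number sequence 1, 2, 3, ...
    idcg = np.sum(np.sort(gains)[::-1] / np.log2(np.arange(2, len(gains) + 2)))
    return dcg / idcg

# Three gold tails with identical scores all receive rank 1, so DCG > iDCG.
print(ndcg_with_ties(np.array([0.8, 0.8, 0.8]), np.array([1.0, 1.0, 1.0])) > 1)  # True
```

With distinct scores the ranks are 1, 2, 3 and the ratio is exactly 1, which is the behavior one would expect.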

About the importance of PSL

Hi, thank you for sharing your code and data.

I am curious about the importance of PSL, and I conducted several experiments on the ppi5k data provided here. I find that the lower the value of self._p_psl in models.py, the higher the nDCG for both the linear and exp versions of UKGE(rect).

For example, training with early stopping, batch_size=1024, embedding_dim=128, and _p_psl=0.2, I get nDCG_linear = 0.951133 and nDCG_exp = 0.950328; with _p_psl=0, I get nDCG_linear = 0.960531 and nDCG_exp = 0.50403.

I think this contradicts the results reported in the paper:
http://web.cs.ucla.edu/~yzsun/papers/2019_AAAI_UKG.pdf

In fact, I think PSL should be beneficial, since it describes uncertainty through rules. But the results above puzzle me. Can you shed some light on this?

About negative sampling in training process

UKGE corrupts a training batch with the following implementation:

def corrupt_batch(self, h_batch, r_batch, t_batch):

    def corrupt_batch(self, h_batch, r_batch, t_batch):
        N = self.this_data.num_cons()  # number of entities

        # head-corrupted negatives: random indices without filtering
        neg_hn_batch = np.random.randint(0, N, size=(self.batch_size, self.neg_per_positive))
        neg_rel_hn_batch = np.tile(r_batch, (self.neg_per_positive, 1)).transpose()  # copy relations
        neg_t_batch = np.tile(t_batch, (self.neg_per_positive, 1)).transpose()

        # tail-corrupted negatives
        neg_h_batch = np.tile(h_batch, (self.neg_per_positive, 1)).transpose()
        neg_rel_tn_batch = neg_rel_hn_batch
        neg_tn_batch = np.random.randint(0, N, size=(self.batch_size, self.neg_per_positive))

        return neg_hn_batch, neg_rel_hn_batch, neg_t_batch, neg_h_batch, neg_rel_tn_batch, neg_tn_batch

However, is it possible for this unfiltered random sampling to draw true positive triples as negatives, which could hinder learning?
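A common remedy for this, though not necessarily what this repository does, is filtered negative sampling: resample until the corrupted triple is not a known positive. A minimal sketch for a single tail corruption (the helper `corrupt_tail_filtered` is hypothetical):

```python
import numpy as np

def corrupt_tail_filtered(h, r, t, num_entities, known_triples, rng):
    """Resample the tail until the corrupted triple is not a known positive."""
    while True:
        t_neg = int(rng.integers(0, num_entities))
        if (h, r, t_neg) not in known_triples:
            return (h, r, t_neg)

# Toy example: entities 0..9, and every tail except 9 is a known positive for (0, 0, ?),
# so the only valid negative is (0, 0, 9).
known = {(0, 0, i) for i in range(9)}
rng = np.random.default_rng(0)
print(corrupt_tail_filtered(0, 0, 5, 10, known, rng))  # (0, 0, 9)
```

For uncertain KGs the trade-off is subtler, since "known positives" carry confidence scores rather than binary labels, which may be why the unfiltered variant was used.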

Is the data preprocessing code open source?

Hi there. I noticed that the processed data, i.e., the triples represented by entity and relation IDs, already exists in this repo and can conveniently be used to evaluate the model's performance.
However, for further exploration I would like to understand the data preprocessing. Could the code that converts the original data into this processed form be open-sourced?
Thank you for your contribution to the community.

Test sets missing

In the paper, it is mentioned as:
" To test if our model can correctly interpret negative links, we add the same amount of negative links as existing relation facts into the test sets."
Where are these test sets, or could you share how you produced them?
Unfortunately, I couldn't reproduce any of your results.
