cqd's Introduction

Continuous Query Decomposition



Updates

  • In an extended abstract for IJCAI, we have included additional results on the updated query answering datasets introduced with BetaE. These results are also now on paperswithcode.com!
  • We implemented CQD in the KGReasoning framework, a library from SNAP implementing several Complex Query Answering models, which also supports experimenting with the Query2Box and BetaE datasets (in this repo, we only consider the former). Our implementation is available at this link.

This repository contains the official implementation for our ICLR 2021 (Oral, Outstanding Paper Award) paper, Complex Query Answering with Neural Link Predictors:

@inproceedings{
    arakelyan2021complex,
    title={Complex Query Answering with Neural Link Predictors},
    author={Erik Arakelyan and Daniel Daza and Pasquale Minervini and Michael Cochez},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=Mos9F9kDwkz}
}

In this work we present CQD, a method that reuses a pretrained link predictor to answer complex queries, by scoring atom predicates independently and aggregating the scores via t-norms and t-conorms.
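As a minimal illustration of this idea (a sketch under simplifying assumptions, not the repository's implementation), the snippet below combines hypothetical per-atom scores for two atoms that share the same answer variable, using the product t-norm for conjunctive queries and the product t-conorm for disjunctive ones:

import torch

def product_t_norm(a, b):
    # Product t-norm: fuzzy conjunction of scores in [0, 1].
    return a * b

def product_t_conorm(a, b):
    # Product t-conorm: fuzzy disjunction of scores in [0, 1].
    return a + b - a * b

# Hypothetical per-atom scores over all candidate answers, e.g. produced by a
# pre-trained link predictor and normalised to [0, 1].
atom1_scores = torch.rand(100)   # scores for (anchor1, rel1, ?target)
atom2_scores = torch.rand(100)   # scores for (anchor2, rel2, ?target)

intersection_scores = product_t_norm(atom1_scores, atom2_scores)   # 2i-style query
union_scores = product_t_conorm(atom1_scores, atom2_scores)        # 2u-style query
best_answer = intersection_scores.argmax()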

Our code is based on an implementation of ComplEx-N3 available here.

Please follow the instructions below to reproduce the results from our experiments.

1. Install the requirements

We recommend creating a new environment:

% conda create --name cqd python=3.8 && conda activate cqd
% pip install -r requirements.txt

2. Download the data

We use 3 knowledge graphs: FB15k, FB15k-237, and NELL. From the root of the repository, download and extract the files to obtain the folder data, containing the sets of triples and queries for each graph.

% wget http://data.neuralnoise.com/cqd-data.tgz
% tar xvf cqd-data.tgz
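After extraction, the data folder should contain one subdirectory per knowledge graph; the exact listing below is an assumption based on the dataset paths used later in this README (data/FB15k, data/FB15k-237, data/NELL).

% ls data
FB15k  FB15k-237  NELL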

3. Download the models

You will also need a neural link prediction model for each dataset. Our pre-trained models are available here:

% wget http://data.neuralnoise.com/cqd-models.tgz
% tar xvf cqd-models.tgz

3. Alternative -- Train your own models

To obtain entity and relation embeddings, we use ComplEx. Use the following commands to train the embeddings for each dataset.

FB15k

% python -m kbc.learn data/FB15k --rank 1000 --reg 0.01 --max_epochs 100  --batch_size 100

FB15k-237

% python -m kbc.learn data/FB15k-237 --rank 1000 --reg 0.05 --max_epochs 100  --batch_size 1000

NELL

% python -m kbc.learn data/NELL --rank 1000 --reg 0.05 --max_epochs 100  --batch_size 1000

Once training is done, the models will be saved in the models directory.
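For reference, here is a minimal sketch of the ComplEx scoring function these embeddings are trained with; it illustrates the general scoring rule, and the split of each embedding into real and imaginary halves is an assumption rather than the exact layout used by kbc.learn.

import torch

def complex_score(subj, rel, obj):
    # ComplEx triple score: Re(<subj, rel, conj(obj)>).
    # Each argument is a complex embedding stored as a real tensor of size
    # 2 * rank, real part first, imaginary part second (so --rank 1000
    # corresponds to embeddings of size 2000).
    rank = subj.shape[-1] // 2
    s_re, s_im = subj[..., :rank], subj[..., rank:]
    r_re, r_im = rel[..., :rank], rel[..., rank:]
    o_re, o_im = obj[..., :rank], obj[..., rank:]
    return ((s_re * r_re - s_im * r_im) * o_re
            + (s_re * r_im + s_im * r_re) * o_im).sum(dim=-1)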

4. Answering queries with CQD

CQD can answer complex queries via continuous (CQD-CO) or combinatorial optimisation (CQD-Beam).

CQD-Beam

Use the kbc.cqd_beam script to answer queries, providing the path to the dataset and the link predictor saved in the previous step. For example:

% python -m kbc.cqd_beam --model_path models/[model_filename].pt

Example:

% PYTHONPATH=. python3 kbc/cqd_beam.py \
  --model_path models/FB15k-model-rank-1000-epoch-100-*.pt \
  --dataset FB15K --mode test --t_norm product --candidates 64 \
  --scores_normalize 0 data/FB15k

models/FB15k-model-rank-1000-epoch-100-1602520745.pt FB15k product 64
ComplEx(
  (embeddings): ModuleList(
    (0): Embedding(14951, 2000, sparse=True)
    (1): Embedding(2690, 2000, sparse=True)
  )
)

[..]

This will save a series of JSON files with results, e.g.

% cat "topk_d=FB15k_t=product_e=2_2_rank=1000_k=64_sn=0.json"
{
  "MRRm_new": 0.7542805715523118,
  "MRm_new": 50.71081983144581,
  "HITS@1m_new": 0.6896709378392843,
  "HITS@3m_new": 0.7955001359095913,
  "HITS@10m_new": 0.8676865172456019
}
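Conceptually, CQD-Beam answers a multi-hop query by keeping the top-k candidate entities at each hop and combining per-atom scores with the chosen t-norm. The sketch below illustrates this for a 2p query; score_all_objects is a hypothetical helper returning normalised link-predictor scores over all entities, and this is not the code in kbc.cqd_beam.

import torch

def beam_answer_2p(anchor, rel1, rel2, score_all_objects, k=64):
    # Answer (anchor, rel1, ?V) AND (?V, rel2, ?target) with beam search.
    # Hop 1: score every entity as the intermediate variable V, keep top-k.
    v_scores = score_all_objects(anchor, rel1)             # [num_entities]
    top_scores, top_vs = v_scores.topk(k)                  # beam of candidates

    # Hop 2: for each candidate V, score every entity as the target and
    # combine the two atoms with the product t-norm.
    target_scores = torch.stack(
        [top_scores[i] * score_all_objects(v.item(), rel2)
         for i, v in enumerate(top_vs)])                   # [k, num_entities]

    # Each target is scored by its best-supporting intermediate entity.
    return target_scores.max(dim=0).values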

CQD-CO

Use the kbc.cqd_co script to answer queries, providing the path to the dataset and the link predictor saved in the previous step. For example:

% python -m kbc.cqd_co data/FB15k --model_path models/[model_filename].pt --chain_type 1_2
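Conceptually, CQD-CO treats the embeddings of the query variables as free parameters and maximises the t-norm of the atom scores by gradient ascent. The sketch below, for a 2p query, assumes a hypothetical score_triple(subject_emb, relation_emb, object_emb) function (e.g. a ComplEx score) and is not the code in kbc.cqd_co.

import torch

def co_answer_2p(anchor_emb, rel1_emb, rel2_emb, entity_embs, score_triple,
                 steps=1000, lr=0.1):
    # Answer a 2p query by optimising the variable embeddings directly.
    # Free parameters: embeddings of the intermediate variable V and target T.
    v_emb = torch.randn_like(anchor_emb, requires_grad=True)
    t_emb = torch.randn_like(anchor_emb, requires_grad=True)
    opt = torch.optim.Adam([v_emb, t_emb], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        # Atom scores are squashed to [0, 1] so the product t-norm is meaningful.
        s1 = torch.sigmoid(score_triple(anchor_emb, rel1_emb, v_emb))
        s2 = torch.sigmoid(score_triple(v_emb, rel2_emb, t_emb))
        (-(s1 * s2)).backward()    # maximise the product t-norm
        opt.step()

    # Final ranking: score every entity as the target using the optimised V.
    return torch.sigmoid(score_triple(v_emb.detach(), rel2_emb, entity_embs))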

Final Results

All results from the paper can be reproduced as follows:

% cd results/topk
% ../topk-parse.py *.json | grep rank=1000
d=FB15K rank=1000 & 0.779 & 0.584 & 0.796 & 0.837 & 0.377 & 0.658 & 0.839 & 0.355
d=FB237 rank=1000 & 0.279 & 0.219 & 0.352 & 0.457 & 0.129 & 0.249 & 0.284 & 0.128
d=NELL rank=1000 & 0.343 & 0.297 & 0.410 & 0.529 & 0.168 & 0.283 & 0.536 & 0.157
% cd ../cont
% ../cont-parse.py *.json | grep rank=1000
d=FB15k rank=1000 & 0.454 & 0.191 & 0.796 & 0.837 & 0.336 & 0.513 & 0.816 & 0.319
d=FB15k-237 rank=1000 & 0.213 & 0.131 & 0.352 & 0.457 & 0.146 & 0.222 & 0.281 & 0.132
d=NELL rank=1000 & 0.265 & 0.220 & 0.410 & 0.529 & 0.196 & 0.302 & 0.531 & 0.194

Generating explanations

When using CQD-Beam for query answering, we can inspect intermediate decisions. We provide an example implementation for the case of 2p queries over FB15k-237 that generates a log file. To generate this log, add the --explain flag when running the cqd_beam script; the file will be saved as explain.log.
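For example, the command below follows the pattern of the earlier CQD-Beam command; the --dataset value and the model filename are placeholders rather than exact values from this repository.

% PYTHONPATH=. python3 kbc/cqd_beam.py \
  --model_path models/[model_filename].pt \
  --dataset FB15k-237 --mode test --t_norm product --candidates 64 \
  --scores_normalize 0 --explain data/FB15k-237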

Note: for readability, this requires an extra file mapping FB15k-237 entity identifiers to their original names. Download the file from this link to the data/FB15k-237 path and untar it.

cqd's People

Contributors

dfdazac, osoblanco, pminervini


cqd's Issues

Typo in model download instructions

Hi! I'm trying to reproduce the results following the instructions in the README, and I found an error in the instructions.
The instructions provide a command for downloading and decompressing the models. However, the decompression command that follows does not refer to the models file but to cqd-data. I've found that replacing the second line with

% tar xvf cqd-models.tgz

solves this issue :)

2u/up queries reproduction with CQD @ KGReasoning

Since there is no issue board at https://github.com/pminervini/KGReasoning I thought I could write it here and tag @pminervini πŸ˜ƒ

I'm trying to run CQD-CO and CQD-Beam on the BetaE versions of the FB15k-237 and NELL-995 datasets using that repo, but for some reason, the numbers for union queries are very low.

After downloading the pre-trained models (fb15k-237-betae and nell-betae, respectively), I'm using the following commands:

python main.py -cuda --do_test --data_path FB15k-237-betae --cpu_num 1 --geo cqd --tasks "1p.2p.3p.2i.3i.ip.pi.2u.up" --checkpoint_path models/fb15k-237-betae -d 1000
python main.py -cuda --do_test --data_path NELL-betae --cpu_num 1 --geo cqd --tasks "1p.2p.3p.2i.3i.ip.pi.2u.up" --checkpoint_path models/nell-betae -d 1000

Other hyperparameters are left at their defaults (there is no info on when to use --cqd-sigmoid-scores or --cqd-normalize-scores, so I presume they should be turned off).

The numbers for 2u/up FB15k-237:

Test 2u-DNF MRR at step 99999: 0.005257
Test 2u-DNF HITS1 at step 99999: 0.001895
Test 2u-DNF HITS3 at step 99999: 0.004898
Test 2u-DNF HITS10 at step 99999: 0.010378
Test 2u-DNF num_queries at step 99999: 5000.000000
Test up-DNF MRR at step 99999: 0.016857
Test up-DNF HITS1 at step 99999: 0.005590
Test up-DNF HITS3 at step 99999: 0.014344
Test up-DNF HITS10 at step 99999: 0.033338
Test up-DNF num_queries at step 99999: 5000.000000

And for NELL:

Test 2u-DNF MRR at step 99999: 0.007676
Test 2u-DNF HITS1 at step 99999: 0.004144
Test 2u-DNF HITS3 at step 99999: 0.006924
Test 2u-DNF HITS10 at step 99999: 0.014262
Test 2u-DNF num_queries at step 99999: 4000.000000
Test up-DNF MRR at step 99999: 0.023296
Test up-DNF HITS1 at step 99999: 0.010295
Test up-DNF HITS3 at step 99999: 0.022723
Test up-DNF HITS10 at step 99999: 0.045247
Test up-DNF num_queries at step 99999: 4000.000000

Is there anything missing, or are these the expected numbers for the BetaE datasets?

P.S. It would be good to have an example of how to properly run CQD with KGReasoning in example.sh :)

Answering queries with CQD Error

After training the model, I followed the instructions to answer queries, but it reported that entity2text.text is missing. Could you please give some suggestions to solve this problem? Thanks! 😊


The procedure of the t-norms and neural link prediction

I'm sorry to say that I couldn't really understand the procedure of the t-norms and neural link prediction when studying the code of this module 😔. Could you give some pseudocode, math formulas, or illustrations of this module? Thanks! 😘 Looking forward to your reply 😊
