
inferbeddings's People

Contributors

pminervini, riedelcastro, rockt, tdmeeste

inferbeddings's Issues

arXiv submission

There are a few things that still need to be done that don't justify another paper, but would round things out and could be helpful in case of rejection.

What are these? For example, running on more datasets, and maybe more models.

More faithful implications

One potential problem with the current formulation is that the implication/clause losses aren't fully consistent with their logical counterparts. In logic, a clause is automatically true when its body is false. In our formulation, a body can have a very low score (effectively false), but the head score still needs to be larger than the body score. For example, if the body score is -15 and the head score is -20, the model would still need to push the head score up hard, even though at this point both expressions are essentially false.

Notice that this reasoning only really makes sense with the standard loss, not with the pairwise loss. The problem with the pairwise loss is that one loses any sense of "trueness"...
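
One way to make the loss more faithful would be to gate it on the body actually being "true". A minimal sketch, assuming a score threshold tau (both the gating scheme and the names here are hypothetical, not current code):

import tensorflow as tf

# Hypothetical sketch: only penalise an implication body => head when the
# body is "effectively true", i.e. when its score exceeds a threshold tau.
# body_scores, head_scores: [batch] tensors of clause body/head scores.
def masked_implication_loss(body_scores, head_scores, tau=0.0):
    violation = tf.nn.relu(body_scores - head_scores)  # head should score at least as high as the body
    mask = tf.cast(body_scores > tau, tf.float32)      # zero out clauses whose body is "false"
    return tf.reduce_sum(mask * violation)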

Compare with KALE

At the moment we do not have any comparison with results published in other papers.

A solution is to compare directly against KALE, using their training/validation/test sets: http://www.aclweb.org/anthology/D/D16/D16-1019.pdf

Their model is based on TransE: with a model based on ComplEx we should be able to show some improvements.

Integrate Sampled Grounding as alternative

This could be done through a refactoring of the code in which the "optimise adversarial" step can be filled in by sampling from the actual entity embeddings. Another view is to enable optimisation from grounded initialisation, and then do 0 steps of optimisation for old-school grounding (though this isn't quite as elaborate as the sampling scheme in NAACL).
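
A minimal sketch of what the sampling-based step could look like, assuming a violation function over candidate (subject, object) embedding pairs (all names here are hypothetical):

import numpy as np

# Hypothetical sketch: instead of optimising adversarial embeddings,
# sample candidate violators directly from the entity embedding matrix.
# entity_embeddings: [num_entities, dim]; violation(xs, ys) returns a
# violation score per sampled (subject, object) pair.
def sample_grounding(entity_embeddings, violation, nb_samples=100, seed=0):
    rng = np.random.RandomState(seed)
    n = entity_embeddings.shape[0]
    xs = entity_embeddings[rng.randint(n, size=nb_samples)]
    ys = entity_embeddings[rng.randint(n, size=nb_samples)]
    scores = violation(xs, ys)
    best = int(np.argmax(scores))  # most violating sampled grounding
    return xs[best], ys[best]

Grounded initialisation with 0 optimisation steps then corresponds to taking the sampled pairs as-is, without the search for the maximal violator.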

Blog Post

Regardless of whether the UAI submission comes through, I think we should publish the paper on arXiv at some point soon. For this, I think it's extremely important to do a bit of PR. I have been thinking about a blog post that:

  1. gives a short overview and motivation for KBP
  2. presents neural link predictors
  3. presents our method
  4. shows a simple 2D interactive animation of the adversarial learning process (and of the default learning process), using a simple model and problem. If we use a model like DistMult, we should be able to implement this relatively easily in JavaScript (a minimal scorer sketch appears at the end of this issue). I have some ideas for this based on my talk slides.

We should be careful not to overdo this, and make sure this comes out with the paper. But I do think there is a "gap in the market" regarding nice neural link prediction articles and visualisations, and I think it's doable.

Is there a standard place where people put such posts (besides distill.pub)? We could just upload it to the uclmr webpage, I guess.
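
For reference, the scorer the animation needs is tiny. A minimal DistMult scorer (NumPy here for illustration; the post itself would use JavaScript):

import numpy as np

# DistMult scores a triple (s, r, o) with a trilinear dot product:
# score(s, r, o) = sum_i e_s[i] * w_r[i] * e_o[i]
def distmult_score(e_s, w_r, e_o):
    return float(np.sum(e_s * w_r * e_o))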

Experiments (first AMIE+ phase)

Which datasets, rules, settings, hyperparameter ranges, hyperparameter search methods, etc.?

IIRC @tdmeeste was looking into datasets and rules (especially some criterion for generating/selecting the rules to include in the process) - how is it going?

Experiments & Datasets

Note: experiments on FB15k can be a bit slow (they take several hours); e.g. try this:

$ python3 ./bin/adv-cli.py --train data/fb15k/freebase_mtr100_mte100-train.txt --valid data/fb15k/freebase_mtr100_mte100-valid.txt --test data/fb15k/freebase_mtr100_mte100-test.txt --clauses data/fb15k/clauses/clauses_0.999.pl --nb-epochs 100 --lr 0.1 --nb-batches 10 --model TransE --similarity l2 --margin 1 --embedding-size 150 --adv-lr 0.1 --adv-init-ground --adversary-epochs 0 --discriminator-epochs 10 --adv-weight 1000 --adv-batch-size 1

Consider using other datasets, e.g. YAGO or DBpedia.

ICML Paper Title & Abstract

I would like us to tune the high-level story of the paper a little. One way is to sell this just as "better rule injection". Another is to think more about "Reasoning with Low Rank Representations", or "Reason and Represent", etc. I think there is a deeper story, in which rule weight learning would fit in well. Let's use this space to hone our story. I will add one version of an abstract and title here later.

Adapt Algorithm to new formulation

I copied over the algorithm environment from the old version. Needs adaptation.

This would also include generalising the "entity normalisation" step, which may use a unit ball, a unit box, or nothing at all (instead relying on regularisation). Alternatively, we can avoid generalising, and explicitly say in the text body that other mechanisms are possible but we stick to one for simplicity of exposition.
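
For concreteness, the three variants could look as follows; a minimal NumPy sketch, assuming the unit box is [0, 1]^k as in the lifted-loss setting:

import numpy as np

# Hypothetical sketch of the three "entity normalisation" variants:
def project_entities(emb, mode="unit-ball"):
    if mode == "unit-ball":
        # rescale rows with norm > 1 back onto the unit ball
        norms = np.maximum(np.linalg.norm(emb, axis=1, keepdims=True), 1.0)
        return emb / norms
    if mode == "unit-box":
        # clip each coordinate into [0, 1]
        return np.clip(emb, 0.0, 1.0)
    return emb  # no projection; rely on regularisation instead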

Decide the best ruleset for FB15k

For generating candidate rule sets, try e.g.:

$ ./tools/amie-to-clauses.py -t 0.9 data/fb15k/rules/fb15k-rules_mins=1000_minis=1000.txt

Closed form adversaries - next steps

Next steps:

  • add a first section to closed_form.text with a table containing all currently written out closed forms.
  • implement some of the closed form expressions
    • First: DistMult, unit box, simple implications (then we can redo the EMNLP experiments with the lifted loss); see the sketch after this list
    • Simple implications with the unit-ball restriction: we expect this will work less well. Maybe we won't have to derive all of the more complicated unit-ball formulas.
    • Then: all the others. Especially try them out on synthetic data, as an extra test besides verifying the maths. Let's do this soon!
  • write out remaining closed form expressions
    • alternative ComplEx case, as in the other paper
    • implications with conjunctions
    • symmetry clauses
    • implications with inversion of arguments
    • Transitivity clause with a single relation
    • ComplEx, general transitive clause
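
As a concrete instance of the first item above, here is a sketch of the closed form for a simple implication p(X, Y) => q(X, Y) under DistMult with entities in the unit box [0, 1]^k (my derivation, to be double-checked against the write-up): the violation score_p(x, y) - score_q(x, y) = sum_i (p_i - q_i) x_i y_i decomposes per dimension, and each x_i y_i ranges over [0, 1], so the maximum is attained coordinate-wise:

import numpy as np

# Hypothetical sketch: maximal implication violation for DistMult on the
# unit box, maximised in closed form over all entity pairs (x, y):
#   max_{x, y in [0,1]^k} sum_i (p_i - q_i) * x_i * y_i = sum_i max(0, p_i - q_i)
def lifted_implication_loss(r_p, r_q):
    return float(np.sum(np.maximum(r_p - r_q, 0.0)))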

Visualisations of Adversarial Inference Mechanism

I think there are some ways to visualise, in the paper, how the method works. We should design one. Here is one option (a plotting sketch follows the list):

  • Consider link prediction with a single relation r, and some small number of entities
  • Inject the transitivity formula
  • After training the discriminator, project entities to 2D space. Plot edges between them if the discriminator thinks they are related.
  • After training the generator, plot the 3 points it found to violate the transitivity clause the most, again in the 2D graph. Ideally they lie somewhere close to real entities that violate the clause
  • Iterate
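
A minimal sketch of the projection-and-edges steps (PCA via SVD; score_fn and the threshold tau are hypothetical stand-ins for the trained discriminator):

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sketch: project entities to 2D with PCA and draw an edge
# (i, j) whenever the discriminator scores r(i, j) above a threshold.
def plot_relation(entity_emb, score_fn, tau):
    centred = entity_emb - entity_emb.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    pts = centred @ vt[:2].T  # 2D PCA projection
    plt.scatter(pts[:, 0], pts[:, 1], c="grey")
    n = pts.shape[0]
    for i in range(n):
        for j in range(n):
            if i != j and score_fn(i, j) > tau:  # predicted link i -> j
                plt.plot(pts[[i, j], 0], pts[[i, j], 1], "b-", alpha=0.3)
    plt.show()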

We can use this space to discuss other ideas.

Learning "Rule Weights"

I believe that our framework can be extended relatively easily to learn rule weights. I feel this is low-hanging fruit, and may lead to better results without having to worry about where to get the rules from. If @tdmeeste or @rockt have any cycles, maybe this is something for them to look at. If we have the datasets already prepared, it's a matter of extending the TF loss. I have some ideas of how the loss would look (see the sketch after the todo list below). Maybe I'll find some time to hack this in as well.

Generally, I am looking for low-hanging fruit that adds heft to the paper and makes us less reliant on improvements from rule injection (which may or may not materialise).

Todo:

  • adapt syntax for clause parser to define rule weights and learnable rule weights
  • implement weighted loss (and negated weighted loss)
  • provide way to easily print out weights per clause (might need dictionary from clauses to variables)
  • run on synthetic dataset with partial transitivity to validate whether a non-0.5 weight is learned.
  • "dynamic weights" based on relation representations

Collect Hypotheses to Test

For the paper and the experiment section, it would be good to be precise about the hypotheses we'd like to test, and how to test them. Here is a start:

  • Adversarial learning is more efficient than random sampling (NAACL). Tests:
    • lower ground-rule violation after the same amount of training time (or less time for the same ground-rule violation count), ideally on real and synthetic datasets
    • better accuracy after the same amount of time (this test somewhat conflates things)
  • Adversarial learning is more general than the EMNLP approach ...
  • and this generality is useful in practice
    • Test: show some improvements using types of formulae not supported by EMNLP, ideally over the SOTA but at least for ZSL
  • Adversarial learning for rules works: by finding "synthetic" violators and pushing them down, real violators disappear (presumably because they are similar to the synthetic violators)

Feel free to comment, edit and add more...

With DistMult or ComplEx, enforcing p => q results in emb(p) ~ emb(q) and 50% ground errors

Here's the code for replicating the issue:

$ ./bin/adv-cli.py --train data/synth/simple-tiny/data.tsv --lr 0.1 --model DistMult --similarity dot --margin 1 --embedding-size 30 --nb-epochs 1000 --clauses data/synth/simple-tiny/clauses.pl --adv-lr 0.1 --adv-ground-samples 100 --adv-weight 1000000 --adversary-epochs 10 --discriminator-epochs 1 --debug

Here are the embeddings of p and q after some epochs (if you run the code, the output is a coloured Hinton diagram):

┌──────────────────────────────────────────────────────────────────────────────────────────┐
│ ▇  ▁  ▅     ▂  ▂  ▂  ▁  ▄        █  ▂  ▅  ▇  ▃  ▂     ▂  ▄  ▃  ▃  ▆  ▇  ▇  ▁  ▃  ▄  █  ▁ │
│ ▇  ▁  ▅     ▂  ▃  ▂  ▁  ▄     ▁  █  ▂  ▅  ▇  ▃  ▂     ▂  ▄  ▃  ▃  ▆  ▇  ▇  ▂  ▃  ▄  █  ▁ │
└──────────────────────────────────────────────────────────────────────────────────────────┘

Using TransE results in ~0% ground errors, for some reason.

Other Generator Distributions

Currently, we use only point-mass distributions for the generator. This means there is a bit of a disconnect between what we do and the more typical GAN applications. That's completely fine. However, to make this connection stronger in the paper, I'd recommend trying some more standard GAN approaches as well. They don't have to work better, so this is relatively fail-safe; we just want to compare.
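
For instance, a Gaussian generator over adversarial entity embeddings, trained with the reparameterisation trick. A minimal TF sketch; violation_fn stands in for the clause-violation score we already use for point-mass adversaries:

import tensorflow as tf

# Hypothetical sketch: a Gaussian generator instead of a point mass.
dim, nb_samples = 100, 16                   # illustrative sizes
mu = tf.Variable(tf.zeros([dim]))           # generator mean
log_sigma = tf.Variable(tf.zeros([dim]))    # generator log standard deviation

def expected_violation(violation_fn):
    eps = tf.random_normal([nb_samples, dim])
    samples = mu + tf.exp(log_sigma) * eps  # reparameterised samples
    # Monte Carlo estimate of the expected clause violation (to be maximised)
    return tf.reduce_mean(violation_fn(samples))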

Results (08/02/2017)

Some early results are available here:

http://data.neuralnoise.com/inferbeddings/logs_08022017.tar.gz

Just decompress the file in the inferbeddings directory.

Those results are generated by jobs on the UCLCS cluster - the scripts generating the jobs have a UCL_ prefix and are available here:

https://github.com/uclmr/inferbeddings/tree/master/scripts/wn18
https://github.com/uclmr/inferbeddings/tree/master/scripts/fb15k

For checking the results, I've written a script that:

  • Looks for the best hyperparameter settings for each metric (in the filtered setting, as in the ComplEx paper) on the validation set, and
  • Reports the corresponding results on the test set.

For example - results on WN18 with and without including rules:

  • With rules:
$ ./tools/parse_results_filtered.sh logs/ucl_wn18_adv_v1/*.log
1080
Best MR, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=1_adv_lr=0.1_adv_weight=100_batches=10_disc_epochs=10_embedding_size=200_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l2.log
Test - Best Filt MR: 140.9154

Best MRR, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=10_adv_lr=0.1_adv_weight=10000_batches=10_disc_epochs=10_embedding_size=100_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt MRR: 0.493

Best H@1, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=10_adv_lr=0.1_adv_weight=10000_batches=10_disc_epochs=10_embedding_size=100_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@1: 32.78%

Best H@3, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=10_adv_epochs=10_adv_lr=0.1_adv_weight=100_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@3: 84.57%

Best H@5, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=10_adv_epochs=10_adv_lr=0.1_adv_weight=100_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@5: 90.78%

Best H@10, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=10_adv_epochs=10_adv_lr=0.1_adv_weight=100_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@10: 93.06%

  • Without rules:

$ ./tools/parse_results_filtered.sh logs/ucl_wn18_adv_v1/*_adv_weight=0_*.log
180
Best MR, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=0_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=200_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l2.log
Test - Best Filt MR: 146.8016

Best MRR, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=100_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt MRR: 0.372

Best H@1, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=100_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=1_embedding_size=20_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l2.log
Test - Best Filt Hits@1: 16.62%

Best H@3, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=100_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@3: 60.31%

Best H@5, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=100_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@5: 70.16%

Best H@10, Filt: logs/ucl_wn18_adv_v1/ucl_wn18_adv_v1.adv_batch_size=1_adv_epochs=1_adv_lr=0.1_adv_weight=0_batches=10_disc_epochs=10_embedding_size=50_epochs=100_lr=0.1_margin=1_model=TransE_optimizer=adagrad_similarity=l1.log
Test - Best Filt Hits@10: 79.39%

Please note that the experiments in logs/ucl_fb15k_adv_v?.2 are still running (and most log files are incomplete). Those are experiments with a new ruleset I'm trying for FB15k, using clauses with higher support (the minimum support here is 1000 instead of 100): this is related to #11.
