zincbase's Introduction

Hello!

The tech behind parts of ZincBase was acquired. This repo is still here for reference, but it is deprecated.

Fortunately, work still goes on. Apart from a couple of fringe bits, the active repo lives here.

The new owner of ZincBase as it is today is ComplexDB.

Alright, you still want to continue

ZincBase is a state-of-the-art knowledge base. It does the following:

  • Extract facts (aka triples and rules) from unstructured data/text
  • Store and retrieve those facts efficiently
  • Build them into a graph
  • Provide ways to query the graph, including via bleeding-edge graph neural networks.

ZincBase exists to answer questions like "what is the probability that Tom likes LARPing", or "who likes LARPing", or "classify people into LARPers vs normies":

[Figure: example graph for reasoning]

It combines the latest in neural networks with symbolic logic (think expert systems and prolog) and graph search.
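The symbolic side of this can be illustrated with a small, self-contained sketch. It is not ZincBase's implementation: the `TinyTripleStore` class and its methods are hypothetical names made up for this example, and it only assumes the Prolog convention that a term starting with an uppercase letter is a variable.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def is_variable(term: str) -> bool:
    """Prolog convention: a term starting with an uppercase letter is a variable."""
    return term[:1].isupper()


class TinyTripleStore:
    """Hypothetical in-memory fact store; NOT how ZincBase works internally."""

    def __init__(self) -> None:
        self.facts: List[Triple] = []

    def store(self, subject: str, predicate: str, obj: str) -> None:
        self.facts.append((subject, predicate, obj))

    def query(self, subject: str, predicate: str, obj: str) -> Iterator[Dict[str, str]]:
        """Yield one dict of variable bindings per stored fact matching the pattern."""
        pattern = (subject, predicate, obj)
        for fact in self.facts:
            bindings: Dict[str, str] = {}
            for want, have in zip(pattern, fact):
                if is_variable(want):
                    # A repeated variable must bind to the same value every time.
                    if bindings.setdefault(want, have) != have:
                        break
                elif want != have:
                    break
            else:
                yield bindings


store = TinyTripleStore()
store.store("tom", "likes", "larping")
store.store("shamala", "likes", "larping")
store.store("tom", "eats", "rice")

who_likes_larping = [b["Who"] for b in store.query("Who", "likes", "larping")]
print(who_likes_larping)  # ['tom', 'shamala']
```

A linear scan like this answers "who likes LARPing" exactly, but it cannot say how probable an unseen fact is; that is the part the neural model is meant to cover.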

View full documentation here.

Quickstart

from zincbase import KB
kb = KB()
kb.store('eats(tom, rice)')
for ans in kb.query('eats(tom, Food)'):
    print(ans['Food']) # prints 'rice'

...
# The included assets/countries_s1_train.csv contains triples like:
# (namibia, locatedin, africa)
# (lithuania, neighbor, poland)

kb = KB()
kb.from_csv('./assets/countries.csv')
kb.build_kg_model(cuda=False, embedding_size=40)
kb.train_kg_model(steps=2000, batch_size=1, verbose=False)
print(kb.estimate_triple_prob('fiji', 'locatedin', 'melanesia'))
# e.g. 0.8467 (the exact value varies between training runs)
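To give some intuition for where such a probability could come from, here is a minimal sketch of RotatE-style scoring, the model cited in the Validation and References sections. It is an assumption that ZincBase's `estimate_triple_prob` works along these lines. The function names, the toy embeddings, and the margin `gamma=6.0` are all hypothetical; only the idea (a relation rotates the head embedding in the complex plane, and a small distance to the tail means a plausible triple) comes from the RotatE paper.

```python
import cmath
import math
from typing import Sequence


def rotate_distance(head: Sequence[complex],
                    phases: Sequence[float],
                    tail: Sequence[complex]) -> float:
    """Sum over dimensions of |h * e^(i*theta) - t|.

    Each relation dimension is a unit-modulus complex number e^(i*theta),
    so multiplying by it rotates the head coordinate without rescaling it.
    """
    return sum(abs(h * cmath.exp(1j * theta) - t)
               for h, theta, t in zip(head, phases, tail))


def triple_probability(head: Sequence[complex],
                       phases: Sequence[float],
                       tail: Sequence[complex],
                       gamma: float = 6.0) -> float:
    """Squash the margin-based score (gamma - distance) through a sigmoid."""
    score = gamma - rotate_distance(head, phases, tail)
    return 1.0 / (1.0 + math.exp(-score))


# Toy 2-dimensional embeddings (made up for illustration).
head = [1 + 0j, 0 + 1j]
quarter_turn = [math.pi / 2, math.pi / 2]   # relation: rotate each dimension by 90 degrees
matching_tail = [0 + 1j, -1 + 0j]           # exactly the rotated head
mismatched_tail = [1 + 0j, 0 + 1j]          # the unrotated head

p_match = triple_probability(head, quarter_turn, matching_tail)
p_mismatch = triple_probability(head, quarter_turn, mismatched_tail)
print(p_match > p_mismatch)  # True
```

In a trained model the embeddings and phases are learned from the triples rather than written by hand, which is why the probability in the quickstart depends on the training run.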

Requirements

  • Python 3
  • Libraries from requirements.txt
  • GPU preferable for large graphs but not required

Installation

pip install -r requirements.txt

Note: Requirements might differ for PyTorch depending on your system.

Testing

python test/test_main.py
python test/test_graph.py
python test/test_lists.py
python test/test_nn_basic.py
python test/test_nn.py
python test/test_neg_examples.py
python test/test_truthiness.py
python -m doctest zincbase/zincbase.py

Validation

"Countries" and "FB15k" datasets are included in this repo.

A script checks that ZincBase performs at least as well on the Countries dataset as the original (2019) RotatE paper. From the repo's root directory:

python examples/eval_countries_s3.py

It tests the hardest Countries task (S3) and prints the AUC ROC, which should be about 0.95 to match the paper. It takes about 30 minutes to run on a modern GPU.
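For readers unfamiliar with the metric, the sketch below shows one standard way to compute AUC ROC from labels and model scores: the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counted as half. The `roc_auc` function and the toy data are hypothetical and are not taken from the evaluation script, which may compute the metric differently.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC ROC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Toy data: 1 = true triple, 0 = corrupted triple, with made-up model scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect one, so a result near 0.95 means true triples are ranked above corrupted ones in about 95% of pairs.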

There is also a script to evaluate performance on FB15k: python examples/fb15k_mrr.py.

Building documentation

From the docs/ directory, run make html. If the code has changed substantially, first run sphinx-apidoc -o . .. to regenerate the API sources.

TODO

  • Add documentation
  • to_csv method
  • utilize postgres as backend triple store
  • The to_csv/from_csv methods do not yet support node attributes.
  • Add relation extraction from arbitrary unstructured text
  • Add context to each triple, interpreted by BERT, ULM, GPT-2 or similar and encoded as an embedding that is concatenated to the KG embedding.
  • Reinforcement learning for graph traversal.

References & Acknowledgements

Théo Trouillon. Complex-Valued Embedding Models for Knowledge Graphs. Machine Learning [cs.LG]. Université Grenoble Alpes, 2017. English. NNT: 2017GREAM048

L334: Computational Syntax and Semantics -- Introduction to Prolog, Steve Harlow

Open Book Project: Prolog in Python, Chris Meyers

Prolog Interpreter in Javascript

RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space, Zhiqing Sun and Zhi-Hong Deng and Jian-Yun Nie and Jian Tang, International Conference on Learning Representations, 2019

Citing

If you use this software, please consider citing:

@software{zincbase,
  author = {{Tom Grek}},
  title = {ZincBase: A state of the art knowledge base},
  url = {https://github.com/tomgrek/zincbase},
  version = {0.1.1},
  date = {2019-05-12}
}

Contributing

See CONTRIBUTING. And please do!

zincbase's Issues

Canada located in Central Asia

Really cool project and loved your podcast!

I was tinkering around with the get_most_likely function as follows:

    kb.from_csv('.//assets//countries_s3_train.csv',delimiter='\t')
    kb.build_kg_model(cuda=False, embedding_size=40)
    kb.train_kg_model(steps=2000, batch_size=1, verbose=True)
    print(kb.get_most_likely('canada', 'locatedin', '?'))

And this results in:
[{'triple': ('canada', 'locatedin', 'central_asia'), 'prob': 0.8538}]

Unless I have my geography wrong :), do you think this is a result of the data being faulty? Or could I have done something wrong?

What dynamic visualisation front end fits well with Zincbase?

Sorry to open this as an issue, but I don't know of a better channel for ZincBase talk.

I am wondering what a good visualisation front end for ZincBase knowledge trees would be. From my understanding, ZincBase uses networkx underneath, so matplotlib is an obvious choice, but it is not very dynamic or easy to use for browser output.

Does anybody have experience with visualising Zincbase trees in some front ends?
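One possible direction, offered as a sketch and not as an answer from the maintainers: browser-side graph libraries such as D3 commonly consume a "node-link" JSON document, so exporting triples in that shape is a front-end-agnostic first step. The `triples_to_node_link` helper below is hypothetical and works on plain (subject, predicate, object) tuples; how to pull those tuples out of a ZincBase KB is left open here.

```python
import json
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def triples_to_node_link(triples: Iterable[Triple]) -> Dict[str, List[dict]]:
    """Convert triples into a {"nodes": [...], "links": [...]} structure."""
    nodes: List[dict] = []
    links: List[dict] = []
    seen = set()
    for subject, predicate, obj in triples:
        for name in (subject, obj):
            if name not in seen:  # emit each node once, in first-seen order
                seen.add(name)
                nodes.append({"id": name})
        links.append({"source": subject, "target": obj, "label": predicate})
    return {"nodes": nodes, "links": links}


# Example triples taken from the README's description of the Countries data.
triples = [
    ("namibia", "locatedin", "africa"),
    ("lithuania", "neighbor", "poland"),
]
graph = triples_to_node_link(triples)
print(json.dumps(graph, indent=2))
```

The resulting JSON can be written to a file and loaded by whichever browser library is chosen; the field names `id`, `source`, `target` and `label` are a common convention, not a requirement of any particular tool.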
