  • Stars: 274
  • Rank 150,274 (Top 3 %)
  • Language: Python
  • License: MIT License
  • Created over 5 years ago
  • Updated over 1 year ago

Repository Details

A batteries-included kit for knowledge graphs

Hello!

The tech behind parts of ZincBase was acquired. This repo is still here for reference, but it is deprecated.

Fortunately, work still goes on. Apart from a couple of fringe bits, the active repo lives here.

The new owner of ZincBase as it is today is ComplexDB.

Alright, you still want to continue

ZincBase is a state-of-the-art knowledge base. It does the following:

  • Extract facts (aka triples and rules) from unstructured data/text
  • Store and retrieve those facts efficiently
  • Build them into a graph
  • Provide ways to query the graph, including via bleeding-edge graph neural networks.

Zincbase exists to answer questions like "what is the probability that Tom likes LARPing", or "who likes LARPing", or "classify people into LARPers vs normies":

Example graph for reasoning

It combines the latest in neural networks with symbolic logic (think expert systems and Prolog) and graph search.
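
To make the "store facts, then query them with variables" idea concrete, here is a minimal pure-Python sketch. It is not ZincBase's implementation: `TinyTripleStore`, `Triple`, and `is_variable` are hypothetical names invented for this illustration, and the only convention borrowed from Prolog is that a term starting with an uppercase letter is treated as a variable.

```python
from collections import namedtuple

# Hypothetical names for illustration; none of these come from ZincBase.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])


def is_variable(term):
    """Prolog convention: a term starting with an uppercase letter is a variable."""
    return term[:1].isupper()


class TinyTripleStore:
    """A toy in-memory triple store with single-pattern queries."""

    def __init__(self):
        self.triples = set()

    def store(self, subject, predicate, obj):
        self.triples.add(Triple(subject, predicate, obj))

    def query(self, subject, predicate, obj):
        """Yield one {variable: value} dict per stored triple matching the pattern."""
        pattern = (subject, predicate, obj)
        for triple in sorted(self.triples):
            bindings = {}
            for want, have in zip(pattern, triple):
                if is_variable(want):
                    # A repeated variable must bind to the same value each time.
                    if bindings.setdefault(want, have) != have:
                        break
                elif want != have:
                    break
            else:
                yield bindings


store = TinyTripleStore()
store.store("tom", "likes", "larping")
store.store("shamala", "likes", "larping")
store.store("tom", "eats", "rice")

# "Who likes LARPing?" expressed as a pattern with one variable.
who_likes_larping = [b["Who"] for b in store.query("Who", "likes", "larping")]
print(who_likes_larping)  # ['shamala', 'tom']
```

A real knowledge base would add rules and indexing on top of this; the sketch only covers exact-match facts.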

View full documentation here.

Quickstart

from zincbase import KB
kb = KB()
kb.store('eats(tom, rice)')
for ans in kb.query('eats(tom, Food)'):
    print(ans['Food']) # prints 'rice'

...
# The included assets/countries_s1_train.csv contains triples like:
# (namibia, locatedin, africa)
# (lithuania, neighbor, poland)

kb = KB()
kb.from_csv('./assets/countries_s1_train.csv')
kb.build_kg_model(cuda=False, embedding_size=40)
kb.train_kg_model(steps=2000, batch_size=1, verbose=False)
print(kb.estimate_triple_prob('fiji', 'locatedin', 'melanesia'))
# prints a probability such as 0.8467; the exact value may vary between training runs
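
For intuition about where a triple probability can come from, the sketch below follows the scoring idea of the RotatE paper listed in the references: a relation is a per-dimension rotation in the complex plane, and a triple is plausible when the rotated head lands near the tail. Whether `estimate_triple_prob` computes exactly this is an assumption. The function names, the `margin` parameter, and the hand-picked toy embeddings are all hypothetical.

```python
import cmath
import math


def rotate_distance(head, relation_phases, tail):
    """Sum over dimensions of |h_i * r_i - t_i|, with r_i = exp(i * phase_i)."""
    total = 0.0
    for h, phase, t in zip(head, relation_phases, tail):
        rotation = cmath.rect(1.0, phase)  # unit-modulus complex number
        total += abs(h * rotation - t)
    return total


def triple_probability(head, relation_phases, tail, margin=1.0):
    """Squash the margin-shifted score into (0, 1) with a sigmoid.

    The margin value is an arbitrary choice for this illustration.
    """
    score = margin - rotate_distance(head, relation_phases, tail)
    return 1.0 / (1.0 + math.exp(-score))


# Toy 2-dimensional embeddings, chosen by hand rather than learned.
head = [1 + 0j, 0 + 1j]
quarter_turn = [math.pi / 2, math.pi / 2]
rotated_head = [h * cmath.rect(1.0, p) for h, p in zip(head, quarter_turn)]
unrelated_tail = [-1 + 0j, 0 - 1j]

p_match = triple_probability(head, quarter_turn, rotated_head)
p_mismatch = triple_probability(head, quarter_turn, unrelated_tail)
print(p_match > p_mismatch)  # True
```

In a trained model the embeddings and phases are learned from the stored triples, which is what the build and train steps above are for.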

Requirements

  • Python 3
  • Libraries from requirements.txt
  • GPU preferable for large graphs but not required

Installation

pip install -r requirements.txt

Note: Requirements might differ for PyTorch depending on your system.

Testing

python test/test_main.py
python test/test_graph.py
python test/test_lists.py
python test/test_nn_basic.py
python test/test_nn.py
python test/test_neg_examples.py
python test/test_truthiness.py
python -m doctest zincbase/zincbase.py

Validation

"Countries" and "FB15k" datasets are included in this repo.

A script checks that ZincBase performs at least as well on the Countries dataset as the original (2019) RotatE paper. From the repo's root directory:

python examples/eval_countries_s3.py

It tests the hardest Countries task and prints out the AUC ROC, which should be ~ 0.95 to match the paper. It takes about 30 minutes to run on a modern GPU.
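
For readers unfamiliar with the metric, the following sketch computes ROC AUC directly from its pairwise definition: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. It is a generic illustration with made-up labels and scores, not the code used by the evaluation script; `roc_auc` is a hypothetical helper.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up data: 1 marks a true triple, 0 a false one.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.833
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration but slow for large evaluation sets.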

There is also a script to evaluate performance on FB15k: python examples/fb15k_mrr.py.
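
Assuming, as the script name suggests, that the FB15k evaluation reports mean reciprocal rank (MRR), the metric can be sketched as follows. For each test query, candidate entities are ranked by score and the reciprocal of the true entity's rank is averaged over all queries. The helper names and the candidate scores are invented for this example.

```python
def rank_of_true_entity(candidate_scores, true_entity):
    """1-based rank of true_entity when candidates are sorted by descending score."""
    true_score = candidate_scores[true_entity]
    return 1 + sum(1 for s in candidate_scores.values() if s > true_score)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; 1.0 means every answer was ranked first."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Two made-up queries, each with scores for three candidate tail entities.
queries = [
    ({"melanesia": 0.9, "africa": 0.2, "europe": 0.1}, "melanesia"),  # rank 1
    ({"africa": 0.5, "europe": 0.7, "asia": 0.6}, "africa"),          # rank 3
]
ranks = [rank_of_true_entity(scores, truth) for scores, truth in queries]
mrr = mean_reciprocal_rank(ranks)
print(ranks, round(mrr, 3))  # [1, 3] 0.667
```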

Building documentation

From the docs/ directory, run make html. If the package structure has changed substantially, run sphinx-apidoc -o . .. to regenerate the API source files.

TODO

  • Add documentation
  • to_csv method
  • Use Postgres as a backend triple store
  • The to_csv/from_csv methods do not yet support node attributes.
  • Add relation extraction from arbitrary unstructured text
  • Add context to each triple: text interpreted by BERT, ULM, GPT-2, or a similar model and encoded as an embedding that is concatenated to the KG embedding.
  • Reinforcement learning for graph traversal.
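
As a rough idea of what the planned to_csv method might involve, the sketch below round-trips plain triples through Python's standard csv module. It is not ZincBase code: `triples_to_csv` and `triples_from_csv` are hypothetical helpers, the subject, predicate, object column order is an assumption, and node attributes are not handled, in line with the limitation noted in the list above.

```python
import csv
import io


def triples_to_csv(triples, file_obj, delimiter=","):
    """Write one (subject, predicate, object) triple per row."""
    writer = csv.writer(file_obj, delimiter=delimiter, lineterminator="\n")
    for subject, predicate, obj in triples:
        writer.writerow([subject, predicate, obj])


def triples_from_csv(file_obj, delimiter=","):
    """Read rows back as tuples, skipping blank lines."""
    return [tuple(row) for row in csv.reader(file_obj, delimiter=delimiter) if row]


triples = [
    ("namibia", "locatedin", "africa"),
    ("lithuania", "neighbor", "poland"),
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
triples_to_csv(triples, buffer)
buffer.seek(0)
round_trip = triples_from_csv(buffer)
print(round_trip == triples)  # True
```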

References & Acknowledgements

Théo Trouillon. Complex-Valued Embedding Models for Knowledge Graphs. Machine Learning [cs.LG]. Université Grenoble Alpes, 2017. English. NNT: 2017GREAM048

L334: Computational Syntax and Semantics -- Introduction to Prolog, Steve Harlow

Open Book Project: Prolog in Python, Chris Meyers

Prolog Interpreter in Javascript

RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space, Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang, International Conference on Learning Representations, 2019

Citing

If you use this software, please consider citing:

@software{zincbase,
  author = {{Tom Grek}},
  title = {ZincBase: A state of the art knowledge base},
  url = {https://github.com/tomgrek/zincbase},
  version = {0.1.1},
  date = {2019-05-12}
}

Contributing

See CONTRIBUTING. And please do!
