

KGTK: Knowledge Graph Toolkit


The Knowledge Graph Toolkit (KGTK) is a comprehensive framework for the creation and exploitation of large hyper-relational knowledge graphs (KGs), designed for ease of use, scalability, and speed. KGTK represents KGs in tab-separated (TSV) files with four columns: edge-identifier, head, edge-label, and tail. All KGTK commands consume and produce KGs represented in this simple format, so they can be composed into pipelines to perform complex transformations on KGs. KGTK provides:

  • a suite of import commands for bringing Wikidata, RDF, and popular graph representations into KGTK format;
  • a rich collection of transformation commands that make it easy to clean, union, filter, and sort KGs;
  • graph combination commands that support efficient intersection, subtraction, and joining of large KGs;
  • a query language, a variant of Cypher optimized for KGs stored on disk, that supports efficient ad hoc queries;
  • graph analytics commands that support scalable computation of centrality metrics such as PageRank and degrees, as well as connected components and shortest paths;
  • advanced commands that support lexicalization of graph nodes and computation of multiple variants of text and graph embeddings over the whole graph;
  • a suite of export commands that transform KGTK KGs into commonly used formats, including the Wikidata JSON format, RDF triples, JSON documents for Elasticsearch indexing, and graph-tool;
  • a development environment based on Jupyter notebooks that integrates seamlessly with Pandas.
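
The four-column layout described above can be illustrated with a short Python sketch. It is not KGTK code: it uses only the standard library, the sample edges are made up for illustration, and the header names `id`, `node1`, `label`, and `node2` are assumed here to stand for the edge-identifier, head, edge-label, and tail columns. Because the filter step reads and writes the same layout, its output could feed another step, which is the idea behind composing commands into pipelines.

```python
import csv
import io

# Assumed header names for: edge-identifier, head, edge-label, tail.
COLUMNS = ["id", "node1", "label", "node2"]

# Made-up sample edges, used purely for illustration.
SAMPLE_TSV = (
    "id\tnode1\tlabel\tnode2\n"
    "e1\tQ42\tP31\tQ5\n"
    "e2\tQ42\tP106\tQ36180\n"
    "e3\tQ5\tP279\tQ215627\n"
)


def read_edges(text):
    """Parse tab-separated edge text into a list of dicts keyed by column name."""
    # QUOTE_NONE treats quote characters as ordinary data instead of CSV quoting.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def filter_by_label(edges, label):
    """Keep only the edges whose edge-label equals `label`."""
    return [edge for edge in edges if edge["label"] == label]


def write_edges(edges):
    """Serialize edges back to the same four-column, tab-separated layout."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=COLUMNS,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(edges)
    return out.getvalue()


edges = read_edges(SAMPLE_TSV)
instance_of = filter_by_label(edges, "P31")
print(write_edges(instance_of))
```

Each function takes or returns the same edge representation, so further steps (for example a sort) could be chained between `read_edges` and `write_edges` without changing the file layout.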

KGTK can process Wikidata-sized KGs with billions of edges on a laptop. We have used KGTK in multiple use cases, focusing primarily on the construction of subgraphs of Wikidata, the analysis of over 300 Wikidata dumps since the inception of the Wikidata project, linking tables to Wikidata, the construction of a commonsense KG combining multiple existing sources, and the creation of Wikidata extensions for food security and the pharmaceutical industry.

KGTK is open source software, well documented, actively used and developed, and released under the MIT license. We invite the community to try KGTK. It is easy to get started with our tutorial notebooks, which are available and executable online.

Installation

The following instructions install KGTK and the KGTK Jupyter Notebooks on Linux and macOS systems.

If you want to install KGTK on a Microsoft Windows system, please contact the KGTK team.

Our KGTK installations use a Conda virtual environment. If you don't have the Conda tools installed, follow this guide to install them. We recommend the Miniconda installation rather than the full Anaconda installation.

Next, execute the following steps to install the latest stable release of KGTK:

conda create -n kgtk-env python=3.9
conda activate kgtk-env
conda install -c conda-forge graph-tool
conda install -c conda-forge jupyterlab
pip install --no-cache-dir -U kgtk

If you encounter problems with your installation or want a detailed explanation of these commands, see our installation document.
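
After running the steps above, a quick sanity check can confirm that the environment sees the package. The following is a minimal sketch rather than an official KGTK tool: `check_install` is a hypothetical helper, and it assumes that the pip package exposes both a Python module and a command-line program named `kgtk`. Run it inside the activated `kgtk-env` environment.

```python
import importlib.util
import shutil


def check_install(module_name="kgtk", command_name="kgtk"):
    """Report whether a Python module and a command-line program can be found.

    Both default names are assumptions based on the pip package name.
    """
    return {
        # find_spec returns None when a top-level module cannot be located.
        "module_found": importlib.util.find_spec(module_name) is not None,
        # shutil.which returns None when the program is not on PATH.
        "command_found": shutil.which(command_name) is not None,
    }


status = check_install()
print(status)
```

If either value is `False`, the most likely cause is that the Conda environment is not activated in the current shell.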

Installation issues on MacBooks with an M1 chip

Running `pip install -e .` (development mode) fails with errors involving three libraries:

  1. thinc
  2. blis
  3. tokenizers

The thinc issue can be fixed by:

a. commenting out [this line in requirements.txt](https://github.com/usc-isi-i2/kgtk/blob/dev/requirements.txt#L11)

b. running `pip install thinc-apple-ops`

The tokenizers issue can be fixed by running the following commands in the Conda environment:

# Download and install Rust; follow the on-screen instructions.

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"

git clone https://github.com/huggingface/tokenizers
cd tokenizers/bindings/python/
pip install setuptools_rust
python setup.py install

Then continue installing KGTK with `pip install -e .`.
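
To decide whether the workarounds in this section are relevant to a given machine, one option is to check the platform from Python. This is a minimal sketch using only the standard library; `is_apple_silicon` is a hypothetical helper, and the assumption is that an M1 Mac running a native ARM Python reports `Darwin` and `arm64`.

```python
import platform


def is_apple_silicon(system=None, machine=None):
    """Return True for macOS on an ARM64 CPU, which includes M1 chips."""
    # Fall back to the values reported by the running interpreter.
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


if is_apple_silicon():
    print("Apple Silicon detected: the thinc/tokenizers workarounds may apply.")
else:
    print("Apple Silicon not detected.")
```

Note that a Python interpreter running under Rosetta reports `x86_64`, so a negative result on a Mac does not rule out M1 hardware.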

Installing KGTK with Docker

Please refer to this document for instructions on installing KGTK with Docker.

Getting started

Online Documentation

You can read our latest documentation online at:

https://kgtk.readthedocs.io/en/latest/

KGTK Notebooks

For examples of using KGTK, please see our Tutorial Notebooks.

Releases

KGTK Text Search API

The documentation for the KGTK Text Search API is here

KGTK Semantic Similarity API

The documentation for the KGTK Semantic Similarity API is here

How to cite

@inproceedings{ilievski2020kgtk,
  title={{KGTK}: A Toolkit for Large Knowledge Graph Manipulation and Analysis},
  author={Ilievski, Filip and Garijo, Daniel and Chalupsky, Hans and Divvala, Naren Teja and Yao, Yixiang and Rogers, Craig and Li, Ronpeng and Liu, Jun and Singh, Amandeep and Schwabe, Daniel and Szekely, Pedro},
  booktitle={International Semantic Web Conference},
  pages={278--293},
  year={2020},
  organization={Springer},
  url={https://arxiv.org/pdf/2006.00088.pdf}
}
