  • Stars
    104
  • Rank 329,321 (Top 7 %)
  • Language
    Python
  • License
    MIT License
  • Created over 7 years ago
  • Updated about 1 year ago

Repository Details

Record Linkage ToolKit (Find and link entities)

RLTK: Record Linkage ToolKit

The Record Linkage ToolKit (RLTK) is a general-purpose open-source record linkage platform that allows users to build powerful Python programs that link records referring to the same underlying entity. Record linkage is an important problem that arises in domains ranging from social networks to bibliographic data and biomedicine. Existing open platforms for record linkage either scale poorly, even to moderately sized datasets, or are hard to use, even for experts. RLTK attempts to address these issues.

RLTK supports a full, scalable record linkage pipeline, including multi-core algorithms for blocking, profiling data, computing a wide variety of features, and training and applying machine learning classifiers based on Python's sklearn library. An end-to-end RLTK pipeline can be jump-started with only a few lines of code. However, RLTK is also designed to be extensible and customizable, allowing users arbitrary degrees of control over many of the individual components. You can add new features to RLTK (e.g. a custom string similarity) very easily.
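
To make the blocking and feature stages concrete, here is a minimal, library-free sketch of blocking followed by pairwise scoring with a custom string similarity. It does not use RLTK's API and is not how RLTK implements these steps; the function names (block_key, token_jaccard, link), the sample records, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import product


def block_key(record):
    """Hypothetical blocking key: first letter of the lowercased name."""
    return record["name"].strip().lower()[:1]


def token_jaccard(a, b):
    """Custom string similarity: Jaccard overlap of lowercase word tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def link(left, right, threshold=0.5):
    """Compare only records that share a block key; keep pairs scoring >= threshold."""
    blocks = defaultdict(lambda: ([], []))
    for rec in left:
        blocks[block_key(rec)][0].append(rec)
    for rec in right:
        blocks[block_key(rec)][1].append(rec)

    matches = []
    for left_recs, right_recs in blocks.values():
        for l_rec, r_rec in product(left_recs, right_recs):
            score = token_jaccard(l_rec["name"], r_rec["name"])
            if score >= threshold:
                matches.append((l_rec["id"], r_rec["id"], score))
    return matches


# Made-up records, for illustration only.
left = [{"id": "a1", "name": "John A Smith"}, {"id": "a2", "name": "Mary Jones"}]
right = [{"id": "b1", "name": "john smith"}, {"id": "b2", "name": "Mark Jones"}]
matches = link(left, right)
print(matches)
```

Blocking keeps the number of comparisons small: only records with the same key are paired, so the sketch scores two candidate pairs here instead of all four. A real pipeline would typically replace the fixed threshold with a trained classifier over several features.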

RLTK is being built by the Center on Knowledge Graphs at USC/ISI, with funding from multiple projects under the DARPA LORELEI and MEMEX programs and the IARPA CAUSE program. RLTK is under active maintenance, and we expect to keep adding new features and state-of-the-art record linkage algorithms for the foreseeable future, while continuing to support adopters as they integrate the platform into their applications.

Getting Started

Installation (make sure prerequisites are installed):

pip install -U rltk

Example:

>>> import rltk
>>> rltk.levenshtein_distance('abc', 'abd')
1
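
For readers unfamiliar with the metric, the result above can be reproduced with a short dynamic-programming sketch of Levenshtein (edit) distance. This is a plain-Python illustration with unit costs for insert, delete, and substitute; it is not RLTK's implementation, and the function name levenshtein is chosen here for illustration only.

```python
def levenshtein(s, t):
    """Edit distance between s and t with unit-cost insert, delete, substitute."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # delete cs
                curr[j - 1] + 1,     # insert ct
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


print(levenshtein("abc", "abd"))  # prints 1
```

Keeping only two rows of the table bounds memory by the length of the second string, which matters when scoring many candidate pairs.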

Try RLTK Online

Datasets & Experiments

Documentation

More Repositories

1. Web-Karma (Java, 585 stars): Information Integration Tool
2. kgtk (Jupyter Notebook, 350 stars): Knowledge Graph Toolkit
3. ontology-visualization (Python, 122 stars): A simple ontology and RDF visualization tool.
4. cskg (Jupyter Notebook, 113 stars): CSKG: The CommonSense Knowledge Graph
5. dig-etl-engine (101 stars): Download DIG to run on your laptop or server.
6. etk (HTML, 81 stars): Extraction Toolkit
7. kgtk-notebooks (Jupyter Notebook, 79 stars): Tutorial and hands-on notebook on using the Knowledge Graph Toolkit (KGTK)
8. kgtk-similarity (Python, 26 stars)
9. festival-text-to-speech-service (C++, 22 stars): REST service to call the Festival text to speech application
10. t2wml (TypeScript, 22 stars): Table to Wikidata Mapping Language
11. isi-tkg-icl (Python, 21 stars): Temporal Knowledge Graph Forecasting Using In-Context Learning (EMNLP 2023)
12. table-linker (Python, 21 stars): Table Linker
13. szeke (Java, 18 stars): Information Integration Tool
14. usc-isi-i2.github.io (HTML, 17 stars): Website for USC ISI information integration group
15. dig-lsh-clustering (Python, 14 stars): Clustering documents based on LSH
16. saam-lod (Web Ontology Language, 12 stars): Linked Data mapping for Smithsonian American Art Museum
17. pyrallel (Python, 12 stars): Yet another easy-to-use python3 parallel library for humans.
18. gaia-knowledge-graph (Python, 11 stars): Tools to build knowledge graphs from multi-modal extractions
19. dsbox-ta2 (Python, 11 stars): The DSBox TA2 component
20. dig-elasticsearch (Java, 10 stars): Code to process datasets for elastic search
21. dig-dictionary-extraction (C, 9 stars): Implements dictionary-based entity extraction as described in the FAERIE paper http://dbgroup.cs.tsinghua.edu.cn/dd/papers/sigmod2011-faerie.pdf
22. graph-keyword-search (Python, 9 stars): Keyword query search engine on semantic store/linked data web
23. logical-fallacy-identification (Jupyter Notebook, 9 stars)
24. linked-maps (Python, 9 stars): Framework to build linked spatio-temporal data from vectorized evolutionary topographic map archives
25. datamart (Jupyter Notebook, 8 stars): Data augment
26. d-repr (Rust, 8 stars): Dataset Representation Language for Reading Heterogeneous Datasets to RDF or JSON
27. dig-text-similarity-search (Julia, 7 stars)
28. karma-step-by-step (Python, 7 stars): Step by step tutorial to learn Karma
29. kgtk-browser (Python, 7 stars)
30. ppjoin (Python, 6 stars): PPJoin and P4Join Python 3 implementation
31. sand (TypeScript, 6 stars): Semantic ANotation of tabular Data
32. eswc-2015-semantic-typing (Java, 6 stars): Repo for paper, data and software to run the experiments
33. dsbox-cleaning (Python, 6 stars): The data cleaning TA1 component of DSBox
34. social-media-meme-identification (Jupyter Notebook, 6 stars)
35. CKG-COVID-19 (Jupyter Notebook, 5 stars)
36. dig-stylometry (Python, 5 stars)
37. dsbox-profiling (Python, 5 stars): The data profiling TA1 component of DSBox
38. bsl (Java, 5 stars): Blocking Scheme Learner
39. hybrid-jaccard (Python, 5 stars): Implementation of hybrid jaccard similarity
40. mowgli-in-the-jungle (Python, 4 stars): 'mowgli-in-the-jungle' framework for development of solutions on several Machine commonsense datasets.
41. wikidata-wikifier (Python, 4 stars)
42. wikidata-fuzzy-search (TypeScript, 4 stars)
43. dig-sandpaper (Python, 4 stars)
44. analogical-transfer-learning (Jupyter Notebook, 4 stars)
45. isi-table-understanding (4 stars): A framework for implementing table understanding systems
46. wd-quality (Shell, 4 stars): Notebooks for generating and validating constraints in WIkidata
47. record-linkage-learning (Java, 3 stars): Record Linkage Project, learning FRIL configurations
48. DSCI-510-Fall-2023 (3 stars)
49. dig-prep (Python, 3 stars): repository for java + python code for preparing data sets and data delivery tools
50. kgtk-search (Jupyter Notebook, 3 stars)
51. dig-alignment (PHP, 3 stars): Code to do feature alignment in dig
52. sparql-jsonld (Python, 2 stars)
53. rltk-experimentation (Python, 2 stars): Example, test and benchmark for RLTK (v2)
54. GRAMS (Python, 2 stars)
55. mydig-webservice (HTML, 2 stars)
56. dig-dictionaries (Python, 2 stars): Useful dictionaries for DIG
57. dig-tokenizer (Python, 2 stars): Flexible way to tokenize documents in Spark
58. meme-understanding (Jupyter Notebook, 2 stars)
59. dig-wikifier (Python, 2 stars)
60. wd-similarity (Jupyter Notebook, 2 stars)
61. dig-crf (Python, 2 stars): CRF++ extraction for DIG
62. kgtk-at-2021-wikidata-workshop (Jupyter Notebook, 2 stars): Code and datasets for the KGTK demo at the 2021 Wikidata Workshop at ISWC
63. KarmaSpatialClustering (Python, 2 stars): The code includes clustering algorithm of spatial data and related data pre-processing, visualization and file operation, etc.
64. isi-pubgraph (Python, 2 stars): Repository with tools to create and analyze a knowledge graph about research papers, publication venues, authors, and institutions.
65. dig-visualization (JavaScript, 2 stars)
66. dig-sparkutil (Python, 1 star): python utilities for incorporating DIG components into Spark workflows
67. karma-information-extraction (Java, 1 star): Information extraction service for Karma
68. t2wml-api (Python, 1 star): backend for t2wml gui
69. datamart-upload (Python, 1 star): REST api and upload functions for datamart project
70. lsh-linking (Python, 1 star): Testing LSH and MinHash for doing record linkage and deduplication
71. image-metadata-enhancement (HTML, 1 star)
72. Social-Viz (JavaScript, 1 star)
73. gaia-ta2pipeline (Jupyter Notebook, 1 star)
74. eidos (HTML, 1 star)
75. dig-age-extractor (Python, 1 star)
76. datamart-frontend (JavaScript, 1 star): GUI for datamart project
77. mint-data-catalog-public (Jupyter Notebook, 1 star): Public MInt Data Catalog
78. dig-url-extractor (Python, 1 star)
79. dig-extract (Python, 1 star): python-based repository for DIG extractors
80. pper-criminal-justice (Jupyter Notebook, 1 star)
81. dig-tokenizer-extractor (Python, 1 star)
82. datamart-api-notebook (Jupyter Notebook, 1 star): Jupyter notebook demonstrating the capabilities of ISI Datamart using REST API
83. dsbox-ta2-system (Dockerfile, 1 star)
84. SemanticLabelingAlgorithm (Julia, 1 star)
85. karma-visualization (JavaScript, 1 star): Examples, models and test code for creating visualizations within Karma
86. datamart-userend (Python, 1 star): ISI datamart implementation for users
87. SemanticLabelingService (Python, 1 star)
88. minmod-webapp (Java, 1 star)
89. wikibaseTools (Python, 1 star): A package that accelerates domain-specific wikibase instance setup, as well as maintains maximal compatibility with wikidata.
90. dig-entity-merger (Python, 1 star)
91. datamart-api (Jupyter Notebook, 1 star)