  • Stars: 105
  • Rank: 328,196 (Top 7 %)
  • Language: Python
  • License: MIT License
  • Created almost 8 years ago
  • Updated over 1 year ago

Repository Details

Record Linkage ToolKit (Find and link entities)

RLTK: Record Linkage ToolKit

The Record Linkage ToolKit (RLTK) is a general-purpose, open-source record linkage platform for building Python programs that link records referring to the same underlying entity. Record linkage is an important problem that arises in domains ranging from social networks to bibliographic data and biomedicine. Current open platforms for record linkage either scale poorly, even to moderately sized datasets, or are hard to use, even for experts. RLTK attempts to address both issues.

RLTK supports a full, scalable record linkage pipeline, including multi-core algorithms for blocking, profiling data, computing a wide variety of features, and training and applying machine learning classifiers based on Python's sklearn library. An end-to-end RLTK pipeline can be jump-started with only a few lines of code. However, RLTK is also designed to be extensible and customizable, allowing users arbitrary degrees of control over many of the individual components. You can add new features to RLTK (e.g. a custom string similarity) very easily.
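To make the pipeline stages concrete, here is a minimal standard-library sketch of blocking followed by pairwise feature computation and a match decision. It does not use RLTK's API: the toy records, the `block_key`, `token_jaccard`, and `candidate_pairs` helpers, and the 0.5 threshold are all assumptions chosen for illustration, and a fixed threshold stands in for a trained classifier.

```python
from collections import defaultdict
from itertools import product

# Two toy datasets; the fields and values are made up for illustration.
left = [
    {"id": "a1", "name": "John Smith"},
    {"id": "a2", "name": "Maria Garcia"},
]
right = [
    {"id": "b1", "name": "Smith John"},
    {"id": "b2", "name": "Mario Rossi"},
    {"id": "b3", "name": "Garcia Maria L"},
    {"id": "b4", "name": "James Joyce"},
]

def block_key(record):
    """Hypothetical blocking key: first letter of the alphabetically first name token."""
    return min(record["name"].lower().split())[0]

def token_jaccard(a, b):
    """Hypothetical custom similarity: Jaccard overlap of lowercase word tokens."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def candidate_pairs(left_records, right_records):
    """Yield only the cross-dataset pairs that share a blocking key."""
    blocks = defaultdict(lambda: ([], []))
    for rec in left_records:
        blocks[block_key(rec)][0].append(rec)
    for rec in right_records:
        blocks[block_key(rec)][1].append(rec)
    for left_block, right_block in blocks.values():
        yield from product(left_block, right_block)

# A fixed threshold stands in for a trained classifier (assumption).
THRESHOLD = 0.5
matches = [
    (l["id"], r["id"])
    for l, r in candidate_pairs(left, right)
    if token_jaccard(l["name"], r["name"]) >= THRESHOLD
]
print(matches)
```

Blocking reduces the work here from 8 possible cross-dataset pairs to 3 candidates. The similarity function is the part you would swap out when experimenting with a custom feature.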

RLTK is being built by the Center on Knowledge Graphs at USC/ISI, with funding from multiple projects under the DARPA LORELEI and MEMEX programs and the IARPA CAUSE program. RLTK is under active maintenance, and we expect to keep adding new features and state-of-the-art record linkage algorithms in the foreseeable future, while continuing to support adopters in integrating the platform into their applications.

Getting Started

Installation (make sure prerequisites are installed):

pip install -U rltk

Example:

>>> import rltk
>>> rltk.levenshtein_distance('abc', 'abd')
1
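
The result 1 reflects a single substitution ('c' to 'd'). As a rough illustration of what an edit-distance function computes, here is a standard dynamic-programming sketch. It is an independent implementation, named `levenshtein` for this example only, and is not taken from RLTK's source code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("abc", "abd"))
```

Keeping only two rows of the table at a time keeps memory proportional to the length of the second string.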

Try RLTK Online

Datasets & Experiments

Documentation

More Repositories

1. Web-Karma: Information Integration Tool (Java, 586 stars)
2. kgtk: Knowledge Graph Toolkit (Jupyter Notebook, 353 stars)
3. ontology-visualization: A simple ontology and RDF visualization tool. (Python, 124 stars)
4. cskg: CSKG: The CommonSense Knowledge Graph (Jupyter Notebook, 113 stars)
5. dig-etl-engine: Download DIG to run on your laptop or server. (101 stars)
6. etk: Extraction Toolkit (HTML, 81 stars)
7. kgtk-notebooks: Tutorial and hands-on notebook on using the Knowledge Graph Toolkit (KGTK) (Jupyter Notebook, 78 stars)
8. kgtk-similarity (Python, 27 stars)
9. isi-tkg-icl: Temporal Knowledge Graph Forecasting Using In-Context Learning (EMNLP 2023) (Python, 22 stars)
10. festival-text-to-speech-service: REST service to call the Festival text to speech application (C++, 22 stars)
11. t2wml: Table to Wikidata Mapping Language (TypeScript, 22 stars)
12. table-linker: Table Linker (Python, 21 stars)
13. szeke: Information Integration Tool (Java, 18 stars)
14. usc-isi-i2.github.io: Website for USC ISI information integration group (HTML, 17 stars)
15. dig-lsh-clustering: Clustering documents based on LSH (Python, 14 stars)
16. saam-lod: Linked Data mapping for Smithsonian American Art Museum (Web Ontology Language, 12 stars)
17. pyrallel: Yet another easy-to-use python3 parallel library for humans. (Python, 12 stars)
18. gaia-knowledge-graph: Tools to build knowledge graphs from multi-modal extractions (Python, 11 stars)
19. dsbox-ta2: The DSBox TA2 component (Python, 11 stars)
20. dig-elasticsearch: Code to process datasets for elastic search (Java, 10 stars)
21. logical-fallacy-identification (Jupyter Notebook, 10 stars)
22. linked-maps: Framework to build linked spatio-temporal data from vectorized evolutionary topographic map archives (Python, 10 stars)
23. dig-dictionary-extraction: Implements dictionary-based entity extraction as described in the FAERIE paper http://dbgroup.cs.tsinghua.edu.cn/dd/papers/sigmod2011-faerie.pdf (C, 9 stars)
24. graph-keyword-search: Keyword query search engine on semantic store/linked data web (Python, 9 stars)
25. d-repr: Dataset Representation Language for Reading Heterogeneous Datasets to RDF or JSON (Rust, 9 stars)
26. datamart: Data augment (Jupyter Notebook, 8 stars)
27. DSCI-510-Fall-2024 (7 stars)
28. dig-text-similarity-search (Julia, 7 stars)
29. karma-step-by-step: Step by step tutorial to learn Karma (Python, 7 stars)
30. kgtk-browser (Python, 7 stars)
31. ppjoin: PPJoin and P4Join Python 3 implementation (Python, 6 stars)
32. sand: Semantic ANotation of tabular Data (TypeScript, 6 stars)
33. eswc-2015-semantic-typing: Repo for paper, data and software to run the experiments (Java, 6 stars)
34. dsbox-cleaning: The data cleaning TA1 component of DSBox (Python, 6 stars)
35. social-media-meme-identification (Jupyter Notebook, 6 stars)
36. CKG-COVID-19 (Jupyter Notebook, 5 stars)
37. dig-stylometry (Python, 5 stars)
38. dsbox-profiling: The data profiling TA1 component of DSBox (Python, 5 stars)
39. bsl: Blocking Scheme Learner (Java, 5 stars)
40. hybrid-jaccard: Implementation of hybrid jaccard similarity (Python, 5 stars)
41. mowgli-in-the-jungle: 'mowgli-in-the-jungle' framework for development of solutions on several Machine commonsense datasets. (Python, 4 stars)
42. wikidata-wikifier (Python, 4 stars)
43. wikidata-fuzzy-search (TypeScript, 4 stars)
44. dig-sandpaper (Python, 4 stars)
45. analogical-transfer-learning (Jupyter Notebook, 4 stars)
46. isi-table-understanding: A framework for implementing table understanding systems (4 stars)
47. wd-quality: Notebooks for generating and validating constraints in WIkidata (Shell, 4 stars)
48. record-linkage-learning: Record Linkage Project, learning FRIL configurations (Java, 3 stars)
49. dig-prep: repository for java + python code for preparing data sets and data delivery tools (Python, 3 stars)
50. kgtk-search (Jupyter Notebook, 3 stars)
51. isi-pubgraph: Repository with tools to create and analyze a knowledge graph about research papers, publication venues, authors, and institutions. (Python, 3 stars)
52. dig-alignment: Code to do feature alignment in dig (PHP, 3 stars)
53. sparql-jsonld (Python, 2 stars)
54. rltk-experimentation: Example, test and benchmark for RLTK (v2) (Python, 2 stars)
55. GRAMS (Python, 2 stars)
56. mydig-webservice (HTML, 2 stars)
57. dig-dictionaries: Useful dictionaries for DIG (Python, 2 stars)
58. dig-tokenizer: Flexible way to tokenize documents in Spark (Python, 2 stars)
59. meme-understanding (Jupyter Notebook, 2 stars)
60. dig-wikifier (Python, 2 stars)
61. wd-similarity (Jupyter Notebook, 2 stars)
62. dig-crf: CRF++ extraction for DIG (Python, 2 stars)
63. kgtk-at-2021-wikidata-workshop: Code and datasets for the KGTK demo at the 2021 Wikidata Workshop at ISWC (Jupyter Notebook, 2 stars)
64. KarmaSpatialClustering: The code includes clustering algorithm of spatial data and related data pre-processing, visualization and file operation, etc. (Python, 2 stars)
65. dig-visualization (JavaScript, 2 stars)
66. dig-sparkutil: python utilities for incorporating DIG components into Spark workflows (Python, 1 star)
67. karma-information-extraction: Information extraction service for Karma (Java, 1 star)
68. t2wml-api: backend for t2wml gui (Python, 1 star)
69. datamart-upload: REST api and upload functions for datamart project (Python, 1 star)
70. lsh-linking: Testing LSH and MinHash for doing record linkage and deduplication (Python, 1 star)
71. image-metadata-enhancement (HTML, 1 star)
72. Social-Viz (JavaScript, 1 star)
73. gaia-ta2pipeline (Jupyter Notebook, 1 star)
74. eidos (HTML, 1 star)
75. dig-age-extractor (Python, 1 star)
76. datamart-frontend: GUI for datamart project (JavaScript, 1 star)
77. mint-data-catalog-public: Public MInt Data Catalog (Jupyter Notebook, 1 star)
78. dig-url-extractor (Python, 1 star)
79. dig-extract: python-based repository for DIG extractors (Python, 1 star)
80. pper-criminal-justice (Jupyter Notebook, 1 star)
81. dig-tokenizer-extractor (Python, 1 star)
82. datamart-api-notebook: Jupyter notebook demonstrating the capabilities of ISI Datamart using REST API (Jupyter Notebook, 1 star)
83. dsbox-ta2-system (Dockerfile, 1 star)
84. SemanticLabelingAlgorithm (Julia, 1 star)
85. karma-visualization: Examples, models and test code for creating visualizations within Karma (JavaScript, 1 star)
86. SemanticLabelingService (Python, 1 star)
87. datamart-userend: ISI datamart implementation for users (Python, 1 star)
88. minmod-webapp (Java, 1 star)
89. wikibaseTools: A package that accelerates domain-specific wikibase instance setup, as well as maintains maximal compatibility with wikidata. (Python, 1 star)
90. dig-entity-merger (Python, 1 star)
91. datamart-api (Jupyter Notebook, 1 star)