Language: Java · License: Apache License 2.0

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Information Integration Tool

Karma: A Data Integration Tool


A Karma tutorial is available at https://github.com/szeke/karma-tcdl-tutorial. Also check out our DIG web site, where we use Karma extensively to process more than 90M web pages.

See our release stats

What is Karma?

Karma is an information integration tool that enables users to quickly and easily integrate data from a variety of data sources including databases, spreadsheets, delimited text files, XML, JSON, KML, and Web APIs. Users integrate information by modeling it according to an ontology of their choice using a graphical user interface that automates much of the process. Karma learns to recognize the mapping of data to ontology classes and then uses the ontology to propose a model that ties together these classes. Users then interact with the system to adjust the automatically generated model. During this process, users can transform the data as needed to normalize data expressed in different formats and to restructure it. Once the model is complete, users can publish the integrated data as RDF or store it in a database.

You can find useful tutorials on the project website: http://www.isi.edu/integration/karma/

Installation and Setup

See the Installation page in the Wiki.

Frequently Asked Questions

How to perform offline RDF generation for a data source using a published model?

  1. Model your source and publish its model (published models are located at src/main/webapp/publish/R2RML/ inside the Karma directory).
  2. To generate RDF of a CSV/JSON/XML file, go to the top level Karma directory and run the following command from terminal:
cd karma-offline
mvn exec:java -Dexec.mainClass="edu.isi.karma.rdf.OfflineRdfGenerator" -Dexec.args="--sourcetype <sourcetype> --filepath <filepath> --modelfilepath <modelfilepath> --sourcename <sourcename> --outputfile <outputfile> --JSONOutputFile <outputJSON-LD>" -Dexec.classpathScope=compile
Valid argument values for `sourcetype` are: CSV, JSON, XML. Also, you need to escape the double quotes that go inside argument values. Example invocation for a JSON file:
mvn exec:java -Dexec.mainClass="edu.isi.karma.rdf.OfflineRdfGenerator" -Dexec.args="
--sourcetype JSON
--filepath \"/Users/shubhamgupta/Documents/wikipedia.json\"
--modelfilepath \"/Users/shubhamgupta/Documents/model-wikipedia.n3\"
--sourcename wikipedia
--outputfile wikipedia-rdf.n3
--JSONOutputFile wikipedia-rdf.json" -Dexec.classpathScope=compile
  3. To generate RDF of a database table, go to the top level Karma directory and run the following command from terminal:
cd karma-offline
mvn exec:java -Dexec.mainClass="edu.isi.karma.rdf.OfflineRdfGenerator" -Dexec.args="--sourcetype DB
--modelfilepath <modelfilepath> --outputfile <outputfile> --dbtype <dbtype> --hostname <hostname>
--username <username> --password <password> --portnumber <portnumber> --dbname <dbname> --tablename <tablename> --JSONOutputFile <outputJSON-LD>" -Dexec.classpathScope=compile
Valid argument values for `dbtype` are Oracle, MySQL, SQLServer, PostGIS, Sybase. Example invocation:
mvn exec:java -Dexec.mainClass="edu.isi.karma.rdf.OfflineRdfGenerator" -Dexec.args="
--sourcetype DB --dbtype SQLServer
--hostname example.com --username root --password secret
--portnumber 1433 --dbname Employees --tablename Person
--modelfilepath \"/Users/shubhamgupta/Documents/db-r2rml-model.ttl\"
--outputfile db-rdf.n3
--JSONOutputFile db-rdf.json" -Dexec.classpathScope=compile

You can run mvn exec:java -Dexec.mainClass="edu.isi.karma.rdf.OfflineRdfGenerator" -Dexec.args="--help" to get information about the required arguments.

How to set up password protection for accessing Karma?

  • In src/main/config/jettyrealm.properties, change the user/password if you wish.
  • In src/main/webapp/WEB-INF/web.xml, uncomment the security section at the end of the file.
  • In pom.xml, uncomment the security section (search for loginServices).
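These steps enable Jetty's built-in BASIC authentication. As an illustrative sketch only (the exact role and realm names used by Karma may differ, so check the commented-out sections in your checkout before copying anything), the uncommented security section in web.xml typically looks like the following, where the role name must match a role granted to the user in jettyrealm.properties (Jetty's HashLoginService format: `user: password,role`):

```xml
<!-- Illustrative sketch; verify names against the commented-out
     section already present in Karma's web.xml. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Karma</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <!-- must match a role assigned in jettyrealm.properties -->
    <role-name>karma-user</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>karma</realm-name>
</login-config>
```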

Are there additional steps required to import data from Oracle database?

Yes. Due to Oracle's binary license restrictions, we can't distribute the JAR file that is required for importing data from an Oracle database. The following steps resolve the runtime error you will get if you try to do so with the current source code:

  1. Download the appropriate JDBC driver JAR file (for JDK 1.5 and above) that matches your Oracle DB version. Link: http://www.oracle.com/technetwork/database/features/jdbc/index-091264.html
  2. Put the downloaded JAR file inside the lib folder of the Karma source code.
  3. Add the following snippet in the pom.xml file (in the top-level folder of the Karma source code) inside the dependencies XML element:
<dependency>
    <groupId>com.oracle</groupId>
    <artifactId>ojdbc</artifactId>
    <version>14</version>
    <scope>system</scope>
    <systemPath>/Users/karma/Web-Karma/lib/ojdbc14.jar</systemPath>
</dependency>

Make sure that the filename mentioned in the systemPath element matches your downloaded JAR file; your installation folder is likely different from /Users/karma, so be sure to use the correct path.

Are there additional steps required to import data from MySQL database?

Yes. Due to MySQL binary license restrictions, we can't distribute the JAR file that is required for importing data from a MySQL database. The following steps resolve the runtime error you will get if you try to do so with the current source code:

  1. Download the appropriate MySQL driver JAR file (for JDK 1.5 and above) that matches your MySQL version. Link: http://dev.mysql.com/downloads/connector/j/
  2. Put the downloaded JAR file inside the lib folder of the Karma source code.
  3. Add the following snippet in the pom.xml file of the karma-jdbc project inside the dependencies XML element:
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.32</version>
    <scope>system</scope>
    <systemPath>/Users/karma/Web-Karma/lib/mysql-connector-java-5.1.32-bin.jar</systemPath>
</dependency>

Make sure that the filename mentioned in the systemPath element matches your downloaded JAR file; your installation folder is likely different from /Users/karma, so be sure to use the correct path. The version should be the version of the JAR that you downloaded.
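A quick way to confirm that either driver JAR is actually visible on the classpath before launching Karma is to try loading its driver class. This small check is a sketch, not part of Karma itself; the class names in the comments are the standard ones shipped in ojdbc14.jar and Connector/J 5.x, and may differ for other driver versions.

```java
// DriverCheck.java -- sanity-check that a JDBC driver is on the classpath.
public class DriverCheck {

    /** Returns true if the named class can be loaded from the classpath. */
    public static boolean driverAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] drivers = {
            "oracle.jdbc.driver.OracleDriver", // from ojdbc14.jar
            "com.mysql.jdbc.Driver"            // from mysql-connector-java 5.x
        };
        for (String d : drivers) {
            System.out.println(d + ": " + (driverAvailable(d) ? "found" : "MISSING"));
        }
    }
}
```

Compile and run it with the same classpath you use for Karma; a "MISSING" line means the corresponding pom.xml dependency or JAR path is not set up correctly.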
