SPARQL -> SQL rewriter enabling virtual RDB -> RDF mappings

Sparqlify SPARQL->SQL rewriter

Introduction

Sparqlify is a scalable SPARQL-SQL rewriter whose development began in April 2011 in the course of the LinkedGeoData project.

The system's main features are:

  • Support of the 'Sparqlification Mapping Language' (SML), an intuitive language for expressing RDB-to-RDF mappings with very little syntactic noise.
  • Scalability: Sparqlify does not evaluate expressions in memory. All SPARQL filters end up in the corresponding SQL statement, giving the underlying RDBMS maximum control over query planning.
  • A powerful rewriting engine that analyzes filter expressions in order to eliminate self joins and joins with unsatisfiable conditions.
  • Initial support for spatial datatypes and predicates.
  • A subset of the SPARQL 1.0 query language, plus sub-queries, is supported.
  • Tested with PostgreSQL/Postgis and H2. Support for further databases is planned.
  • CSV support
  • R2RML will be supported soon
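To illustrate the scalability point above: because filters are pushed down rather than evaluated in memory, a SPARQL pattern with FILTER(?age > 30) over a view on a (hypothetical) person table yields SQL along these lines, letting the RDBMS use its indexes and query planner:

```sql
-- Hypothetical sketch of a rewritten query; the actual SQL that Sparqlify
-- generates differs. The point: the SPARQL filter became a WHERE condition.
SELECT id, name, age
FROM person
WHERE age > 30;
```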

Functions

SPARQL-to-SQL function mappings are specified in the file functions.xml.

Standard SPARQL functions

| SPARQL function                   | SQL Definition       |
|-----------------------------------|----------------------|
| boolean strstarts(string, string) | strpos($1$, $2$) = 1 |

TODO
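The strstarts mapping above relies on PostgreSQL's strpos returning the 1-based position of the first occurrence (0 if absent), so "position equals 1" is exactly "starts with". A Python sketch (not Sparqlify code) of why the rewrite is semantics-preserving:

```python
# Illustrative sketch: the SPARQL function strstarts(s, prefix) is mapped to
# the SQL expression strpos($1$, $2$) = 1, where $1$ and $2$ are the
# translated arguments.

def sparql_strstarts(s: str, prefix: str) -> bool:
    """SPARQL semantics: does s start with prefix?"""
    return s.startswith(prefix)

def sql_strpos(haystack: str, needle: str) -> int:
    """Python model of PostgreSQL strpos(): 1-based index, 0 if not found."""
    return haystack.find(needle) + 1

def rewritten(s: str, prefix: str) -> bool:
    """The rewritten SQL condition: strpos(s, prefix) = 1."""
    return sql_strpos(s, prefix) == 1

# The two sides agree, including on edge cases with empty strings:
for s, p in [("http://ex.org/a", "http://"), ("abc", "b"), ("", ""), ("x", "")]:
    assert sparql_strstarts(s, p) == rewritten(s, p)
```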
Spatial Function Extensions

| SPARQL function | SQL Definition |
|-----------------|----------------|

TODO

Supported SPARQL language features

  • Join, LeftJoin (i.e. Optional), Union, Sub queries
  • Filter predicates: comparison (<=, <, =, >, >=); logical (!, &&, ||); arithmetic (+, -); spatial (st_intersects, geomFromText); other (regex, lang, langMatches)
  • Aggregate functions: Count(*)
  • Order By is pushed into the SQL
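A small query exercising several of these features (the ex: vocabulary is made up for illustration):

```sparql
PREFIX ex: <http://ex.org/>

SELECT ?s ?age ?w WHERE {
  ?s a ex:Person ;
     ex:age ?age .                 # basic pattern (Join)
  OPTIONAL { ?s ex:workPage ?w }   # LeftJoin (Optional)
  FILTER(?age >= 18 && ?age < 65)  # comparison and logical filter predicates
}
ORDER BY ?age                      # Order By is pushed into the SQL
```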

Debian packages

Sparqlify Debian packages can be obtained by the following means:

Public repositories

After setting up any of the repositories below, you can install sparqlify with apt using

  • apt: `sudo apt-get install sparqlify-cli`

Linked Data Stack (this is what you want)

Sparqlify is distributed at the Linked Data Stack, which offers many great tools created by various contributors from the Semantic Web community.

  • The repository is available in the flavors nightly, testing and stable here.
# !!! Replace stable with nightly or testing as needed !!!

# Download the repository package
wget http://stack.linkeddata.org/ldstable-repository.deb

# Install the repository package
sudo dpkg -i ldstable-repository.deb

# Update the repository database
sudo apt-get update

Bleeding Edge (Not recommended for production)

For the latest development version (built on every commit), perform the following steps:

Import the public key with

wget -qO - http://cstadler.aksw.org/repos/apt/conf/packages.precise.gpg.key  | sudo apt-key add -

Add the repository

echo 'deb http://cstadler.aksw.org/repos/apt precise main contrib non-free' | sudo tee -a /etc/apt/sources.list.d/cstadler.aksw.org.list

Note that this also works with distros other than "precise" (Ubuntu 12.04), such as Ubuntu 14.04 or 16.04.

Building

Building the repository creates the JAR files providing the sparqlify-* tool suite.

One of the plugins requires the xjc command (for compiling an XML schema to Java classes), which is no longer part of the JDK. The following package provides it:

sudo apt install jaxb

Debian package

Building Debian packages from this repo relies on the Debian Maven Plugin, which requires a Debian-compatible environment. If such an environment is present, the rest is simple:

# Install all shell scripts necessary for creating deb packages
sudo apt-get install devscripts

# Execute the following from the `<repository-root>/sparqlify-core` folder:
mvn clean install deb:package

# Upon successful completion, the debian package is located under `<repository-root>/sparqlify-core/target`
# Install using `dpkg`
sudo dpkg -i sparqlify_<version>.deb

# Uninstall using dpkg or apt:
sudo dpkg -r sparqlify
sudo apt-get remove sparqlify

Assembly based

Another way to build the project is to run the following commands from <repository-root>:

mvn clean install

cd sparqlify-cli
mvn assembly:assembly

This will generate a single stand-alone jar containing all necessary dependencies. Afterwards, the shell scripts under sparqlify-core/bin should work.

Tool suite

If Sparqlify was installed from the Debian package, the following commands are available system-wide:

  • sparqlify: This is the main executable for running individual SPARQL queries, creating dumps and starting a stand-alone server.
  • sparqlify-csv: This tool can create RDF dumps from CSV files based on SML view definitions.
  • sparqlify-platform: A stand-alone server component integrating additional projects.

These tools write their output (such as RDF data in the N-TRIPLES format) to STDOUT. Log output goes to STDERR.

sparqlify

Usage: sparqlify [options]

Options are:

  • Setup

    • -m SML view definition file
  • Database Connectivity Settings

    • -h Hostname of the database (e.g. localhost or localhost:5432)
    • -d Database name
    • -u User name
    • -p Password
    • -j JDBC URI (mutually exclusive with both -h and -d)
  • Quality of Service

    • -n Maximum result set size
    • -t Maximum query execution time in seconds (excluding rewriting time)
  • Stand-alone Server Configuration

    • -P Server port [default: 7531]
  • Run-Once (these options prevent the server from being started and are mutually exclusive with the server configuration)

    • -D Create an N-TRIPLES RDF dump on STDOUT
    • -Q [SPARQL query] Runs a SPARQL query against the configured database and view definitions

Example

The following command will start the Sparqlify HTTP server on the default port.

sparqlify -h localhost -u postgres -p secret -d mydb -m mydb-mappings.sml -n 1000 -t 30

Agents can now access the SPARQL endpoint at http://localhost:7531/sparql
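Assuming the endpoint implements the standard SPARQL HTTP protocol, it can be queried with any SPARQL client; for a quick smoke test with curl (query and port as in the example above):

```shell
curl --get 'http://localhost:7531/sparql' \
     --data-urlencode 'query=SELECT * WHERE { ?s ?p ?o } LIMIT 10'
```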

sparqlify-csv

Usage: sparqlify-csv [options]

  • Setup

    • -m SML view definition file
    • -f Input data file
    • -v View name (can be omitted if the view definition file only contains a single view)
  • CSV Parser Settings

    • -d CSV field delimiter (default is '"')
    • -e CSV field escape delimiter (escapes the field delimiter) (default is '')
    • -s CSV field separator (default is ',')
    • -h Use first row as headers. This option allows one to reference columns by name in addition to by index.
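These options correspond to standard CSV parsing concepts (record separator vs. quoting/escape characters). As a rough model of their effect (Sparqlify's own parser may differ in details), using Python's csv module with hypothetical data:

```python
# Rough model of the parser settings above; values are illustrative only.
import csv
import io

data = 'id,"name, quoted",value\n1,"Bonn, DE",42\n'

# Sparqlify's separator (-s) corresponds to Python's delimiter;
# Sparqlify's field delimiter (-d) corresponds to Python's quotechar.
rows = list(csv.reader(io.StringIO(data), delimiter=',', quotechar='"'))
print(rows[1])  # ['1', 'Bonn, DE', '42'] -- the quoted comma is preserved

# With -h, the first row would be treated as column headers:
headers, *records = rows
print(dict(zip(headers, records[0])))
```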

sparqlify-platform (Deprecated; about to be superseded by sparqlify-web-admin)

The Sparqlify Platform (under /sparqlify-platform) bundles Sparqlify with the Linked Data wrapper Pubby and the SPARQL Web interface Snorql.

Usage: sparqlify-platform config-dir [port]

  • config-dir Path to the configuration directory, e.g. <repository-root>/sparqlify-platform/config/example
  • port Port on which to run the platform, default 7531.

For building, run mvn compile at the root of the project (outside of the sparqlify-* directories) to build all modules. Afterwards, launch the platform using:

cd sparqlify-platform/bin
./sparqlify-platform <path-to-config> <port>

Assuming the platform runs under http://localhost:7531, you can access the following services relative to this base url:

  • /sparql is Sparqlify's SPARQL endpoint
  • /snorql shows the SNORQL web frontend
  • /pubby is the entry point to the Linked Data interface

Configuration

The configDirectory argument is mandatory and must contain a sub-directory for the context path (i.e. sparqlify-platform), which in turn contains the files:

  • platform.properties This file contains configuration parameters that can be adjusted, such as the database connection.
  • views.sparqlify The set of Sparqlify view definitions to use.

I recommend first creating a copy of the files in /sparqlify-platform/config/example in a different location, then adjusting the parameters, and finally launching the platform with -DconfigDirectory=... set appropriately.

The platform applies autoconfiguration to Pubby and Snorql:

  • Snorql: Namespaces are those of the views.sparqlify file.
  • Pubby: The host name of all resources generated in the Sparqlify views is replaced with the URL of the platform (currently still needs to be configured via platform.properties)

Additionally, you probably want to make the URIs nicer, e.g. by configuring an Apache reverse proxy:

Enable the apache proxy_http module:

sudo a2enmod proxy_http

Then in your /etc/apache2/sites-available/default add lines such as

ProxyRequests Off
ProxyPass /resource http://localhost:7531/pubby/bizer/bsbm/v01/ retry=1
ProxyPassReverse /resource http://localhost:7531/pubby/bizer/bsbm/v01/

These entries will enable requests to http://localhost/resource/... rather than http://localhost:7531/pubby/bizer/bsbm/v01/.

The retry=1 means that Apache waits only 1 second before retrying when it encounters an error (e.g. HTTP code 500) from the proxied resource.

IMPORTANT: ProxyRequests are off by default; DO NOT ENABLE THEM UNLESS YOU KNOW WHAT YOU ARE DOING. Simply enabling them potentially allows anyone to use your computer as a proxy.

SML Mapping Syntax

A Sparqlification Mapping Language (SML) configuration is essentially a set of CREATE VIEW statements, somewhat similar to SQL's CREATE VIEW statement. Probably the easiest way to learn the syntax is to look at the examples below.

Additionally, for convenience, prefixes can be declared, which are valid throughout the config file. As comments, you can use //, /* */, and #.

For a first impression, here is a quick example:

/* This is a comment
 * /* You can even nest them! */
 */
// Prefixes are valid throughout the file
Prefix dbp:<http://dbpedia.org/ontology/>
Prefix ex:<http://ex.org/>

Create View myFirstView As
    Construct {
        ?s a dbp:Person .
        ?s ex:workPage ?w .
    }
With
    ?s = uri('http://mydomain.org/person', ?id) // Define ?s to be a URI generated from the concatenation of a prefix with mytable's id-column.
    ?w = uri(?work_page) // ?w is assigned the URIs in the column 'work_page' of 'mytable'
Constrain
    ?w prefix "http://my-organization.org/user/" // Constraints can be used for optimization, e.g. to prune unsatisfiable join conditions
From
    mytable; // If you want to use an SQL query, the query (without trailing semicolon) must be enclosed in double square brackets: [[SELECT id, work_page FROM mytable]]
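To make the term constructors concrete, here is a rough Python model (not Sparqlify's implementation; the row data is invented) of what the With clause above produces for a sample row of mytable:

```python
# Illustrative model of SML term constructors applied to one sample row.

def uri(*parts) -> str:
    """Model of uri(...): concatenate the arguments into an IRI string."""
    return "".join(str(p) for p in parts)

row = {"id": 42, "work_page": "http://my-organization.org/user/jdoe"}

s = uri("http://mydomain.org/person", row["id"])
w = uri(row["work_page"])

print(s)  # http://mydomain.org/person42
print(w)  # http://my-organization.org/user/jdoe

# Note how ?w satisfies the declared prefix constraint from the Constrain clause:
assert w.startswith("http://my-organization.org/user/")
```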

Notes for sparqlify-csv

For sparqlify-csv, the view definition syntax is almost the same as above; the differences are:

  • Instead of Create View viewname As Construct, start your views with Create View Template viewname As Construct
  • There are no From and Constrain clauses

Columns can be referenced either by name (see the -h option) or by index (1-based).

Example

// Assume a CSV file with the following columns (osm stands for OpenStreetMap)
(city_name, country_name, osm_entity_type, osm_id, longitude, latitude)

Prefix fn:<http://aksw.org/sparqlify/> //Needed for urlEncode and urlDecode.
Prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
Prefix owl:<http://www.w3.org/2002/07/owl#>
Prefix xsd:<http://www.w3.org/2001/XMLSchema#>
Prefix geo:<http://www.w3.org/2003/01/geo/wgs84_pos#>

Create View Template geocode As
  Construct {
    ?cityUri
      owl:sameAs ?lgdUri .

    ?lgdUri
      rdfs:label ?cityLabel ;
      geo:long ?long ;
      geo:lat ?lat .
  }
  With
    ?cityUri = uri(concat("http://fp7-pp.publicdata.eu/resource/city/", fn:urlEncode(?2), "-", fn:urlEncode(?1)))
    ?cityLabel = plainLiteral(?1)
    ?lgdUri = uri(concat("http://linkedgeodata.org/triplify/", ?4, ?5))
    ?long = typedLiteral(?6, xsd:float)
    ?lat = typedLiteral(?7, xsd:float)
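A rough Python model of how the template above maps one sample row to terms; fn:urlEncode is modeled with urllib.parse.quote, and the row values are invented. (Note that the template references column ?7 while the comment header lists six columns, so the sample row below simply provides seven values.)

```python
# Illustrative sketch: applying the geocode view's term constructors to one
# invented sample row. Sparqlify's exact URL encoding may differ.
from urllib.parse import quote

row = ["Leipzig", "Germany", "relation", "node", "123456", "12.37", "51.34"]

def col(i: int) -> str:
    """SML column references (?1, ?2, ...) are 1-based."""
    return row[i - 1]

cityUri = ("http://fp7-pp.publicdata.eu/resource/city/"
           + quote(col(2)) + "-" + quote(col(1)))
lgdUri = "http://linkedgeodata.org/triplify/" + col(4) + col(5)
long_lit = (col(6), "http://www.w3.org/2001/XMLSchema#float")  # typedLiteral
lat_lit = (col(7), "http://www.w3.org/2001/XMLSchema#float")

print(cityUri)  # http://fp7-pp.publicdata.eu/resource/city/Germany-Leipzig
print(lgdUri)   # http://linkedgeodata.org/triplify/node123456
```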
