

Repository Details

creates a docker image with Virtuoso preloaded with the latest DBpedia dataset

Virtuoso SPARQL Endpoint Quickstart

Creates and runs a Virtuoso Open Source instance including a SPARQL endpoint preloaded with a Databus Collection and the VOS DBpedia Plugin installed.

Quickstart

Running the Virtuoso SPARQL Endpoint Quickstart requires Docker and Docker Compose installed on your system. If you do not have those installed, please follow the official install instructions for Docker and for Docker Compose. Once you have both installed, run

git clone https://github.com/dbpedia/virtuoso-sparql-endpoint-quickstart.git
cd virtuoso-sparql-endpoint-quickstart
COLLECTION_URI=https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03 VIRTUOSO_ADMIN_PASSWD=YourSecretPassword docker-compose up

After a short delay your SPARQL endpoint will be running at localhost:8890/sparql.

Note that loading huge datasets to the Virtuoso triple store takes some time. Even though the SPARQL endpoint is up and running, the loading process might still take up to several hours depending on the amount of data you are trying to load.

In order to verify your setup more quickly you can use the following collection URI instead: https://databus.dbpedia.org/dbpedia/collections/virtuoso-sparql-endpoint-quickstart-preview

Note that this collection is only a collection of RDF data to test drive the docker compose network and not a DBpedia release. After a short delay the resource http://localhost:8890/page/Berlin should be accessible.
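
To check the setup from the command line rather than in a browser, a minimal sketch along the following lines may help. It assumes curl is installed and that the endpoint listens on the default localhost:8890; the function names endpoint_ready and page_ready are made up for this example and are not part of the repository.

```shell
#!/bin/sh
# Base URL of the Virtuoso instance (assumption: default HTTP port 8890).
BASE_URL="${BASE_URL:-http://localhost:8890}"

# Succeeds if the SPARQL endpoint answers a trivial ASK query.
# -G with --data-urlencode sends the query as a URL-encoded GET parameter.
endpoint_ready() {
    curl --silent --fail --max-time 10 -G "$1/sparql" \
        --data-urlencode 'query=ASK { ?s ?p ?o }' > /dev/null
}

# Succeeds if the HTML view of a resource (e.g. Berlin) is served.
page_ready() {
    curl --silent --fail --max-time 10 --output /dev/null "$1/page/$2"
}
```

For example, `endpoint_ready "$BASE_URL" && page_ready "$BASE_URL" Berlin && echo "setup looks fine"` prints the message only when both requests succeed. A successful ASK only shows that the endpoint is up, not that loading has finished.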

Troubleshooting

  • virtuoso-sparql-endpoint-quickstart_load_1 exited with code 1
    Something went wrong while loading the files; the data may be incompletely loaded.
      • load_1 | *** Error 28000: [Virtuoso Driver]CL034: Bad login
        You are not using the correct password (the one provided when starting the compose setup for the first time).
  • store_1 | 05:28:37 *** read-ahead of a free or out of range page dp L=318307, database not necessarily corrupted.
    Increase the memory settings (BUFFERS options) for the Virtuoso database in config.env; the buffer parameters are described under Container 1 below.
  • store_1 | 05:18:32 Write wait on column page 62980. Waits should be on the index leaf page, except when col page is held for read by background write
    Same fix as above: increase the buffer settings.
  • store_1 | 05:27:43 * Monitor: High disk read (1)
    Same fix as above: increase the buffer settings.

Documentation

The Virtuoso SPARQL Endpoint Quickstart is a network of three different docker containers which are launched with docker-compose. The following containers are run:

  • OpenLink VOS Instance
  • DBpedia Databus Collection Downloader
  • Loader/Installer

Once the loading process has been completed, only the OpenLink VOS Instance will keep running. The other two containers shut down once their job is done. By running docker ps you can see whether the download and loader containers are still running. If only the OpenLink VOS Instance remains, all your data has been loaded into the triple store.
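
The docker ps check can be scripted. The sketch below assumes the container names end in load_1 and download_1 (or load-1 and download-1 with newer Compose versions); only the load_1 and store_1 names appear in the log lines of this README, so the download name is a guess. The function name loading_finished is made up for this example.

```shell
#!/bin/sh
# Reads container names from stdin (one per line) and succeeds when none of
# them looks like the loader or downloader container.
# Assumption: names end in _load_N / _download_N (or -load-N / -download-N).
loading_finished() {
    ! grep -Eq '[_-](load|download)[_-][0-9]+$'
}
```

Used as `docker ps --format '{{.Names}}' | loading_finished && echo "loading finished"`. Note that the check also succeeds when no container is running at all, so confirm separately that the store container is up.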

The possible configurations for all containers are documented below. The repository includes an .env file containing all configurable environment parameters for the network.

Environment Variables

Running docker-compose up will use the environment variables specified in the .env file next to the docker-compose.yml. The available variables are:

  • VIRTUOSO_ADMIN_PASSWD: The password for the Virtuoso Database. This needs to be set in order to successfully start the SPARQL endpoint.

  • VIRTUOSO_HTTP_PORT: The HTTP port of the OpenLink VOS instance.

  • VIRTUOSO_ISQL_PORT: The ISQL port of the OpenLink VOS instance.

  • VIRTUOSO_DIR: The directory that stores the content of the virtuoso triple store.

  • COLLECTION_URI: The URI of a Databus Collection. If you want to load the DBpedia Dataset it is recommended to use a Snapshot Collection (2022-03). You can start the SPARQL endpoint with any other Databus Collection or you can copy the files manually into the ./downloads folder.

  • DATA_DIR: The directory containing the loaded data. The download container will download files to this directory. You can also copy files into the directory manually.

  • DOMAIN: The domain of your resource identifiers. This variable is only required if you intend to access the HTML view of your resources (e.g. if you want to run a DBpedia Chapter). The HTML view will only show correct views for identifiers in the specified domain. (e.g. set this to http://ru.dbpedia.org when running the Russian chapter with Russian resource identifiers)

  • DBP_LANG: The language code of your language. Defaults to 'en'.

  • DBP_CATEGORY: The word 'category' in your language. Defaults to 'Category'.
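
As an illustration, a filled-in .env could look like the sketch below. The password, collection URI and port 8890 are taken from the quickstart above and 1111 is Virtuoso's usual ISQL port; the directory paths and the DOMAIN value are assumptions made for this example and may differ from the file shipped in the repository.

```shell
# Illustrative .env values; adjust to your setup.
VIRTUOSO_ADMIN_PASSWD=YourSecretPassword
VIRTUOSO_HTTP_PORT=8890
VIRTUOSO_ISQL_PORT=1111
# Assumed local folder for the Virtuoso database files
VIRTUOSO_DIR=./virtuoso-db
COLLECTION_URI=https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2022-03
# Assumed to be the ./downloads folder mentioned above
DATA_DIR=./downloads
# Assumed domain for English resource identifiers
DOMAIN=http://dbpedia.org
DBP_LANG=en
DBP_CATEGORY=Category
```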

Container Configurations

You can configure the containers in the network even further by adjusting the docker-compose.yml file. The following section lists all the environment variables that can only be set in the docker-compose.yml for each of the containers.

Only change the docker-compose.yml if you know what you are doing. For most users the .env file is sufficient.

Container 1: OpenLink VOS Instance

See the documentation of the docker image for all available options. The image requires one environment variable, DBA_PASSWORD, to set the admin password of the database; the two buffer parameters are optional but strongly affect loading time:

  • DBA_PASSWORD: Your database admin password. It is recommended to set this by setting the VIRTUOSO_ADMIN_PASSWD variable in the .env file.
  • VIRT_PARAMETERS_NUMBEROFBUFFERS: Defaults to 2000 which will result in a very long loading time. Increase this depending on the available memory on your machine. You can find more details in the docker image documentation.
  • VIRT_PARAMETERS_MAXDIRTYBUFFERS: Same as VIRT_PARAMETERS_NUMBEROFBUFFERS.

The admin password is only set when a new database is created. The example docker-compose mounts a folder to the internal database directory for persistence. Note that this folder needs to be cleared in order to change the password via docker-compose.

The second volume specified in the docker-compose file connects the downloads folder to a directory in the container that is accessible by the Virtuoso load script. Accessible paths are set in the internal virtuoso.ini file (DirsAllowed). Because the docker-compose file uses the vanilla settings of the image, the local ./downloads folder is mounted to /usr/share/proj inside the container, which is listed in DirsAllowed by default.
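
Choosing values for the two buffer parameters can be sketched as a small calculation. The example below assumes the rule of thumb from the sizing comments in Virtuoso's sample virtuoso.ini, roughly 85000 buffers per GB of RAM dedicated to Virtuoso with MaxDirtyBuffers at about three quarters of that; treat the numbers as a starting point and check the docker image documentation. The function name buffer_settings is made up for this example.

```shell
#!/bin/sh
# Usage: buffer_settings <GB of RAM to dedicate to Virtuoso>
# Prints suggested values for the two buffer-related environment variables.
# Assumption: ~85000 buffers per GB, dirty buffers at 3/4 of that.
buffer_settings() {
    gb=$1
    buffers=$((gb * 85000))
    dirty=$((buffers * 3 / 4))
    printf 'VIRT_PARAMETERS_NUMBEROFBUFFERS=%s\n' "$buffers"
    printf 'VIRT_PARAMETERS_MAXDIRTYBUFFERS=%s\n' "$dirty"
}
```

Calling `buffer_settings 8` prints the two lines for a machine where about 8 GB can be given to Virtuoso; the output can be copied into the environment section of the store container in docker-compose.yml.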

Container 2: DBpedia Databus Collection Downloader

This project uses the DBpedia Databus Collection Downloader. You can find the documentation in the dbpedia-databus-collection-downloader repository. If you haven't already, download and build the download client docker image. The required environment variables are:

  • TARGET_DIR: The target directory for the downloaded files (inside of the container). Make sure that the directory is mounted to a local folder to access the files in the docker network.

Container 3: Loader/Installer

The loader/installer image is pulled from dbpedia/virtuoso-sparql-endpoint-quickstart.

Alternatively, you can modify the loader/installer docker image and build it yourself by running

cd ./dbpedia-loader
docker build -t dbpedia-virtuoso-loader .

You can configure the container with the following environment variables:

  • STORE_DATA_DIR: The directory of the VOS instance that the downloads folder is mounted to (/usr/share/proj by default). Since the Loader will tell the VOS instance to start importing files it needs to know where the files are going to be. Additionally the VOS instance needs to be given access to that directory.
  • STORE_DBA_PASSWORD: The admin password specified in the VOS instance (DBA_PASSWORD variable). It is recommended to set this by setting the VIRTUOSO_ADMIN_PASSWD variable in the .env file.
  • DATA_DIR: The directory of this container that the downloads folder is mounted to.
  • [OPTIONAL] DATA_DOWNLOAD_TIMEOUT: The number of seconds until the loader process stops waiting for the download to finish.
  • [OPTIONAL] STORE_CONNECTION_TIMEOUT: The number of seconds until the loader process stops waiting for the store to boot up.

Instructions for DBpedia Chapters

In case of emergency or confusion please visit this forum thread. Feel free to ask and answer questions as it will help future chapter deployments!

In order to use the Virtuoso SPARQL Endpoint Quickstart docker network to host your own DBpedia instance you need to create a chapter collection on the DBpedia Databus. You can learn about the Databus and Databus Collections in the DBpedia Stack Tutorial on YouTube.

Alternatively you can download the required data to your local machine and supply the files manually. It is however recommended to use Collections as it makes updating to future versions much easier.

Set the COLLECTION_URI variable to your chapter collection URI and adjust the DOMAIN variable to match the domain of your resource identifiers. Alternatively (not recommended), copy your files into the directory specified in DATA_DIR and remove the download container section from the docker-compose.yml.

Once all variables are set in the .env file run

docker-compose up
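
The .env changes for a chapter can also be scripted. The helper below is a minimal sketch, not part of the repository: set_env_var is a made-up name, and it assumes the file uses plain KEY=value lines.

```shell
#!/bin/sh
# Usage: set_env_var KEY VALUE FILE
# Replaces the KEY=... line in FILE, or appends it if the key is missing.
set_env_var() {
    key=$1
    value=$2
    file=$3
    tmp=$(mktemp)
    # Keep every line except an existing assignment of this key.
    grep -v "^${key}=" "$file" > "$tmp" || true
    # Append the new assignment and swap the file into place.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$file"
}
```

For the Russian chapter mentioned above this could be `set_env_var DOMAIN http://ru.dbpedia.org .env` followed by `set_env_var DBP_LANG ru .env`, with COLLECTION_URI set to your own chapter collection in the same way.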

Enabling Federated Queries

Federated queries can be enabled by granting the roles SPARQL_LOAD_SERVICE_DATA and SPARQL_SPONGE to the SPARQL user.

docker exec -it [virtuoso_docker_name] /bin/bash
isql-v -U dba -P [virtuoso_admin_password]
grant SPARQL_LOAD_SERVICE_DATA to "SPARQL";
grant SPARQL_SPONGE to "SPARQL";
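
Once the roles are granted, a federated query can be tried from the command line. The sketch below assumes curl is installed, the local endpoint runs at localhost:8890/sparql, and the public Wikidata endpoint serves as the remote service; the function names are made up for this example, and the query simply asks the remote service for labels of Berlin (Wikidata item Q64).

```shell
#!/bin/sh
# Local endpoint and remote SERVICE endpoint (both are assumptions).
ENDPOINT="${ENDPOINT:-http://localhost:8890/sparql}"
REMOTE="${REMOTE:-https://query.wikidata.org/sparql}"

# Prints a SPARQL query that delegates one triple pattern to the remote
# service given as the first argument.
federated_query() {
    cat <<EOF
SELECT ?label WHERE {
  SERVICE <$1> {
    <http://www.wikidata.org/entity/Q64> <http://www.w3.org/2000/01/rdf-schema#label> ?label
  }
} LIMIT 5
EOF
}

# Sends the federated query to the local endpoint and prints the JSON result.
run_federated_query() {
    curl --silent --fail -G "$ENDPOINT" \
        --data-urlencode "query=$(federated_query "$REMOTE")" \
        -H 'Accept: application/sparql-results+json'
}
```

Running `run_federated_query` should return a handful of labels if the grants took effect; if the grants are missing, Virtuoso is expected to reject the query with a permission error.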
