• Stars
    star
    134
  • Rank 262,480 (Top 6 %)
  • Language
    Erlang
  • License
    Apache License 2.0
  • Created over 8 years ago
  • Updated about 5 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

simplified query engine based on logic programming paradigm

datalog

Datalog is a query language based on the logic programming paradigm. The library is designed to formalize relation of n-ary streams. It implements an ad-hoc query engine using simplified version of general logic programming paradigm. The library facilitates development of data integration, information exchange and semantic web applications.

Build Status Coverage Status Hex.pm Hex Downloads

Key features

  • Top-down, sub-query, breath-first evaluation algorithm of logical program
  • Erlang native interface to describe relation of streams
  • Formalizes streams relation using human readable language
  • Supports conjunctions, unions and recursion

Background

The logic program consists of finite set of rules and large volume of ground facts -- knowledge. The rules are used to deduce new facts from other facts (built new relations). The Horn clauses formally defines rules (first-order formula)

pโ‚€(แบ‹โ‚€) :- pโ‚(แบ‹โ‚) ^ ... ^ pโ‚™(แบ‹โ‚™).

pโ‚€ is a rule head, it is a producer of new relation (facts). The body is a conjunction of predicates. Each pแตข is a predicate expression consist of predicate symbol and terms such as p(tโ‚ , ... , tโ‚™), terms are either a literal constant or a variable. The predicate expression refers to relation of arbitrary arity - stream of tuples; terms range over this stream of tuples. Body predicates refers either to derived relations or ground facts. The predicates with common variables give rise to join. Ground facts are physically stored in external memory and accesses using streams abstractions.

A head is a new derived relation, deducted through the logical program (body of horn clause) and ground facts. It is not explicitly persisted anywhere and corresponds the relation view (projection). The materialization of these view is the main task of this library.

A naive example, a new relation about is deducted from two relations category and article.

about(title, subject) :- category(x, subject), article(title, x).

ฯƒ function

The library uses a "functional" interpretation of predicates, any predicate is a function -- sigma expression. It associates some of its bound terms to the remaining ones, returning the lazy set of tuples corresponding to materialized predicate. For example if p is binary predicate, its ฯƒ function is denoted as

ฯƒ(S) -> { แบ‹ โˆˆ S ^ p(แบ‹) }

The library translate goals of rules into algebraic queries with an objective to access the minimum of ground facts needed in order to determine the answer. Rules are compiled to composition of ฯƒ functions (sub-queries). They are recursively expanded and the evaluation of the current sub-query is postponed until the new sub-query has been completely solved.

The defined sigma expression formalism translates purely declarative semantic into operational semantic, i.e. specify of query must be executed. The lazy set ensures simplicity of one-tuple-at-a-time evaluation strategy while preserving efficiency of set-oriented methods used by high-level query languages.

The sigma function is the formalism to relate logic program to ground facts persisted by external storage (most common query languages, access methodologies, i/o interfaces). The library uses sigma algebra to evaluate logic program but it requires developers to implement corresponding access protocols supported by external storage. This is an abstraction interface to retrieve ground facts matching predicate. The library hides the concerns of logical program evaluation but provides hooks to implement access protocols.

Getting started

The latest version of the library is available at its master branch. All development, including new features and bug fixes, take place on the master branch using forking and pull requests as described in contribution guidelines.

Installation

The stable library release is available via hex packages, add the library as dependency to rebar.config

{deps, [{datalog}]}.

Usage

The library requires implementation of streaming interface to fetch a ground facts from external storage. It provides a reference implementation to deal with lists.

Let's consider usage example of library using movies dataset and human readable datalog.

Build library and run the development console

make && make run

The typical usage scenario parse, compile and evaluate.

%% parse query
Q = datalog:p("?- h(_, _). f(x,y). h(x,y) :- f(x,y), y > 1.").

%% compile query
E = datalog:c(datalog_list, Q).

%% evaluate query
S = datalog:q(E, [{a, 1}, {b, 2}, {c, 3}]).

%% [ 
%%   [b,2], [c,3] 
%% ]
stream:list(S).

Let's consider a complex scenarios with a reference dataset about movies.

{ok, Imdb} = file:consult("./priv/imdb.config").

Basic queries

Match a person from dataset using query goals

%%
%% define a query goal to match a person with `name` equal to `Ridley Scott`.
%% An identity rule is used to produce stream of tuples 
Q = "?- h(_, \"name\", \"Ridley Scott\"). h(s, p, o) :- f(s, p, o).".

%%
%% parse and compile a query into executable function
F = datalog:c(datalog_list, datalog:p(Q)).

%%
%% apply the function to dataset and materialize a stream of tuple, it returns
%% [
%%    [<<"urn:person:137">>,<<"name">>,<<"Ridley Scott">>]
%% ]
stream:list(F(Imdb)).

Data patterns

Match a person from dataset using patterns: literals and guards

Q = "
   ?- h(_, _). 

   f(s, p, o). 

   h(s, o) :- 
      f(s, \"name\", o), o = \"Ridley Scott\".
".

%%
%% [
%%    [<<"urn:person:137">>,<<"Ridley Scott">>]
%% ]
F = datalog:c(datalog_list, datalog:p(Q)).
stream:list(F(Imdb)).

Discover all movies produces in 1987

Q = "
   ?- h(_, _). 

   f(s, p, o). 

   h(s, title) :- 
      f(s, \"year\", 1987), 
      f(s, \"title\", title).
".

%%
%% [
%%    [<<"urn:movie:202">>,<<"Predator">>],
%%    [<<"urn:movie:203">>,<<"Lethal Weapon">>],
%%    [<<"urn:movie:204">>,<<"RoboCop">>]
%% ]
F = datalog:c(datalog_list, datalog:p(Q)).
stream:list(F(Imdb)).

Discover all actors of "Lethal Weapon" movie.

Q = "
   ?- h(_). 

   f(s, p, o). 

   h(name) :- 
      f(m, \"title\", \"Lethal Weapon\"), 
      f(m, \"cast\", p), 
      f(p, \"name\", name).
".

%%
%% [
%%    [<<"Mel Gibson">>],
%%    [<<"Danny Glover">>],
%%    [<<"Gary Busey">>]
%% ]
F = datalog:c(datalog_list, datalog:p(Q)).
stream:list(F(Imdb)).

Predicates

Discover all movies produced before 1984

Q = "
   ?- h(_, _). 

   f(s, p, o).

   h(title, year) :- 
      f(s, \"year\", year), 
      f(s, \"title\", title), 
      year < 1984.
".

%%
%% [
%%    [<<"First Blood">>,1982],
%%    [<<"Alien">>,1979],
%%    [<<"Mad Max">>,1979],
%%    [<<"Mad Max 2">>,1981]
%% ]
F = datalog:c(datalog_list, datalog:p(Q)).
stream:list(F(Imdb)).

Design custom ฯƒ function

ฯƒ function is a partial application takes terms and side-effect environment, it returns a stream on tuples matching the terms pattern.

As an example, the following sigma function takes two terms (2-arity) and returns corresponding stream of tuples. The library uses list [_] as data structure for tuples. It allows efficiently bind deducted values to term variable. Thus, each sigma function return stream (lazy list) of lists.

f(_, _, [X, Y]) ->
   fun(Env) ->
      stream:build(...)
   end.

Each term takes one of the following types undefined | [filter()] | literal():

  • undefined terms position corresponds to free variable, streams output at this position will be bound to corresponding variable at datalog expression.
  • [filter()] defines acceptable range of term values
  • literal() defines exact matching of term at stream

Reference

  1. What You Always Wanted to Know About Datalog (And Never Dared to Ask)
  2. Theory of Relational Databases
  3. Foundation of Database
  4. Recursive Programming in Datalog
  5. Datalog and Recursive Query Processing
  6. http://ion.uwinnipeg.ca/~ychen2/journalpapers/StratifiedDB.pdf
  7. http://www.cs.toronto.edu/~drosu/csc343-l7-handout6.pdf

License

Copyright 2014 Dmitry Kolesnikov

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

More Repositories

1

cache

Erlang in-memory cache
Erlang
136
star
2

datum

pure functional and generic programming for Erlang
Erlang
119
star
3

aws-cdk-pure

Purely Functional and high-order cloud components with AWS CDK
TypeScript
89
star
4

esq

simple persistent queues for Erlang
Erlang
54
star
5

typhoon

distributed system stress and load testing tool
Erlang
43
star
6

golem

pure functional and generic programming for Go
Go
40
star
7

serverless

Serverless Erlang runtime for AWS Lambda Service
Erlang
29
star
8

csv

csv parser, optimized for performance
Makefile
22
star
9

dynamo

Generic Golang Key/Value trait for AWS storage services
Go
17
star
10

pts

The library provides hashtable-like interface to manipulate data distributed across Erlang processes.
Erlang
15
star
11

hash

collection of hash functions for Erlang applications
Erlang
12
star
12

feta

fogfish erlang toolkit archive
Erlang
10
star
13

guid

K-ordered unique identifiers in lock-free and decentralised manner for Golang applications
Go
10
star
14

makefile

Erlang workflow
Makefile
10
star
15

oauth2

oauth2 authorization server
Erlang
9
star
16

node-lambda-typescript-template

The bare minimum for a TypeScript app running on Amazon Lambda.
TypeScript
8
star
17

ek

Erlang clustering
Erlang
8
star
18

nebula

Erlang nodes discovery agent
Erlang
8
star
19

ddb

AWS Generic Storage Drivers
Erlang
8
star
20

relog

Redis datalog support
Erlang
8
star
21

m_http

A class of Erlang monads which can do http requests
Erlang
7
star
22

hyperion

Erlang Vanilla Node
Erlang
6
star
23

uid

erlang fault tolerant service to generate unique identities
Erlang
6
star
24

svg

Erlang
5
star
25

gurl

แต๐Ÿ†„๐Ÿ†๐Ÿ…ป is a combinator library for network I/O
Go
5
star
26

blueprint-serverless-golang

AWS CDK template for serverless Golang
Go
5
star
27

word2vec

Golang "native" implementation of word2vec algorithm (word2vec++ port)
C++
5
star
28

semantic

Semantic Web ToolKit for Erlang applications
Erlang
4
star
29

socat

Command line utility to cat files via network socket
Erlang
4
star
30

ring

consistent hashing data structure
Go
4
star
31

elata

Erlang LATency Agent: software solution to gather telemetry of software components (written on erlang)
Shell
4
star
32

certbot-on-aws

Serverless integration with https://letsencrypt.org for microservices hosted on AWS
Makefile
4
star
33

swirl

swirl is the Erlang port of whiskers.js template library.
Erlang
4
star
34

schemaorg

Go types of schema.org ontology
Go
4
star
35

streamfs

Append-only file system access with stream abstraction
Erlang
3
star
36

esh

Erlang to Shell binding
Erlang
3
star
37

crdts

Convergent replicated data type for Erlang
Erlang
3
star
38

curie

The type CURIE (compact URI) for Golang
Go
3
star
39

erlang-in-docker

Erlang/OTP container
Dockerfile
3
star
40

elasticnt

N-triple intake to Elastic Search
Erlang
3
star
41

chronolog

time series database
Erlang
3
star
42

ecsd

AWS ECS microservice supervisor
Erlang
3
star
43

permit

a high-level api for security tokens management.
Erlang
3
star
44

stdio

Creating streams and performing input and output operations on them
Erlang
3
star
45

s3am

Stream I/O to aws s3 buckets
Erlang
3
star
46

gouldian

Go combinator library for building HTTP serverless applications
Go
3
star
47

datomic-aws

AWS Appliance to managed Datomic solutions
Shell
2
star
48

serverless-runtime

Experimental runtime for serverless Erlang application (Use fogfish/serverless) instead
Erlang
2
star
49

swarm

Go channels for distributed queueing and event-driven systems
Go
2
star
50

hnsw

Hierarchical Navigable Small World Graphs
Go
2
star
51

code-build-bot

Serverless CI/CD with AWS CodeBuild
TypeScript
2
star
52

aae

active anti-entropy library
Erlang
2
star
53

schemacli

schema.org ontology command-line
Go
2
star
54

faults

Type safe constructs to annotate Golang errors with the context
Go
2
star
55

d3

direct distributed dets interface
Erlang
2
star
56

dive

ephemeral and persistent b-tree interface
Erlang
1
star
57

esio

HTTP client to Elastic Search for Erlang application.
Erlang
1
star
58

homebrew-qemu-9pfs

Homebrew Tap install QEMU with 9P filesystem
Ruby
1
star
59

geojson

GeoJSON / RFC7946 codec for Golang
Go
1
star
60

kv8.rel

simple erlang benchmark utility
Shell
1
star
61

cryptex

Semi-automatic cipher for Algebraic Data Types in Golang
Go
1
star
62

elasticlog

Elastic Search datalog support
Erlang
1
star
63

pq

Erlang process queues (pool of workers)
Erlang
1
star
64

znap

Build and replay snapshots from your asynchronous event stream(s)
Scala
1
star
65

runscript

tiny wrapper to run script / shell command as root
C
1
star
66

skiplist

Golang SkipList data structure
Go
1
star
67

ambit

Erlang
1
star
68

erlcc

Erlang Code Compile: wrapper of native compile module
Makefile
1
star
69

tf-workspace

A personal workspace for TensorFlow.
Jupyter Notebook
1
star
70

it

Human-friendly unit tests assertions for Go
Go
1
star
71

clue

system and application status repository for Erlang
Erlang
1
star
72

clot

cloud toolkit for Erlang applications
Erlang
1
star
73

hornlog

define and evaluate horn clauses
Makefile
1
star
74

hexer

Hexastore: Sextuple Indexing for Semantic Web Data Management
Go
1
star
75

pns

process namespace
Erlang
1
star