
A large-scale entity and relation database supporting aggregation of properties

Gaffer


Gaffer is a graph database framework. It allows the storage of very large graphs containing rich properties on the nodes and edges. Several storage options are available, including Accumulo and an in-memory Java Map Store.
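To make the node/edge model concrete, here is a minimal plain-Java sketch of a graph held in in-memory maps, loosely in the spirit of the Map Store mentioned above. It is not Gaffer's API. The classes SketchEdge and SketchMapStore and their methods are hypothetical names used only for illustration, and the sketch needs Java 9+ for List.of and Map.of. Each edge is indexed under both its source and its destination, so a query seeded with either vertex finds it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical edge type: a group name, two endpoints and a map of rich properties.
final class SketchEdge {
    final String group;
    final String source;
    final String dest;
    final Map<String, Object> properties;

    SketchEdge(String group, String source, String dest, Map<String, Object> properties) {
        this.group = group;
        this.source = source;
        this.dest = dest;
        this.properties = new HashMap<>(properties);
    }
}

// Hypothetical in-memory store: edges indexed by every vertex they touch,
// and entity (node) properties indexed by vertex.
final class SketchMapStore {
    private final Map<String, List<SketchEdge>> edgesByVertex = new HashMap<>();
    private final Map<String, Map<String, Object>> entityProperties = new HashMap<>();

    void addEntity(String vertex, Map<String, Object> properties) {
        entityProperties.computeIfAbsent(vertex, v -> new HashMap<>()).putAll(properties);
    }

    void addEdge(SketchEdge edge) {
        edgesByVertex.computeIfAbsent(edge.source, v -> new ArrayList<>()).add(edge);
        // Avoid indexing a self-loop twice under the same vertex.
        if (!edge.dest.equals(edge.source)) {
            edgesByVertex.computeIfAbsent(edge.dest, v -> new ArrayList<>()).add(edge);
        }
    }

    // Returns every edge touching the seed vertex (an empty list if there are none).
    List<SketchEdge> getEdges(String seed) {
        return edgesByVertex.getOrDefault(seed, List.of());
    }

    // Returns the properties stored on the seed vertex (an empty map if there are none).
    Map<String, Object> getEntity(String seed) {
        return entityProperties.getOrDefault(seed, Map.of());
    }
}
```

For example, after adding a hypothetical "road-use" edge from "A" to "B", both getEdges("A") and getEdges("B") return that single edge. A real store at Gaffer's scale would of course sit on a distributed backend such as Accumulo rather than a HashMap.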

It is designed to be as flexible, scalable and extensible as possible, allowing for rapid prototyping and transition to production systems.

Gaffer offers:

  • Rapid query across very large numbers of nodes and edges
  • Continual ingest of data at very high data rates, and batch bulk ingest of data via MapReduce or Spark
  • Storage of arbitrary Java objects on the nodes and edges
  • Automatic, user-configurable in-database aggregation of rich statistical properties (e.g. counts, histograms, sketches) on the nodes and edges
  • Versatile query-time summarisation, filtering and transformation of data
  • Fine-grained data access controls
  • Hooks to apply policy and compliance rules to queries
  • Automated, rule-based removal of data (typically used to age-off old data)
  • Retrieval of graph data into Apache Spark for fast and flexible analysis
  • A fully-featured REST API
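In-database aggregation and age-off can be illustrated with a small plain-Java sketch. Purely for illustration, it assumes that two edges are merged when they share a group, source, destination and direction, that their count properties are summed, and that the latest timestamp is kept. In Gaffer itself these rules are declared in the schema rather than hard-coded. The classes CountedEdge and AggregatingStore are hypothetical, and the sketch needs Java 9+ for List.of.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical summarised edge: identity fields plus two aggregatable properties.
final class CountedEdge {
    final String group;
    final String source;
    final String dest;
    final boolean directed;
    final long count;
    final long timestamp; // e.g. epoch millis of the most recent observation

    CountedEdge(String group, String source, String dest, boolean directed, long count, long timestamp) {
        this.group = group;
        this.source = source;
        this.dest = dest;
        this.directed = directed;
        this.count = count;
        this.timestamp = timestamp;
    }

    // Edges with equal keys are merged into one stored edge.
    List<Object> key() {
        return List.of(group, source, dest, directed);
    }
}

final class AggregatingStore {
    // LinkedHashMap keeps first-insertion order, which makes results predictable.
    private final Map<List<Object>, CountedEdge> edges = new LinkedHashMap<>();

    // Ingest: merge the new edge with any stored edge that has the same key.
    void add(CountedEdge edge) {
        edges.merge(edge.key(), edge, AggregatingStore::aggregate);
    }

    // Assumed aggregation rule: sum the counts and keep the latest timestamp.
    static CountedEdge aggregate(CountedEdge a, CountedEdge b) {
        return new CountedEdge(a.group, a.source, a.dest, a.directed,
                a.count + b.count, Math.max(a.timestamp, b.timestamp));
    }

    // Age-off: remove every edge whose latest timestamp is older than the cutoff.
    void ageOff(long cutoffTimestamp) {
        edges.values().removeIf(e -> e.timestamp < cutoffTimestamp);
    }

    List<CountedEdge> all() {
        return new ArrayList<>(edges.values());
    }
}
```

Because merging happens on ingest, a later query for the A-B edge sees one element carrying the combined count rather than every individual observation. This is what keeps a summarised graph compact.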

To get going with Gaffer, visit our getting started pages.

Gaffer is under active development. Version 1.0 of Gaffer was released in October 2017.

Getting Started

Try it out

We have a demo available to try, based on a small UK road-use dataset. See the example/road-traffic README to try it out.

Building and Deploying

To build Gaffer, run mvn clean install -Pquick in the top-level directory. This builds all of Gaffer's core libraries and some examples of how to load and query data.

See our Store documentation page for a list of available Gaffer Stores to choose from and the relevant documentation for each.

Inclusion in other projects

Gaffer is hosted on Maven Central and can easily be incorporated into your own Maven projects.

To use Gaffer from the Java API, the only required dependencies are the Gaffer graph module and a store module for the specific database technology used to store the data. For example, for the Accumulo store:

<dependency>
    <groupId>uk.gov.gchq.gaffer</groupId>
    <artifactId>graph</artifactId>
    <version>${gaffer.version}</version>
</dependency>
<dependency>
    <groupId>uk.gov.gchq.gaffer</groupId>
    <artifactId>accumulo-store</artifactId>
    <version>${gaffer.version}</version>
</dependency>

This will include all other mandatory dependencies. Other (optional) components can be added to your project as required.

Documentation

Our Javadoc can be found here.

We have some user guides in our docs.

Related repositories

The gaffer-tools repository contains useful tools to help work with Gaffer. These include:

  • mini-accumulo-cluster - Allows a mini Accumulo cluster to be spun up for testing purposes
  • performance-testing - Methods of testing the performance of ingest and query operations against a graph
  • python-shell - Allows operations against a graph to be executed from a Python shell
  • random-element-generation - Code to generate large volumes of random graph data
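To give a rough idea of what generating random graph data involves, the following plain-Java sketch produces random edges between a fixed number of vertices. It uses a seeded java.util.Random, so runs are reproducible. It is not the gaffer-tools implementation. RandomEdgeGenerator and its parameters are hypothetical, and the uniform choice of endpoints is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class RandomEdgeGenerator {
    // Hypothetical generated edge: just two vertex identifiers.
    static final class GeneratedEdge {
        final String source;
        final String dest;

        GeneratedEdge(String source, String dest) {
            this.source = source;
            this.dest = dest;
        }
    }

    private final Random random;
    private final int numVertices;

    RandomEdgeGenerator(long seed, int numVertices) {
        if (numVertices < 2) {
            throw new IllegalArgumentException("need at least two vertices");
        }
        this.random = new Random(seed); // same seed gives the same sequence of edges
        this.numVertices = numVertices;
    }

    // Generates numEdges edges with endpoints drawn uniformly, skipping self-loops.
    List<GeneratedEdge> generate(int numEdges) {
        List<GeneratedEdge> edges = new ArrayList<>(numEdges);
        while (edges.size() < numEdges) {
            int s = random.nextInt(numVertices);
            int d = random.nextInt(numVertices);
            if (s != d) {
                edges.add(new GeneratedEdge("v" + s, "v" + d));
            }
        }
        return edges;
    }
}
```

For genuinely large volumes, a generator like this would normally stream elements, for example through an Iterator, rather than materialising them all in a list.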

License

Gaffer is licensed under the Apache 2 license and is covered by Crown Copyright.

Copyright 2016-2023 Crown Copyright

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Contributing

We welcome contributions to the project. Detailed information on our ways of working can be found here. In brief:
