• This repository has been archived on 16/Feb/2022
  • Language
    Go
  • License
    Apache License 2.0
  • Created about 5 years ago
  • Updated almost 5 years ago


A distributed knowledge graph store

Akutan


There's a blog post that's a good introduction to Akutan.

Akutan is a distributed knowledge graph store, sometimes called an RDF store or a triple store. Knowledge graphs are suitable for modeling data that is highly interconnected by many types of relationships, like encyclopedic information about the world. A knowledge graph store enables rich queries on its data, which can be used to power real-time interfaces, to complement machine learning applications, and to make sense of new, unstructured information in the context of the existing knowledge.

How to model your data as a knowledge graph and how to query it will feel a bit different for people coming from SQL, NoSQL, and property graph stores. In a knowledge graph, data is represented as a single table of facts, where each fact has a subject, predicate, and object. This representation enables the store to sift through the data for complex queries and to apply inference rules that raise the level of abstraction. Here's an example of a tiny graph:

subject         predicate   object
<John_Scalzi>   <born>      <Fairfield>
<John_Scalzi>   <lives>     <Bradford>
<John_Scalzi>   <wrote>     <Old_Mans_War>
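To make the single-table model concrete, here is a minimal Go sketch that stores the three example facts as subject/predicate/object triples and answers a simple pattern query, where an empty field in the pattern acts as a wildcard. The Fact type and the Match function are hypothetical illustrations, not Akutan's internal representation or its API; real queries go through the API server and the query language described in docs/query.md.

```go
package main

import "fmt"

// Fact is a hypothetical subject/predicate/object triple used only for
// illustration; Akutan's own fact representation differs.
type Fact struct {
	Subject, Predicate, Object string
}

// Match returns every fact whose fields equal the pattern's non-empty
// fields. An empty string in the pattern acts as a wildcard.
func Match(facts []Fact, pattern Fact) []Fact {
	var out []Fact
	for _, f := range facts {
		if (pattern.Subject == "" || f.Subject == pattern.Subject) &&
			(pattern.Predicate == "" || f.Predicate == pattern.Predicate) &&
			(pattern.Object == "" || f.Object == pattern.Object) {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	graph := []Fact{
		{"<John_Scalzi>", "<born>", "<Fairfield>"},
		{"<John_Scalzi>", "<lives>", "<Bradford>"},
		{"<John_Scalzi>", "<wrote>", "<Old_Mans_War>"},
	}
	// "What did John Scalzi write?" as a pattern with a wildcard object.
	for _, f := range Match(graph, Fact{Subject: "<John_Scalzi>", Predicate: "<wrote>"}) {
		fmt.Println(f.Object) // prints <Old_Mans_War>
	}
}
```

Because every fact has the same three columns, one generic matching routine covers every kind of relationship. That uniformity is part of what lets a triple store sift through heterogeneous data and layer inference rules on top.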

To learn about how to represent and query data in Akutan, see docs/query.md.

Akutan is designed to store large graphs that cannot fit on a single server. It's scalable in how much data it can store and the rate of queries it can execute. However, Akutan serializes all changes to the graph through a central log, which fundamentally limits the total rate of change. The rate of change won't improve with a larger number of servers, but a typical deployment should be able to handle tens of thousands of changes per second. In exchange for this limitation, Akutan's architecture is a relatively simple one that enables many features. For example, Akutan supports transactional updates and historical global snapshots. We believe this trade-off is suitable for most knowledge graph use cases, which accumulate large amounts of data but do so at a modest pace. To learn more about Akutan's architecture and this trade-off, see docs/central_log_arch.md.
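The central-log trade-off can be illustrated with a toy model. In the Go sketch below, every change is appended to a single ordered log, and each view applies entries strictly in log order. Any view that has applied entries up to index i therefore holds the same state as every other view at i. That shared index is what makes consistent snapshot reads and multi-fact atomic updates comparatively simple. The single append point is also why the total rate of change does not grow with the number of servers. The Log, Entry, and View types are hypothetical and greatly simplified; they are not Akutan's implementation (see docs/central_log_arch.md for the real design).

```go
package main

import (
	"fmt"
	"sync"
)

// Entry is one committed change: a batch of facts applied atomically, so a
// single append can act as a transaction.
type Entry struct {
	Facts []string
}

// Log is a hypothetical in-process stand-in for the central log. Every
// writer appends here, so all changes are totally ordered by index.
type Log struct {
	mu      sync.Mutex
	entries []Entry
}

// Append adds an entry and returns its 1-based log index.
func (l *Log) Append(e Entry) uint64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.entries = append(l.entries, e)
	return uint64(len(l.entries))
}

// Read returns a copy of the entries after index `from`.
func (l *Log) Read(from uint64) []Entry {
	l.mu.Lock()
	defer l.mu.Unlock()
	return append([]Entry(nil), l.entries[from:]...)
}

// View replays the log in order and records the index at which each fact
// first appeared, so it can answer reads "as of" an earlier index.
type View struct {
	applied uint64
	addedAt map[string]uint64
}

func NewView() *View { return &View{addedAt: map[string]uint64{}} }

// CatchUp applies, in order, every log entry the view has not seen yet.
func (v *View) CatchUp(l *Log) {
	for _, e := range l.Read(v.applied) {
		v.applied++
		for _, f := range e.Facts {
			if _, seen := v.addedAt[f]; !seen {
				v.addedAt[f] = v.applied
			}
		}
	}
}

// HasAt reports whether fact f existed as of log index idx (a snapshot
// read). Callers should CatchUp to at least idx first.
func (v *View) HasAt(f string, idx uint64) bool {
	at, ok := v.addedAt[f]
	return ok && at <= idx
}

func main() {
	central := &Log{}
	i1 := central.Append(Entry{Facts: []string{"<John_Scalzi> <born> <Fairfield>"}})
	// Two facts committed together in one entry.
	i2 := central.Append(Entry{Facts: []string{
		"<John_Scalzi> <lives> <Bradford>",
		"<John_Scalzi> <wrote> <Old_Mans_War>",
	}})

	a, b := NewView(), NewView()
	a.CatchUp(central)
	b.CatchUp(central)

	wrote := "<John_Scalzi> <wrote> <Old_Mans_War>"
	fmt.Println(a.HasAt(wrote, i1), b.HasAt(wrote, i1)) // false false
	fmt.Println(a.HasAt(wrote, i2), b.HasAt(wrote, i2)) // true true
}
```

Both views give identical answers at every index without coordinating with each other, because they consume the same ordered stream.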

Akutan isn't ready for production-critical deployments, but it's useful today for some use cases. We've run a 20-server deployment of Akutan for development purposes and off-line use cases for about a year, which we've most commonly loaded with a dataset of about 2.5 billion facts. We believe Akutan's current capabilities exceed this capacity and scale; we haven't yet pushed Akutan to its limits. The project has a good architectural foundation on which additional features can be built and higher performance could be achieved.

Akutan needs more love before it can be used for production-critical deployments. Much of Akutan's code consists of high-quality, documented, unit-tested modules, but some areas of the code base are inherited from Akutan's earlier prototype days and still need attention. Some functionality that a critical production data store requires is also missing, including deletion of facts, backup/restore, and automated cluster management. We have filed GitHub issues for these and a few other things. There are also areas where Akutan could be improved that wouldn't necessarily block production usage. For example, Akutan's query language is not quite compatible with SPARQL, and its inference engine is limited.

So, Akutan has a nice foundation and may be useful to some people, but it also needs additional love. If that's not for you, here are a few alternative open-source knowledge and property graph stores that you may want to consider (we have no affiliation with these projects):

  • Blazegraph: an RDF store. Supports several query languages, including SPARQL and Gremlin. Disk-based, single-master, scales out for reads only. Seems unmaintained. Powers https://query.wikidata.org/.
  • Dgraph: a triple-oriented property graph store. GraphQL-like query language, no support for SPARQL. Disk-based, scales out.
  • Neo4j: a property graph store. Cypher query language, no support for SPARQL. Single-master, scales out for reads only.
  • See also Wikipedia's Comparison of Triplestores page.

The remainder of this README describes how to get Akutan up and running. Several documents under the docs/ directory describe aspects of Akutan in more detail; see docs/README.md for an overview.

Installing dependencies and building Akutan

Akutan has the following system dependencies:

  • It's written in Go. You'll need v1.11.5 or newer.
  • Akutan uses Protocol Buffers extensively to encode messages for gRPC, the log of data changes, and storage on disk. You'll need protobuf version 3. We recommend 3.5.2 or later. Note that 3.0.x is the default in many Linux distributions, but it doesn't work with the Akutan build.
  • Akutan's Disk Views store their facts in RocksDB.

On Mac OS X, these can all be installed via Homebrew:

$ brew install golang protobuf rocksdb zstd

On Ubuntu, refer to the files within the docker/ directory for package names to use with apt-get.
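Because protobuf 3.0.x does not work with the Akutan build, it can be worth confirming the installed protoc version before running make. The Go sketch below does that, assuming protoc is on your PATH and that `protoc --version` prints a line like `libprotoc 3.5.2`. The helper names are hypothetical, and version strings with pre-release suffixes (for example `3.6.0-rc1`) are simply reported as unparseable.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// parseVersion extracts [major, minor, patch] from output such as
// "libprotoc 3.5.2". Missing components are treated as 0.
func parseVersion(out string) ([3]int, error) {
	var v [3]int
	fields := strings.Fields(out)
	if len(fields) == 0 {
		return v, fmt.Errorf("empty version output")
	}
	parts := strings.Split(fields[len(fields)-1], ".")
	for i := 0; i < len(parts) && i < 3; i++ {
		n, err := strconv.Atoi(parts[i])
		if err != nil {
			return v, fmt.Errorf("cannot parse version %q: %w", out, err)
		}
		v[i] = n
	}
	return v, nil
}

// atLeast reports whether v >= want, comparing component by component.
func atLeast(v, want [3]int) bool {
	for i := range v {
		if v[i] != want[i] {
			return v[i] > want[i]
		}
	}
	return true
}

func main() {
	out, err := exec.Command("protoc", "--version").Output()
	if err != nil {
		fmt.Fprintln(os.Stderr, "could not run protoc:", err)
		os.Exit(1)
	}
	v, err := parseVersion(string(out))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if !atLeast(v, [3]int{3, 5, 2}) {
		fmt.Fprintf(os.Stderr, "protoc %d.%d.%d is older than the recommended 3.5.2\n", v[0], v[1], v[2])
		os.Exit(1)
	}
	fmt.Printf("protoc %d.%d.%d meets the recommended minimum\n", v[0], v[1], v[2])
}
```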

After cloning the Akutan repository, pull down several Go libraries and additional Go tools:

$ make get

Finally, build the project:

$ make build

Running Akutan locally

The fastest way to run Akutan locally is to launch the in-memory log store:

$ bin/plank

Then open another terminal and run:

$ make run

This will bring up several Akutan servers locally. It starts an API server that listens on localhost for gRPC requests on port 9987 and for HTTP requests on port 9988, such as http://localhost:9988/stats.txt.

The easiest way to interact with the API server is using bin/akutan-client. See docs/query.md for examples. The API server exposes the FactStore gRPC service defined in proto/api/akutan_api.proto.

Deployment concerns

The log

Earlier, we used bin/plank as a log store, but this is unsuitable for real usage! Plank is in-memory only, isn't replicated, and by default, it only keeps 1000 entries at a time. It's only meant for development.

Akutan also supports using Apache Kafka as its log store. This is recommended over Plank for any deployment. To use Kafka, follow the Kafka quick start guide to install Kafka, start ZooKeeper, and start Kafka. Then create a topic called "akutan" (not "test" as in the Kafka guide) with partitions set to 1. You'll want to configure Kafka to synchronously write entries to disk.

To use Kafka with Akutan, set the akutanLog's type to kafka in your Akutan configuration (default: local/config.json), and update the locator's addresses accordingly (Kafka uses port 9092 by default). You'll need to clear out Akutan's Disk Views' data before restarting the cluster. The Disk Views by default store their data in $TMPDIR/rocksdb-akutan-diskview-{space}-{partition} so you can delete them all with rm -rf $TMPDIR/rocksdb-akutan-diskview*

Docker and Kubernetes

This repository includes support for running Akutan inside Docker and Minikube. These environments can be tedious for development purposes, but they're useful as a step towards a modern and robust production deployment.

See the cluster/k8s/Minikube.md file for the steps to build and deploy Akutan services in Minikube. It also includes the steps to build the Docker images.

Distributed tracing

Akutan generates distributed OpenTracing traces for use with Jaeger. To try it, follow the Jaeger Getting Started Guide for running the all-in-one Docker image. The default make run setup sends traces there, and you can query them at http://localhost:16686. The Minikube cluster also includes a Jaeger all-in-one instance.

Development

VS Code

You can use whichever editor you'd like, but this repository contains some configuration for VS Code. We suggest the following extensions:

Override the default settings in .vscode/settings.json with ./vscode-settings.json5.

Test targets

The Makefile contains various targets related to running tests:

Target       Description
make test    Run all the Akutan unit tests
make cover   Run all the Akutan unit tests and open the web-based coverage viewer
make lint    Run basic code linting
make vet     Run all static analysis tests, including linting and formatting

License Information

Copyright 2019 eBay Inc.

Primary authors: Simon Fell, Diego Ongaro, Raymond Kroeker, Sathish Kandasamy

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


Note the project was renamed to Akutan in July 2019.

More Repositories

• NMessenger (Swift, 2,424 stars): A fast, lightweight messenger component built on AsyncDisplaykit and written in Swift
• nice-modal-react (TypeScript, 1,947 stars): A modal state manager for React.
• tsv-utils (D, 1,413 stars): eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
• bayesian-belief-networks (Python, 1,122 stars): Pythonic Bayesian Belief Network Package, supporting creation of and exact inference on Bayesian Belief Networks specified as pure python functions.
• NuRaft (C++, 962 stars): C++ implementation of Raft core logic as a replication library
• restcommander (Java, 899 stars): Fast Parallel Async HTTP client as a Service to monitor and manage 10,000 web servers. (Java+Akka)
• parallec (Java, 810 stars): Fast Parallel Async HTTP/SSH/TCP/UDP/Ping Client Java Library. Aggregate 100,000 APIs & send anywhere in 20 lines of code. Ping/HTTP Calls 8000 servers in 12 seconds. (Akka) www.parallec.io
• HeadGazeLib (Swift, 754 stars): A library to empower iOS app control through head gaze without a finger touch
• Sequence-Semantic-Embedding (Python, 459 stars): Tools and recipes to train deep learning models and build services for NLP tasks such as text classification, semantic search ranking and recall fetching, cross-lingual information retrieval, and question answering etc.
• modanet (327 stars): ModaNet: A large-scale street fashion dataset with polygon annotations
• flutter_glove_box (Dart, 316 stars): Various eBay tools for Flutter development
• Neutrino (Scala, 306 stars): Neutrino is a software load balancer (SLB)
• KPRN (Lua, 279 stars): Reasoning Over Knowledge Graph Paths for Recommendation
• UAF (Java, 276 stars): UAF - Universal Authentication Framework
• griffin (JavaScript, 240 stars): Model driven data quality service
• cors-filter (Java, 231 stars): CORS (Cross Origin Resource Sharing) is a mechanism supported by W3C to enable cross origin requests in web-browsers. CORS requires support from both browser and server to work. This is a Java servlet filter implementation of server-side CORS for web containers such as Apache Tomcat.
• Jungle (C++, 218 stars): An embedded key-value store library specialized for building state machine and log store
• ebayui-core (TypeScript, 209 stars): Collection of Marko widgets; considered to be the core building blocks for all eBay components, pages & apps
• sbom-scorecard (Go, 208 stars): Generate a score for your sbom to understand if it will actually be useful.
• jsonpipe (JavaScript, 204 stars): A lightweight AJAX client for chunked JSON responses
• ebay-font (JavaScript, 175 stars): A small utility to efficiently load custom web fonts
• skin (JavaScript, 171 stars): Pure CSS framework designed & developed by eBay for a branded, e-commerce marketplace.
• accelerator (Python, 150 stars): The Accelerator is a tool for fast and reproducible processing of large amounts of data.
• firebase-remote-config-monitor (JavaScript, 136 stars): Monitors firebase remote config values, posting changes to slack
• maxDNN (C++, 132 stars): High Efficiency Convolution Kernel for Maxwell GPU Architecture
• go-ovn (Go, 107 stars): A Go library for OVN Northbound/Southbound DB access using native OVSDB protocol
• Gringofts (C++, 102 stars): Gringofts makes it easy to build a replicated, fault-tolerant, high throughput and distributed event-sourced system.
• parallec-samples (Java, 92 stars): Single file examples and ready-to-use servers show how to use parallec.io library. Examples to aggregate APIs and publish to Elastic Search and Kafka, and many more. www.parallec.io
• userscript-proxy (JavaScript, 84 stars): HTTP proxy to inject scripts and stylesheets into existing sites.
• xcelite (Java, 81 stars)
• mindpatterns (HTML, 79 stars): HTML Accessibility Pattern Examples
• embedded-druid (Java, 75 stars)
• figma-include-accessibility-annotations (JavaScript, 73 stars): Include is a tool built to make annotating for accessibility (a11y) easier: easier for designers to spec and easier for developers to understand what is required.
• RANSynCoders (Jupyter Notebook, 72 stars)
• ebay-oauth-python-client (Python, 69 stars): Python OAuth SDK: Get OAuth tokens for eBay public APIs
• Design-Grid-Overlay (JavaScript, 65 stars): A Chrome extension to overlay a design grid on your web page; configurable to fit many design scenarios.
• ebay-oauth-nodejs-client (JavaScript, 61 stars): 🔑 Generate an OAuth token that can be used to call the eBay Developer REST APIs.
• json-comparison (Java, 60 stars): Powerful JSON comparison tool for identifying all the changes within JSON files
• xFraud (Jupyter Notebook, 60 stars)
• bascomtask (Java, 59 stars): Lightweight parallel Java tasks
• DASTProxy (Java, 57 stars)
• jsonex (Java, 56 stars): Java Object Serializer and Deserializer to JSON Format. Focuses on configuration friendliness, arbitrary object serialization and compact JSON format
• ebay-oauth-csharp-client (C#, 53 stars): eBay OAuth C# Client Library
• nvidiagpubeat (Go, 53 stars): nvidiagpubeat is an elastic beat that uses NVIDIA System Management Interface (nvidia-smi) to monitor NVIDIA GPU devices and can ingest metrics into Elastic search cluster, with support for both 6.x and 7.x versions of beats. nvidia-smi is a command line utility, based on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.
• nice-dag (TypeScript, 47 stars): nice-dag is a lightweight javascript library, which is used to present a DAG diagram.
• SparkChamber (Swift, 45 stars): An event tracking framework for iOS
• ebay-oauth-java-client (Java, 45 stars): eBay OAuth APIs client for Java
• Winder (Java, 45 stars): Winder is a simple state machine based on Quartz Scheduler. It helps to write multiple steps tasks on Quartz Scheduler. Winder derived from a state machine which is widely used in eBay Cloud. eBay Platform As A Service (PaaS) uses it to deploy software to hundreds of thousands of virtual machines.
• AutoOpt (Python, 44 stars): Automatic and Simultaneous Adjustment of Learning Rate and Momentum for Stochastic Gradient Descent
• GZinga (Java, 43 stars)
• YiDB (Java, 43 stars)
• collectbeat (Go, 41 stars): Beats with discovery capabilities for environments like Kubernetes
• block-aggregator (C++, 40 stars)
• Jenkins-Pipeline-Utils (Groovy, 39 stars): Global Jenkins Pipeline Library with common utilities.
• cassandra-river (Java, 38 stars): Cassandra river for Elastic search.
• bsonpatch (Java, 38 stars): A BSON implementation of RFC 6902 to compute the difference between two BSON documents
• arc (JavaScript, 35 stars): adaptive resources and components
• regressr (Scala, 34 stars): A command line regression testing framework for testing HTTP services
• ebashlib (Shell, 30 stars): A bash script battery which gathers several generic helper scripts for other repositories.
• modshot (JavaScript, 29 stars): Takes screenshot of UI modules and compare with baselines using PhantomCSS
• visual-html (TypeScript, 29 stars): Visual regression testing without the flakiness.
• FeedSDK-Python (Python, 29 stars): eBay Python Feed SDK - SDK for downloading large gzipped (tsv) item feed files and applying filters for curation
• accessibility-ruleset-runner (JavaScript, 27 stars): eBay Accessibility Ruleset Runner automates 20% of WCAG 2.0 AA recommendations, saving time on manual testing.
• crossdomain-xhr (JavaScript, 27 stars)
• oink (Java, 27 stars): REST based interface for PIG execution
• bonsai (Scala, 26 stars): open source version of the Bonsai library
• ebayui-core-react (TypeScript, 25 stars): eBayUI React components
• geosense (Java, 25 stars): Self-contained jar to lookup timezone by lat+lon
• browser-telemetry (JavaScript, 25 stars): A Telemetry module for collecting errors, logs, metrics, uncaught exceptions etc on browser side.
• oja (JavaScript, 25 stars): Lightweight Dependency Injection Framework for Node.JS Apps - Structure your application business logic
• SketchSVG (JavaScript, 25 stars): Have icons in a Sketch file but don't want to manually extract and compress them as SVGs? Let our SketchSVG tool do it!
• CustomRippleView (Kotlin, 24 stars): The Custom Ripple View library provides Android developers an easy way to customize and implement a Ripple Effect view.
• FGrav (JavaScript, 24 stars): Dynamic Flame Graph Visualizations from raw data in your browser
• nodash (JavaScript, 24 stars): Lightweight replacement for subset of Lodash
• FeedSDK (Java, 23 stars): Java SDK for downloading large gzipped (tsv) item feed files and applying filters for curation
• kube-credentials-plugin (Java, 21 stars): A Jenkins plugin to store credentials in kubernetes
• releaser (Go, 20 stars): A declarative API that syncs specs from git to kubernetes
• airflow-rest-api-plugin (Python, 20 stars): A plugin of Apache Airflow that exposes REST endpoints for custom REST APIs.
• mtdtool (Java, 20 stars): The Manual Test Demultiplexer is a desktop app (Mac and Windows) that provides an interface for driving manual testing on multiple physical devices.
• EBNObservable (Objective-C, 19 stars): A block-based Key-Value Observing (KVO) implementation with observable collections.
• nice-form-react (TypeScript, 18 stars): A meta based form builder for React.
• skin-react (TypeScript, 18 stars): Skin components built with React (Typescript)
• accelerator-project_skeleton (Python, 18 stars)
• taxonomy-sdk (Java, 18 stars): An SDK designed to bring transparency to the rapid evolution of our aspects metadata for our partners.
• wextracto (Python, 17 stars)
• HomeStore (C++, 17 stars): Storage Engine for block and key/value stores.
• myriad (Java, 17 stars)
• event-notification-nodejs-sdk (JavaScript, 17 stars): NodeJS SDK designed to simplify processing of eBay notifications.
• TDD-Albums (17 stars): A Hands-On Tutorial for iPhone Developers Learning TDD
• ebay-oauth-android-client (Kotlin, 16 stars): eBay OAuth Android Client library
• fluid (JavaScript, 16 stars): Fluid Web Components
• ostara (Java, 16 stars)
• lightning (HTML, 16 stars): Lightning is a Java based, super fast, multi-mode, asynchronous, and distributed URL execution engine from eBay
• RTran (Scala, 15 stars): Road to Continuous Upgrade
• NautilusTelemetry (Swift, 15 stars): An iOS implementation of OpenTelemetry
• hadoop-tsdb-connector (Java, 15 stars)
• Pine (Scala, 15 stars): Pine: Machine Learning Prediction As A Service
• pynetforce (Python, 15 stars): Network infrastructure automation service
• Vivid (JavaScript, 14 stars): A visual testing tool to compare two web pages visually and generate the pixel difference they have.
• sisl (C++, 14 stars): High Performance C++ data structures and utilities