Your personal database. Mirror of https://git.sr.ht/~tsileo/blobstash

BlobStash

Your personal database.

Still in early development.

Manifesto

BlobStash is primarily a database: you can store raw blobs, key-value pairs, JSON documents, and files/directories.

It can also act as a web server/reverse proxy.

The web server supports HTTP/2 and can generate TLS certificates on the fly using Let's Encrypt. You can proxy other applications and give them free certificates at the same time, and you can write apps (in Lua) that interact with BlobStash's database. Hosting static content is also an option. It also lets you easily add authentication to any app or proxied service.

Blobs

The content-addressed blob store (the identifier of a blob is its own hash, the chosen hash function is BLAKE2b) is at the heart of everything in BlobStash. Everything permanently stored in BlobStash ends up in a blob.
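
As a rough illustration of content addressing, the sketch below keys each blob by its BLAKE2b digest. The 32-byte digest size and the `BlobStore` class are assumptions made for this example, not BlobStash's actual API.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Return the hex BLAKE2b digest used as the blob's identifier.

    Assumption: a 32-byte (256-bit) digest; the text only names BLAKE2b.
    """
    return hashlib.blake2b(data, digest_size=32).hexdigest()


class BlobStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self._blobs = {}  # hex digest -> raw bytes

    def put(self, data: bytes) -> str:
        key = blob_id(data)
        # Storing the same content twice is a no-op: same bytes, same key.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # The key doubles as an integrity check on read.
        if blob_id(data) != key:
            raise ValueError("blob content does not match its hash")
        return data
```

Because the identifier is derived from the content, writing the same bytes twice returns the same key and stores a single copy.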

BlobStash has its own storage engine, BlobsFile: data is stored in an append-only flat file. All data is immutable, stored with an error-correcting code for bit-rot protection, and indexed in a temporary index for fast access; only two seek operations are needed to access any blob.
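
The general idea of an append-only file paired with a rebuildable index can be sketched as follows. This is not the BlobsFile on-disk format: the 4-byte length header, the class name, and the `key_of` callback are assumptions of this example, and the error-correcting code is left out.

```python
import io
import struct


class AppendOnlyBlobFile:
    """Toy append-only log with an in-memory index (not the BlobsFile format)."""

    HEADER = struct.Struct(">I")  # 4-byte big-endian payload length

    def __init__(self, fileobj):
        self._f = fileobj
        self._index = {}  # key -> (payload offset, payload size)

    def append(self, key, data):
        if key in self._index:
            return  # immutable: an existing record is never rewritten
        self._f.seek(0, io.SEEK_END)
        offset = self._f.tell() + self.HEADER.size
        self._f.write(self.HEADER.pack(len(data)) + data)
        self._index[key] = (offset, len(data))

    def read(self, key):
        # One index lookup, then a single positioned read in the data file.
        offset, size = self._index[key]
        self._f.seek(offset)
        return self._f.read(size)

    def rebuild_index(self, key_of):
        """Rescan the whole file and recompute the index from the records."""
        self._index.clear()
        self._f.seek(0)
        while True:
            header = self._f.read(self.HEADER.size)
            if len(header) < self.HEADER.size:
                break
            (size,) = self.HEADER.unpack(header)
            offset = self._f.tell()
            data = self._f.read(size)
            self._index[key_of(data)] = (offset, size)
```

Since records are never modified in place, the index holds no information that cannot be recovered by scanning the file, which is what makes it safe to treat as temporary.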

The blob store supports real-time replication via an Oplog (powered by Server-Sent Events) to replicate to another BlobStash instance (or any other system). It also supports efficient synchronisation between instances, using a Merkle tree to speed up operations.
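
To show why a hash tree speeds up synchronisation, here is a minimal two-level sketch: blob IDs are grouped by their first hex character, each group is hashed, and only groups whose hashes differ are compared ID by ID. The bucketing scheme and function names are assumptions of this example, and both ID sets are held locally for simplicity; the actual protocol is not described in this document.

```python
import hashlib


def bucket_hashes(blob_ids):
    """Group blob ids by their first hex character and hash each group."""
    buckets = {}
    for blob_id in sorted(blob_ids):
        buckets.setdefault(blob_id[0], []).append(blob_id)
    return {
        prefix: hashlib.blake2b("".join(ids).encode(), digest_size=32).hexdigest()
        for prefix, ids in buckets.items()
    }


def root_hash(buckets):
    """Hash the sorted (prefix, bucket hash) pairs into a single root."""
    h = hashlib.blake2b(digest_size=32)
    for prefix in sorted(buckets):
        h.update(prefix.encode() + buckets[prefix].encode())
    return h.hexdigest()


def missing_on_remote(local_ids, remote_ids):
    """Return ids present locally but not remotely, skipping equal buckets."""
    local, remote = bucket_hashes(local_ids), bucket_hashes(remote_ids)
    if root_hash(local) == root_hash(remote):
        return set()  # identical roots: nothing to transfer
    missing = set()
    for prefix, digest in local.items():
        if remote.get(prefix) != digest:
            # Only buckets whose hashes differ need an id-by-id comparison.
            local_bucket = {i for i in local_ids if i[0] == prefix}
            remote_bucket = {i for i in remote_ids if i[0] == prefix}
            missing |= local_bucket - remote_bucket
    return missing
```

When two instances are already in sync, a single root comparison is enough; otherwise only the differing buckets need to be exchanged.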

Key-values

Key-value pairs let you keep a mutable reference to an internal or external object; the value can be a hash and/or any sequence of bytes.

Each key-value pair has an associated timestamp, which serves as its version. You can easily list all the versions; by default, the latest version is returned. Internally, each "version" is stored as a separate blob, with a specific format, so it can be detected and re-indexed.

Key-values are indexed in a temporary database (which can be rebuilt at any time by scanning all the blobs), and each one is stored as a blob.
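
The versioning behaviour described above can be sketched with a small in-memory store. The `VersionedKV` class and its method names are hypothetical, and using a nanosecond Unix timestamp as the version is an assumption of this example.

```python
import time


class VersionedKV:
    """Toy versioned key-value store (hypothetical API, not BlobStash's)."""

    def __init__(self):
        self._versions = {}  # key -> list of (version, value), oldest first

    def put(self, key, value, version=None):
        # Assumption: the version is a nanosecond Unix timestamp.
        version = time.time_ns() if version is None else version
        entries = self._versions.setdefault(key, [])
        entries.append((version, value))
        entries.sort(key=lambda entry: entry[0])
        return version

    def get(self, key, version=None):
        entries = self._versions[key]
        if version is None:
            return entries[-1][1]  # default: latest version
        for v, value in entries:
            if v == version:
                return value
        raise KeyError((key, version))

    def versions(self, key):
        """List every known version of `key`, oldest first."""
        return [v for v, _ in self._versions[key]]
```

A write never overwrites an earlier entry; it only adds a newer one, so older versions stay readable.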

Files, tree of files

Files and trees of files are first-class citizens in BlobStash.

Files are split into multiple chunks (stored as blobs, using content-defined chunking, which gives deduplication at the file level), and everything is stored in a kind of Merkle tree where the hash of the JSON file containing the file metadata (itself stored as a blob) is the final identifier.
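
A minimal sketch of content-defined chunking follows, using a Gear-style rolling hash to pick chunk boundaries and a JSON metadata blob that lists the chunk hashes. The chunking parameters, the metadata field names (`name`, `size`, `chunks`), and the helper functions are all assumptions of this example, not BlobStash's actual format.

```python
import hashlib
import json

# Deterministic per-byte table for a Gear-style rolling hash.
GEAR = [
    int.from_bytes(hashlib.blake2b(bytes([b]), digest_size=4).digest(), "big")
    for b in range(256)
]


def chunk(data, min_size=64, avg_bits=8, max_size=1024):
    """Cut where the top `avg_bits` bits of the rolling hash are all zero."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        at_boundary = (h >> (32 - avg_bits)) == 0
        if (size >= min_size and at_boundary) or size >= max_size:
            chunks.append(data[start : i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def put_blob(blobs, data):
    """Store `data` under its BLAKE2b hash and return that hash."""
    key = hashlib.blake2b(data, digest_size=32).hexdigest()
    blobs[key] = data
    return key


def store_file(name, data, blobs):
    """Store the chunks plus a JSON metadata blob; return the metadata hash."""
    refs = [put_blob(blobs, c) for c in chunk(data)]
    meta = json.dumps({"name": name, "size": len(data), "chunks": refs}, sort_keys=True)
    return put_blob(blobs, meta.encode())
```

Because boundaries depend on the bytes themselves rather than on fixed offsets, identical regions of two files tend to produce identical chunks, and identical chunks hash to the same blob and are stored once.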

The JSON format can also model directories. A regular HTTP multipart endpoint can convert a file to BlobStash's internal format for you, or you can do the conversion locally to avoid sending blobs that are already present.

Files can be streamed easily: range requests are supported, EXIF metadata is automatically extracted and served, and images can be resized on the fly (with caching).

You can also enable an S3-compatible gateway to manage your files.

Role Based Access Control (RBAC)

BlobStash features fine-grained permissions support, with a model similar to AWS roles.

Predefined roles

  • admin: full access to everything
    • action:*/resource:*
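
A permission check in this style might look like the sketch below. The `admin` entry comes from the list above and the `backup_server1` entries are modelled on the backup use case later in this README; matching patterns with shell-style wildcards via `fnmatchcase` is an assumption of this example, as is the `is_allowed` helper.

```python
from fnmatch import fnmatchcase

# role name -> list of (action pattern, resource pattern)
ROLES = {
    "admin": [("action:*", "resource:*")],
    "backup_server1": [
        ("action:stat:blob", "resource:blobstore:blob:*"),
        ("action:write:blob", "resource:blobstore:blob:*"),
    ],
}


def is_allowed(role, action, resource):
    """Return True if any permission of `role` matches both action and resource."""
    return any(
        fnmatchcase(action, action_pattern) and fnmatchcase(resource, resource_pattern)
        for action_pattern, resource_pattern in ROLES.get(role, [])
    )
```

An unknown role has no permissions at all, so every request is denied unless some pattern explicitly allows it.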

Document Store

The Document Store stores JSON documents (think MongoDB or CouchDB) and exposes them over an HTTP API.

Documents are stored in a collection. All collections are stored in a single namespace.

Every version of a document is kept (and always accessible via temporal queries, i.e. querying the state of a collection at an instant t).
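
A temporal query can be pictured as picking, for each document, the newest version that is not later than t. The sketch below assumes a hypothetical in-memory `history` mapping and uses `None` to mark a deletion; neither reflects BlobStash's internal representation.

```python
from bisect import bisect_right


def state_as_of(history, t):
    """Return {doc_id: doc} as the collection looked at instant `t`.

    `history` maps doc_id -> list of (version, doc) sorted by version;
    a `None` doc marks a deletion (an assumption of this sketch).
    """
    snapshot = {}
    for doc_id, versions in history.items():
        keys = [version for version, _ in versions]
        i = bisect_right(keys, t)  # number of versions with version <= t
        if i == 0:
            continue  # the document did not exist yet at `t`
        doc = versions[i - 1][1]
        if doc is not None:
            snapshot[doc_id] = doc
    return snapshot
```

Since old versions are never discarded, the same query with a different `t` can reconstruct any past state.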

The Document Store supports ETag, conditional requests (If-Match...) and JSON Patch for partial/consistent update.
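
The interplay between ETag, If-Match and JSON Patch can be sketched as an optimistic-concurrency check: the patch is applied only if the client's ETag still matches the stored document. How BlobStash derives its ETags is not stated here, so the `etag` function below is an assumption, and `apply_patch` handles only top-level `add`, `replace` and `remove` operations.

```python
import hashlib
import json


def etag(doc):
    """Derive an ETag from the canonical JSON form (an assumption of this sketch)."""
    body = json.dumps(doc, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.blake2b(body, digest_size=16).hexdigest()


def apply_patch(doc, ops):
    """Apply top-level add/replace/remove JSON Patch ops to a copy of `doc`."""
    out = dict(doc)
    for op in ops:
        key = op["path"].lstrip("/")
        if op["op"] in ("add", "replace"):
            if op["op"] == "replace" and key not in out:
                raise KeyError(key)  # replace requires an existing member
            out[key] = op["value"]
        elif op["op"] == "remove":
            del out[key]
        else:
            raise ValueError("unsupported op: " + op["op"])
    return out


def conditional_patch(doc, if_match, ops):
    """Return (status, doc): 412 if the client's ETag is stale, else 200."""
    if if_match != etag(doc):
        return 412, doc
    return 200, apply_patch(doc, ops)
```

A client that read an older version gets a 412 status instead of silently overwriting a concurrent change.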

Documents are queried with Lua functions, like:

local docstore = require('docstore')
return function(doc)
  if doc.subdoc.counter > 10 and docstore.text_search(doc, "query", {"content"}) then
    return true
  end
  return false
end

It also implements a basic MapReduce framework (Lua powered too).

And lastly, a document can hold pointers to files/nodes stored in the FileTree Store.

Internally, a JSON document "version" is stored as a "versioned key-value" entry. Document IDs encode the creation version and sort lexicographically by creation date (an 8-byte nanosecond timestamp followed by 4 random bytes). The Versioned Key-Value Store is the default index for listing/sorting documents.
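
The ID layout can be illustrated with a short sketch. Hex encoding with a big-endian timestamp is inferred from the sample responses in this README rather than stated outright: decoding the first 8 bytes of the `_id` in the insert example yields exactly its `_version`. The function names are hypothetical.

```python
import os
import time
from datetime import datetime, timezone


def new_id(ts_ns=None):
    """Build a 12-byte hex ID: 8-byte big-endian ns timestamp + 4 random bytes."""
    ts_ns = time.time_ns() if ts_ns is None else ts_ns
    return (ts_ns.to_bytes(8, "big") + os.urandom(4)).hex()


def created_ns(doc_id):
    """Extract the creation timestamp (nanoseconds since the Unix epoch)."""
    return int.from_bytes(bytes.fromhex(doc_id)[:8], "big")


def created_at(doc_id):
    """Return the creation time as a UTC datetime, truncated to seconds."""
    return datetime.fromtimestamp(created_ns(doc_id) // 1_000_000_000, tz=timezone.utc)
```

Because the timestamp occupies the leading bytes, sorting IDs as plain strings also sorts them by creation time.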

Collections

GET /api/docstore

List all the collections.

HTTP Request
$ http --auth :apikey GET https://instance.com/api/docstore
HTTP Response
{
    "data": [
        "mycollection"
    ], 
    "pagination": {
        "count": 1, 
        "cursor": "", 
        "has_more": false, 
        "per_page": 50
    }
}
blobstash-python
from blobstash.docstore import DocStoreClient

client = DocStoreClient("https://instance.com", api_key="apikey")

client.collections()
# [blobstash.docstore.Collection(name='mycollection')]

Inserting documents

Collections are created on-the-fly when a document is inserted.

POST /api/docstore/{collection}

HTTP Request
$ http --auth :apikey post https://instance.com/api/docstore/{collection} content=lol
HTTP Response
{
    "_created": "2020-02-23T15:28:06Z", 
    "_id": "15f6119d6dddd68fa986d4c7", 
    "_version": "1582471686918100623"
}
blobstash-python
from blobstash.docstore import DocStoreClient

client = DocStoreClient("https://instance.com", api_key="apikey")

# or `client["mycol"]` or `client.collection("mycol")`
col = client.mycol

doc = {"content": "lol"}

col.insert(doc)
# blobstash.docstore.ID(_id='15f611f032ae804d668dd855')

# the `dict` will be updated with its `_id`
doc
# {'content': 'lol',
#  '_id': blobstash.docstore.ID(_id='15f611f032ae804d668dd855')}

Updating a document (by replacing it)

POST /api/docstore/{collection}/{id}

HTTP Request
$ http --auth :apikey post https://instance.com/api/docstore/{collection}/{id} content=lol
HTTP Response
{
    "_created": "2020-02-23T15:28:06Z", 
    "_id": "15f6119d6dddd68fa986d4c7", 
    "_version": "1582471686918100623"
}

PATCH /api/docstore/{collection}/{id}

HTTP Request
HTTP Response
blobstash-python

Deleting documents

DELETE /api/docstore/{collection}/{id}

HTTP Request
$ http --auth :apikey delete https://instance.com/api/docstore/{collection}/{id}
HTTP Response

204 No Content.

blobstash-python
from blobstash.docstore import DocStoreClient

client = DocStoreClient("https://instance.com", api_key="apikey")

# or `client["mycol"]` or `client.collection("mycol")`
col = client.mycol

# Can take an ID as `str`, an `ID` object, or a document (with the `_id` key)
col.delete("15f611f032ae804d668dd855")

Retrieving documents

Querying documents

GET /api/docstore/{collection}{?sort_index,as_of}

HTTP Request
$ http --auth :apikey get https://instance.com/api/docstore/{collection}
HTTP Response
{
    "data": [
        {
            "_created": "2020-02-23T15:50:24Z", 
            "_id": "15f612d4f7715bdb28c93fd9", 
            "_updated": "2020-02-23T15:55:15Z", 
            "_version": "1582473315736447008", 
            "content": "lol2"
        }
    ], 
    "pagination": {
        "count": 1, 
        "cursor": "ZG9jc3RvcmU6Y29sMToxNWY2MTJkNGY3NzE1YmRiMjhjOTNmZDg=", 
        "has_more": false, 
        "per_page": 50
    }, 
    "pointers": {}
}
blobstash-python
from blobstash.docstore import DocStoreClient

client = DocStoreClient("https://instance.com", api_key="apikey")

# or `client["mycol"]` or `client.collection("mycol")`
col = client.mycol

col.query()
#
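
The `pagination` object in the response suggests a cursor-based loop for reading every page. The sketch below keeps the transport abstract: `fetch_page` is a hypothetical callable that takes a cursor and returns a decoded response, since how the cursor is passed back to the server is not shown in this document.

```python
def iter_documents(fetch_page):
    """Yield documents from every page.

    `fetch_page(cursor)` is a hypothetical callable returning a decoded
    response with `data` and `pagination` keys, as in the example above.
    """
    cursor = ""
    while True:
        page = fetch_page(cursor)
        yield from page["data"]
        pagination = page["pagination"]
        if not pagination["has_more"]:
            break
        cursor = pagination["cursor"]
```

The loop stops as soon as `has_more` is false, so a single-page collection costs one request.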

Sorting/indexes

Sorting can only be done through indexes.

MapReduce framework

BlobStash Use Cases

Backups from external servers

Set up an API key with limited permissions (in blobstash.yaml), just enough to save a snapshot of a tree:

# [...]
auth:
 - id: 'my_backup_key'
   password: 'my_api_key'
   roles: 'backup_server1'
roles:
 - name: 'backup_server1'
   perms:
    - action: 'action:stat:blob'
      resource: 'resource:blobstore:blob:*'
    - action: 'action:write:blob'
      resource: 'resource:blobstore:blob:*'
    - action: 'action:snapshot:fs'
      resource: 'resource:filetree:fs:server1'
    - action: 'action:write:kv'
      resource: 'resource:kvstore:kv:_filetree:fs:server1'
    - action: 'action:gc:namespace'
      resource: 'resource:stash:namespace:server1'

Then on "server1":

$ export BLOBS_API_HOST=https://my-blobstash-instance.com BLOBS_API_KEY=my_api_key
$ blobstash-uploader server1 /path/to/data

Lua API

Extra module

extra.glob(pattern, name)

Parses the shell file name pattern/glob and reports whether the file name matches.

Uses Go's filepath.Match.

Attributes

Name      Type     Description
pattern   String   Glob pattern
name      String   File name

Returns

Boolean

Contribution

Pull requests are welcome, but please open an issue to start a discussion before starting anything substantial.

Feel free to open an issue if you have any ideas/suggestions!

License

Copyright (c) 2014-2018 Thomas Sileo and contributors. Released under the MIT license.
