  • Stars: 150
  • Rank: 247,323 (Top 5 %)
  • Language: Go
  • License: GNU General Publi...
  • Created over 8 years ago
  • Updated almost 5 years ago

Simple Object Storage (I wish I could call it Steve's Simple Storage, or S3 ;)

Simple Object Storage

This Simple Object Storage (SOS) project is an HTTP-based object-storage system which allows files to be uploaded, and later retrieved, via HTTP.

Files can be replicated across a number of hosts to ensure redundancy, and increased availability in the event of hardware failure.

Installation

There are two ways to install this project from source, depending on the version of go you're using.

If you just need the binaries you can find them on the project release page.

Source Installation go <= 1.11

If you're using go 1.11 or earlier then the following command should fetch/update the project and install it on your system:

 $ go get -u github.com/skx/sos

Source installation go >= 1.12

If you're using a more recent version of go (which is highly recommended), you need to clone the repository into a directory which is outside your GOPATH:

git clone https://github.com/skx/sos
cd sos
go install

Overview

You can read the design overview for more details, but the core idea behind the implementation relies upon the notion of a "blob server", which is a very simple service that provides only the following primitives:

  • Store a particular chunk of binary data with a specific name.
  • Given a name retrieve the chunk of binary data associated with it.
  • Return a list of all known names.
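The three primitives above can be sketched as a small Go interface. This is a minimal illustration rather than the project's actual code: the names BlobStore, MemoryStore, and ErrNotFound are hypothetical, and the in-memory map stands in for whatever on-disk storage a real blob-server uses.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// ErrNotFound is returned when no blob is stored under the requested name.
// (Hypothetical name, used only in this sketch.)
var ErrNotFound = errors.New("blob not found")

// BlobStore captures the three primitives: store, retrieve, and list.
// The interface and its method names are assumptions for illustration.
type BlobStore interface {
	Store(name string, data []byte) error
	Get(name string) ([]byte, error)
	List() ([]string, error)
}

// MemoryStore is an in-memory BlobStore, useful only for illustration.
type MemoryStore struct {
	mu    sync.RWMutex
	blobs map[string][]byte
}

// NewMemoryStore returns an empty store.
func NewMemoryStore() *MemoryStore {
	return &MemoryStore{blobs: make(map[string][]byte)}
}

// Store saves a copy of data under the given name, replacing any old value.
func (m *MemoryStore) Store(name string, data []byte) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.blobs[name] = append([]byte(nil), data...)
	return nil
}

// Get returns a copy of the data stored under name, or ErrNotFound.
func (m *MemoryStore) Get(name string) ([]byte, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	data, ok := m.blobs[name]
	if !ok {
		return nil, ErrNotFound
	}
	return append([]byte(nil), data...), nil
}

// List returns all known names in sorted order.
func (m *MemoryStore) List() ([]string, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	names := make([]string, 0, len(m.blobs))
	for name := range m.blobs {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

func main() {
	var store BlobStore = NewMemoryStore()
	_ = store.Store("example", []byte("hello"))
	data, _ := store.Get("example")
	names, _ := store.List()
	fmt.Println(string(data), names)
}
```

Keeping the interface this small is what makes replication simple: anything that can list names and move bytes by name can be mirrored.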

The public API is built on top of those primitives, and both are launched via the same command, sos, by specifying the sub-command to use:

 $ ./sos blob-server ...
 $ ./sos api-server ...

Here the first command launches a blob-server, which is the back-end for storage, and the second command launches the public API server - which is what your code/users should operate against.

If you launch sos with no arguments you'll see brief details of the available subcommands.

Quick Start

In an ideal deployment at least two hosts would be used:

  • One host would run the public API-server.
    • This allows uploads to be made, and later retrieved.
  • Each of the two hosts would also run a blob-server.
    • The blob-servers provide the actual storage of the uploaded-objects.
    • The contents of these are replicated out of band.

We can simulate a deployment upon a single host for the purposes of testing. You'll just need to make sure you have four terminals open to run the appropriate daemons.

First of all you'll want to launch a pair of blob-servers:

$ sos blob-server -store data1 -port 4001
$ sos blob-server -store data2 -port 4002

NOTE: The storage-paths (./data1 and ./data2 in the example above) are where the uploaded content will be stored. These directories will be created if missing.

In production usage you'd generally record the names of the blob-servers in a configuration file, either /etc/sos.conf, or ~/.sos.conf, however they may also be specified upon the command line.

We'll then start the public/API-server ensuring that it knows about the blob-servers to store content in:

$ sos api-server -blob-server http://localhost:4001,http://localhost:4002
Launching API-server
..

Now you, or your code, can connect to the server and start uploading/downloading objects. By default the following ports will be used by the sos-server:

service            port
upload service     9991
download service   9992

Providing you've started all three daemons you can now perform a test upload with curl:

$ curl -X POST --data-binary @/etc/passwd http://localhost:9991/upload
{"id":"cd5bd649c4dc46b0bbdf8c94ee53c1198780e430","size":2306,"status":"OK"}

If all goes well you'll receive a JSON-response as shown, and you can use the ID which is returned to retrieve your object:

$ curl http://localhost:9992/fetch/cd5bd649c4dc46b0bbdf8c94ee53c1198780e430
..
$

NOTE: The download service runs on a different port. This is so that you can make policy decisions about uploads/downloads via your local firewall.
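The same upload and fetch can be done from Go with the standard net/http package. The sketch below is an illustration under assumptions, not a client shipped with the project: the function names are hypothetical, the Content-Type sent on upload is a guess, and treating any status other than "OK" as an error is an assumption because the README only shows the success response.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// UploadResult mirrors the JSON fields shown in the curl example above.
type UploadResult struct {
	ID     string `json:"id"`
	Size   int64  `json:"size"`
	Status string `json:"status"`
}

// parseUploadResult decodes the JSON body returned by the upload service.
// Assumption: any status other than "OK" is reported as an error.
func parseUploadResult(body []byte) (UploadResult, error) {
	var res UploadResult
	if err := json.Unmarshal(body, &res); err != nil {
		return res, err
	}
	if res.Status != "OK" {
		return res, fmt.Errorf("upload failed: status %q", res.Status)
	}
	return res, nil
}

// upload POSTs data to <uploadBase>/upload, e.g. http://localhost:9991.
// The Content-Type used here is an assumption.
func upload(uploadBase string, data []byte) (UploadResult, error) {
	resp, err := http.Post(uploadBase+"/upload", "application/octet-stream", bytes.NewReader(data))
	if err != nil {
		return UploadResult{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return UploadResult{}, err
	}
	return parseUploadResult(body)
}

// fetch GETs <downloadBase>/fetch/<id>, e.g. from http://localhost:9992.
func fetch(downloadBase, id string) ([]byte, error) {
	resp, err := http.Get(downloadBase + "/fetch/" + id)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetch %s: unexpected status %s", id, resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// Requires the daemons from the Quick Start to be running locally.
	res, err := upload("http://localhost:9991", []byte("hello, sos"))
	if err != nil {
		fmt.Println("upload error:", err)
		return
	}
	data, err := fetch("http://localhost:9992", res.ID)
	if err != nil {
		fmt.Println("fetch error:", err)
		return
	}
	fmt.Printf("stored %d bytes as %s, fetched %d bytes\n", res.Size, res.ID, len(data))
}
```

Note that the two functions take separate base URLs, since uploads and downloads are served on different ports.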

At the point you run the upload the contents will only be present on one of the blob-servers, chosen at random. To ensure your data is replicated you need to (regularly) launch the replication utility:

$ sos replicate -blob-server http://localhost:4001,http://localhost:4002 --verbose
group - server
   default - http://localhost:4001
   default - http://localhost:4002
Syncing group: default
   Group member: http://localhost:4001
   Group member: http://localhost:4002
   Object cd5bd649c4dc46b0bbdf8c94ee53c1198780e430 is missing on http://localhost:4001
     Mirroring cd5bd649c4dc46b0bbdf8c94ee53c1198780e430 from http://localhost:4002 to http://localhost:4001
        Fetching :http://localhost:4002/blob/cd5bd649c4dc46b0bbdf8c94ee53c1198780e430
        Uploading :http://localhost:4001/blob/cd5bd649c4dc46b0bbdf8c94ee53c1198780e430
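The decision the replication utility makes in the output above can be sketched as a pure function: given the names each blob-server reports, work out which objects need copying where. This is an illustration of the idea only, not the project's implementation; the Copy type and plan function are hypothetical, and it assumes every server belongs to a single group.

```go
package main

import (
	"fmt"
	"sort"
)

// Copy describes one mirroring step: fetch Object from From, upload it to To.
// (Hypothetical type, used only in this sketch.)
type Copy struct {
	Object, From, To string
}

// plan takes the object names listed by each server and returns the copies
// needed so that every server in the group holds every object.
func plan(listing map[string][]string) []Copy {
	// Sort server names so the resulting plan is deterministic.
	servers := make([]string, 0, len(listing))
	for s := range listing {
		servers = append(servers, s)
	}
	sort.Strings(servers)

	// Record which objects each server has, and remember the first server
	// found holding each object: it becomes the source for copies.
	source := make(map[string]string)
	has := make(map[string]map[string]bool)
	for _, s := range servers {
		has[s] = make(map[string]bool)
		for _, obj := range listing[s] {
			has[s][obj] = true
			if _, ok := source[obj]; !ok {
				source[obj] = s
			}
		}
	}

	objects := make([]string, 0, len(source))
	for obj := range source {
		objects = append(objects, obj)
	}
	sort.Strings(objects)

	// Any server that lacks an object receives a copy from its source.
	var copies []Copy
	for _, obj := range objects {
		for _, s := range servers {
			if !has[s][obj] {
				copies = append(copies, Copy{Object: obj, From: source[obj], To: s})
			}
		}
	}
	return copies
}

func main() {
	listing := map[string][]string{
		"http://localhost:4001": {},
		"http://localhost:4002": {"cd5bd649c4dc46b0bbdf8c94ee53c1198780e430"},
	}
	for _, c := range plan(listing) {
		fmt.Printf("Mirroring %s from %s to %s\n", c.Object, c.From, c.To)
	}
}
```

Separating the plan from the transfer keeps the logic easy to test; a real run would follow each Copy with a fetch from one server and an upload to the other.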

Meta-Data

When uploading objects it is often useful to store meta-data, such as the original name of the uploaded object, the owner, or some similar data. For that reason any header with an X- prefix that you add to your upload will be stored and returned on download.

As a special case the header X-Mime-Type can be used to set the returned Content-Type header too.

For example uploading an image might look like this:

$ curl -X POST -H "X-Orig-Filename: steve.jpg" \
               -H "X-MIME-Type: image/jpeg" \
               --data-binary @/home/skx/Images/tmp/steve.jpg \
        http://localhost:9991/upload
{"id":"20b30df22469e6d7617c7da6a457d4e384945a06","status":"OK","size":17599}

Downloading will result in the headers being set:

$ curl -v http://localhost:9992/fetch/20b30df22469e6d7617c7da6a457d4e384945a06 >/dev/null
..
< HTTP/1.1 200 OK
< X-Orig-Filename: steve.jpg
< Date: Fri, 27 May 2016 06:17:39 GMT
< Content-Type: image/jpeg
< Transfer-Encoding: chunked
<
{ [data not shown]
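The header handling described in this section can be sketched with the standard net/http types. This is a guess at the behaviour based on the example output, not the project's code: the metadata and apply functions are hypothetical, and the choice to map X-Mime-Type onto Content-Type without echoing it back is an assumption drawn from the response shown above.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// metadata returns the X-prefixed headers from an upload request.
// Names are canonicalised, so "X-MIME-Type" is stored as "X-Mime-Type".
func metadata(h http.Header) map[string]string {
	meta := make(map[string]string)
	for name, values := range h {
		key := http.CanonicalHeaderKey(name)
		if strings.HasPrefix(key, "X-") && len(values) > 0 {
			meta[key] = values[0]
		}
	}
	return meta
}

// apply copies stored meta-data onto the headers of a download response.
// Assumption: X-Mime-Type sets Content-Type instead of being echoed back.
func apply(meta map[string]string, out http.Header) {
	for name, value := range meta {
		if name == "X-Mime-Type" {
			out.Set("Content-Type", value)
			continue
		}
		out.Set(name, value)
	}
}

func main() {
	// Headers as they might arrive with the upload in the example above.
	in := http.Header{}
	in.Set("X-Orig-Filename", "steve.jpg")
	in.Set("X-MIME-Type", "image/jpeg")
	in.Set("User-Agent", "curl")

	out := http.Header{}
	apply(metadata(in), out)
	fmt.Println(out.Get("X-Orig-Filename"), out.Get("Content-Type"))
}
```

Because HTTP header names are case-insensitive, X-Mime-Type and X-MIME-Type refer to the same header; canonicalising the name makes the special case work for either spelling.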

Production Usage

  • The API service must be visible to clients, to allow downloads to be made.

    • Because the download service runs on port 9992 it is assumed that corporate firewalls would deny access.
    • We assume you'll configure an Apache/nginx/similar reverse-proxy to access the files via a host like http://objects.example.com/.
  • It is assumed you might wish to restrict uploads to particular clients, rather than allow the world to make uploads. The simplest way of doing this is to use your firewall to filter access to port 9991.

  • The blob-servers must be reachable by the host(s) running the API-service, but they should not be publicly visible.

    • If your blob-servers are exposed to the internet remote users could use the API to spider and download all your content.
  • None of the servers need to be launched as root, because they don't bind to privileged ports, or require special access.

    • NOTE: issue #6 improved the security of the blob-server by invoking chroot(). However chroot() will fail if the server is not launched as root, which is harmless.
  • You can also read about scaling when your data is too large to fit upon a single blob-server.

Future Changes?

It would be possible to switch to using chunked storage, for example breaking up each uploaded file into 128MB sections and treating them as distinct objects. That is not done at the moment because it relies upon state:

  • The public server needs to be able to know that the file with a given ID is comprised of the following chunks of data:
    • a5d606958533634fed7e6d5a79d6a5617252021f
    • 038deb6940db2d0e7b9ee9bba70f3501a0667989
    • a7914eb6ff984f97c5f6f365d3d93961be2e8617
    • ...
  • That data must be always kept up to date and accessible.

At the moment the API-server is stateless, so tracking that data is not possible. It is possible to imagine using redis, or some other external database, to record the data, but that increases the complexity of deployment.
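To make the state requirement concrete, here is a minimal sketch of what chunked storage would have to track. Nothing here exists in the project: the split and join functions and the manifest map are hypothetical, and using a SHA-1 hex digest as the chunk ID is an assumption made only so the sketch has some way to name chunks.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"errors"
	"fmt"
)

// split breaks data into chunks of at most size bytes. It returns the chunk
// IDs in order, plus the chunks keyed by ID.
// Assumption: the ID is the SHA-1 hex digest of the chunk's contents.
func split(data []byte, size int) ([]string, map[string][]byte, error) {
	if size <= 0 {
		return nil, nil, errors.New("chunk size must be positive")
	}
	var ids []string
	chunks := make(map[string][]byte)
	for start := 0; start < len(data); start += size {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		sum := sha1.Sum(data[start:end])
		id := hex.EncodeToString(sum[:])
		ids = append(ids, id)
		chunks[id] = data[start:end]
	}
	return ids, chunks, nil
}

// join reassembles a file from its ordered chunk IDs.
func join(ids []string, chunks map[string][]byte) ([]byte, error) {
	var out []byte
	for _, id := range ids {
		chunk, ok := chunks[id]
		if !ok {
			return nil, fmt.Errorf("missing chunk %s", id)
		}
		out = append(out, chunk...)
	}
	return out, nil
}

func main() {
	// manifest is the state the API-server would have to keep somewhere:
	// file ID -> ordered list of chunk IDs.
	manifest := make(map[string][]string)

	ids, chunks, err := split([]byte("abcdefghij"), 4)
	if err != nil {
		fmt.Println("split error:", err)
		return
	}
	manifest["example-file"] = ids

	data, err := join(manifest["example-file"], chunks)
	fmt.Println(len(ids), string(data), err)
}
```

The chunks themselves fit the existing blob-server primitives unchanged; it is only the manifest that has no home in a stateless API-server.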

Github Setup

This repository is configured to run tests upon every commit, and when pull-requests are created/updated. The testing is carried out via .github/run-tests.sh which is used by the github-action-tester action.

Releases are automated in a similar fashion via .github/build, and the github-action-publish-binaries action.

Questions?

Questions/Changes are most welcome; just report an issue.

Steve
