Simple Object Storage (I wish I could call it Steve's Simple Storage, or S3 ;)

Simple Object Storage

This Simple Object Storage (SOS) project is an HTTP-based object-storage system which allows files to be uploaded, and later retrieved, via HTTP.

Files can be replicated across a number of hosts to ensure redundancy, and increased availability in the event of hardware failure.

Installation

There are two ways to install this project from source, depending on the version of go you're using.

If you just need the binaries you can find them on the project's release page.

Source Installation go <= 1.11

If you're using go 1.11 or earlier then the following command should fetch/update the project and install it upon your system:

 $ go get -u github.com/skx/sos

Source installation go >= 1.12

If you're using a more recent version of go (which is highly recommended), you need to clone to a directory which is not present upon your GOPATH:

git clone https://github.com/skx/sos
cd sos
go install

Overview

You can read the design overview for more details, but the core idea behind the implementation relies upon the notion of a "blob server", a very simple service which provides only the following primitives:

  • Store a particular chunk of binary data with a specific name.
  • Given a name retrieve the chunk of binary data associated with it.
  • Return a list of all known names.
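The three primitives above can be sketched as a small Go interface. This is only an illustration: the `BlobStore` interface, the `memStore` type and the `ErrNotFound` error are hypothetical names, not taken from the sos source, and the real blob-server stores its data on disk rather than in memory.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// ErrNotFound is returned when no blob is stored under the requested name.
var ErrNotFound = errors.New("blob not found")

// BlobStore is a hypothetical interface mirroring the three primitives
// listed above; it is not the interface used by sos itself.
type BlobStore interface {
	Store(name string, data []byte) error
	Get(name string) ([]byte, error)
	List() []string
}

// memStore keeps blobs in a map, purely for illustration.
type memStore struct {
	mu    sync.RWMutex
	blobs map[string][]byte
}

func newMemStore() *memStore {
	return &memStore{blobs: make(map[string][]byte)}
}

// Store saves a copy of data under the given name.
func (m *memStore) Store(name string, data []byte) error {
	if name == "" {
		return errors.New("empty blob name")
	}
	m.mu.Lock()
	defer m.mu.Unlock()
	// Copy so later changes to the caller's slice do not alter the blob.
	m.blobs[name] = append([]byte(nil), data...)
	return nil
}

// Get returns a copy of the data stored under name, or ErrNotFound.
func (m *memStore) Get(name string) ([]byte, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	data, ok := m.blobs[name]
	if !ok {
		return nil, ErrNotFound
	}
	return append([]byte(nil), data...), nil
}

// List returns all known names in sorted order.
func (m *memStore) List() []string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	names := make([]string, 0, len(m.blobs))
	for name := range m.blobs {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

func main() {
	var s BlobStore = newMemStore()
	if err := s.Store("example", []byte("hello")); err != nil {
		fmt.Println("store error:", err)
		return
	}
	data, err := s.Get("example")
	fmt.Println(string(data), err, s.List())
}
```

Keeping the interface this small is what makes it plausible to layer replication and a public API on top without the blob-server knowing about either.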

The public API is built on top of those primitives, and both servers are launched via the same command, sos, by specifying the sub-command to use:

 $ ./sos blob-server ...
 $ ./sos api-server ...

Here the first command launches a blob-server, which is the back-end for storage, and the second command launches the public API server - which is what your code/users should operate against.

If you launch sos with no arguments you'll see brief details of the available subcommands.

Quick Start

In an ideal deployment at least two hosts would be used:

  • One host would run the public-server.
    • This allows uploads to be made, and later retrieved.
  • Each of the two hosts would also run a blob-server.
    • The blob-servers provide the actual storage of the uploaded-objects.
    • The contents of these are replicated out of band.

We can simulate a deployment upon a single host for the purposes of testing. You'll just need to make sure you have four terminals open to run the appropriate daemons.

First of all you'll want to launch a pair of blob-servers:

$ sos blob-server -store data1 -port 4001
$ sos blob-server -store data2 -port 4002

NOTE: The storage-paths (./data1 and ./data2 in the example above) are where the uploaded-content will be stored. These directories will be created if missing.

In production usage you'd generally record the names of the blob-servers in a configuration file, either /etc/sos.conf, or ~/.sos.conf, however they may also be specified upon the command line.

We'll then start the public/API-server ensuring that it knows about the blob-servers to store content in:

$ sos api-server -blob-server http://localhost:4001,http://localhost:4002
Launching API-server
..

Now you, or your code, can connect to the server and start uploading/downloading objects. By default the following ports will be used by the sos-server:

service            port
upload service     9991
download service   9992

Providing you've started all three daemons you can now perform a test upload with curl:

$ curl -X POST --data-binary @/etc/passwd http://localhost:9991/upload
{"id":"cd5bd649c4dc46b0bbdf8c94ee53c1198780e430","size":2306,"status":"OK"}

If all goes well you'll receive a JSON-response as shown, and you can use the ID which is returned to retrieve your object:

$ curl http://localhost:9992/fetch/cd5bd649c4dc46b0bbdf8c94ee53c1198780e430
..
$

NOTE: The download service runs on a different port. This is so that you can make policy decisions about uploads/downloads via your local firewall.
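The same upload and fetch can be driven from Go code. The following is a minimal sketch that assumes the default ports and the /upload and /fetch/ paths shown in the curl examples. The helper names (`upload`, `fetch`, `decodeUpload`, `fetchURL`) are hypothetical, and the `application/octet-stream` content type is an assumption rather than something the server is known to require.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// uploadResult mirrors the JSON fields shown in the example response.
type uploadResult struct {
	ID     string `json:"id"`
	Size   int64  `json:"size"`
	Status string `json:"status"`
}

// decodeUpload parses the JSON body returned by the upload service.
func decodeUpload(body []byte) (uploadResult, error) {
	var res uploadResult
	err := json.Unmarshal(body, &res)
	return res, err
}

// fetchURL builds the download URL for an object ID.
func fetchURL(downloadBase, id string) string {
	return downloadBase + "/fetch/" + id
}

// upload POSTs data to <uploadBase>/upload and decodes the reply.
func upload(uploadBase string, data []byte) (uploadResult, error) {
	// The Content-Type used here is an assumption.
	resp, err := http.Post(uploadBase+"/upload", "application/octet-stream", bytes.NewReader(data))
	if err != nil {
		return uploadResult{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return uploadResult{}, err
	}
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return uploadResult{}, fmt.Errorf("upload failed: %s", resp.Status)
	}
	return decodeUpload(body)
}

// fetch GETs the object with the given ID from the download service.
func fetch(downloadBase, id string) ([]byte, error) {
	resp, err := http.Get(fetchURL(downloadBase, id))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return nil, fmt.Errorf("fetch failed: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

func main() {
	// Requires the daemons from the quick start to be running locally.
	res, err := upload("http://localhost:9991", []byte("hello, sos"))
	if err != nil {
		fmt.Println("upload error:", err)
		return
	}
	data, err := fetch("http://localhost:9992", res.ID)
	if err != nil {
		fmt.Println("fetch error:", err)
		return
	}
	fmt.Printf("stored %d bytes as %s, fetched %d bytes back\n", res.Size, res.ID, len(data))
}
```

Because uploads and downloads use different base URLs, the two helpers take separate arguments rather than sharing one server address.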

At the point you run the upload the contents will only be present on one of the blob-servers, chosen at random. To ensure your data is replicated you need to (regularly) launch the replication utility:

$ sos replicate -blob-server http://localhost:4001,http://localhost:4002 --verbose
group - server
   default - http://localhost:4001
   default - http://localhost:4002
Syncing group: default
   Group member: http://localhost:4001
   Group member: http://localhost:4002
   Object cd5bd649c4dc46b0bbdf8c94ee53c1198780e430 is missing on http://localhost:4001
     Mirroring cd5bd649c4dc46b0bbdf8c94ee53c1198780e430 from http://localhost:4002 to http://localhost:4001
        Fetching :http://localhost:4002/blob/cd5bd649c4dc46b0bbdf8c94ee53c1198780e430
        Uploading :http://localhost:4001/blob/cd5bd649c4dc46b0bbdf8c94ee53c1198780e430
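The output above suggests the shape of the replication pass: find each object that is missing from a group member, then copy it from a member that has it. The sketch below plans those copies from per-server object lists. It is an assumption-laden illustration, not the actual sos implementation: `planMirrors` and `mirrorOp` are hypothetical names, and the choice of source server (first holder in sorted order) is arbitrary.

```go
package main

import (
	"fmt"
	"sort"
)

// mirrorOp describes one copy needed to bring a group member up to date.
type mirrorOp struct {
	Object string
	From   string
	To     string
}

// planMirrors takes, for each server, the list of object names it holds and
// returns the copies required so that every server holds every object.
func planMirrors(held map[string][]string) []mirrorOp {
	// Sort servers so the plan is deterministic.
	servers := make([]string, 0, len(held))
	for s := range held {
		servers = append(servers, s)
	}
	sort.Strings(servers)

	// owner records the first server (in sorted order) holding each object;
	// has records which objects each server already holds.
	owner := make(map[string]string)
	has := make(map[string]map[string]bool)
	for _, s := range servers {
		has[s] = make(map[string]bool)
		for _, obj := range held[s] {
			has[s][obj] = true
			if _, seen := owner[obj]; !seen {
				owner[obj] = s
			}
		}
	}

	objects := make([]string, 0, len(owner))
	for obj := range owner {
		objects = append(objects, obj)
	}
	sort.Strings(objects)

	// Emit one copy for every (object, server) pair where it is missing.
	var ops []mirrorOp
	for _, obj := range objects {
		for _, s := range servers {
			if !has[s][obj] {
				ops = append(ops, mirrorOp{Object: obj, From: owner[obj], To: s})
			}
		}
	}
	return ops
}

func main() {
	held := map[string][]string{
		"http://localhost:4001": {},
		"http://localhost:4002": {"cd5bd649c4dc46b0bbdf8c94ee53c1198780e430"},
	}
	for _, op := range planMirrors(held) {
		fmt.Printf("Mirroring %s from %s to %s\n", op.Object, op.From, op.To)
	}
}
```

Separating the planning step from the HTTP fetch and upload keeps the logic easy to test without any running blob-servers.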

Meta-Data

When uploading objects it is often useful to store meta-data, such as the original name of the uploaded object, the owner, or some similar data. For that reason any header you add to your upload with an X- prefix will be stored and returned on download.

As a special case the header X-Mime-Type can be used to set the returned Content-Type header too.

For example uploading an image might look like this:

$ curl -X POST -H "X-Orig-Filename: steve.jpg" \
               -H "X-MIME-Type: image/jpeg" \
               --data-binary @/home/skx/Images/tmp/steve.jpg \
        http://localhost:9991/upload
{"id":"20b30df22469e6d7617c7da6a457d4e384945a06","status":"OK","size":17599}

Downloading will result in the headers being set:

$ curl -v http://localhost:9992/fetch/20b30df22469e6d7617c7da6a457d4e384945a06 >/dev/null
..
< HTTP/1.1 200 OK
< X-Orig-Filename: steve.jpg
< Date: Fri, 27 May 2016 06:17:39 GMT
< Content-Type: image/jpeg
< Transfer-Encoding: chunked
<
{ [data not shown]

Production Usage

  • The API service must be visible to clients, to allow downloads to be made.

    • Because the download service runs on port 9992 it is assumed that corporate firewalls would deny access.
    • We assume you'll configure an Apache/nginx/similar reverse-proxy to access the files via a host like http://objects.example.com/.
  • It is assumed you might wish to restrict uploads to particular clients, rather than allow the world to make uploads. The simplest way of doing this is to use your firewall to filter access to port 9991.

  • The blob-servers must be reachable by the host(s) running the API-service, but they should not be publicly visible.

    • If your blob-servers are exposed to the internet remote users could use the API to spider and download all your content.
  • None of the servers need to be launched as root, because they don't bind to privileged ports, or require special access.

    • NOTE: issue #6 improved the security of the blob-server by invoking chroot(). However chroot() will fail if the server is not launched as root, which is harmless.
  • You can also read about scaling when your data is too large to fit upon a single blob-server.

Future Changes?

It would be possible to switch to using chunked storage, for example breaking up each uploaded file into 128MB sections and treating them as distinct objects. The reason that is not done at the moment is that it relies upon state:

  • The public server needs to be able to know that the file with a given ID is comprised of the following chunks of data:
    • a5d606958533634fed7e6d5a79d6a5617252021f
    • 038deb6940db2d0e7b9ee9bba70f3501a0667989
    • a7914eb6ff984f97c5f6f365d3d93961be2e8617
    • ...
  • That data must be always kept up to date and accessible.

At the moment the API-server is stateless, so tracking that data is not possible. It is possible to imagine using redis, or some other external database, to record the data, but that increases the complexity of deployment.

GitHub Setup

This repository is configured to run tests upon every commit, and when pull-requests are created/updated. The testing is carried out via .github/run-tests.sh which is used by the github-action-tester action.

Releases are automated in a similar fashion via .github/build, and the github-action-publish-binaries action.

Questions?

Questions/Changes are most welcome; just report an issue.

Steve
