  • Stars: 110
  • Rank: 316,770 (Top 7 %)
  • Language: Go
  • License: MIT License
  • Created almost 7 years ago
  • Updated almost 2 years ago

Repository Details

poor man's kafka (plus in-place mutations and search)

rochefort - poor man's kafka (with in-place modifications and inverted index search)

PUSH DATA, GET FILE OFFSET; no shenanigans

(if you can afford to lose data and do your own replication)

  • disk write speed storage service that returns offsets to stored values
  • if you are ok with losing some data (does not fsync on write)
  • supports: append, multiappend, modify, get, multiget, close, query, compact
  • clients: go, java

turns out when you are fine with losing some data, things are much faster and simpler :)


run in docker

run with docker: jackdoe/rochefort:2.5

docker run -e BIND=":8000" \
           -e ROOT="/tmp/rochefort" \
           -p 8000:8000 \
           jackdoe/rochefort:2.5

breaking change between 0.5 and 1.0

  • added 4 more bytes in the header
  • the -buckets parameter is gone, so everything is appended in one file per namespace

you can migrate your data by doing:

oldServer.scan(namespace: ns) do |offset, v|
  newServer.append(namespace:ns, data: v)
end

breaking change between 1.x and 2.0

  • moved get/multiget/append to protobuf
  • moved delete/close to protobuf

parameters

  • root: root directory, files will be created at root/namespace||default/append.raw
  • bind: address to bind to (default :8000)

don't forget to mount a persistent root directory

compile from source

$ go run main.go query.go input.pb.go -bind :8000 -root /tmp
2018/02/10 12:06:21 starting http server on :8000
....

APPEND/MULTI APPEND

res, err := r.Set(&AppendInput{
	AppendPayload: []*Append{{
		Namespace: ns,
		Data:      []byte("abc"),
		AllocSize: 10,                      // so you can do inplace modification
		Tags:      []string{"a", "b", "c"}, // so you can search it
	}, {
		Namespace: ns,
		Data:      []byte("zxc"),
	}},
})

you can always do inplace modifications to an object, and you can also reserve some space to add more stuff to the same offset later

the searchable tags are sanitized: all non alphanumeric characters (excluding _) are removed, i.e. everything matching [^a-zA-Z0-9_]+
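a minimal sketch of that sanitization rule, using the pattern quoted above. sanitizeTag is a hypothetical helper name, not taken from the rochefort source, and the server may apply extra normalization that is not shown here:

```go
package main

import (
	"fmt"
	"regexp"
)

// nonTagChars matches every run of characters outside [a-zA-Z0-9_],
// which is the pattern quoted in the text above.
var nonTagChars = regexp.MustCompile(`[^a-zA-Z0-9_]+`)

// sanitizeTag is a hypothetical helper: it removes every character
// that is not a letter, a digit or an underscore.
func sanitizeTag(tag string) string {
	return nonTagChars.ReplaceAllString(tag, "")
}

func main() {
	for _, t := range []string{"user-id", "hello world!", "snake_case_42"} {
		fmt.Printf("%q -> %q\n", t, sanitizeTag(t))
	}
}
```

under this rule two different inputs such as "user-id" and "userid" end up as the same tag, so it is probably worth sanitizing on the client side too if you want predictable queries.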

inverted index

passing tags a,b,c will create postings lists in the namespace a.postings, b.postings and c.postings, later you can query only specific tags with /query

MODIFY

_, err = r.Set(&AppendInput{
	ModifyPayload: []*Modify{{
		Namespace: ns,
		Offset:    off,
		Pos:       1,
		Data:      []byte("zxcv"),
	}},
})

modifies the blob in place starting at the given position, for example if we want to replace 'abc' with 'azz' in the blob we appended at offset 0, we modify rochefort offset 0 with 'zz' from position 1. If you pass Pos: -1 it will append to the previous end of the blob
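the Pos semantics can be modelled in memory like this. it is only a sketch under assumptions: the blob type and its methods are hypothetical, and rejecting writes that do not fit in the reserved AllocSize is a guess about the behaviour, not something confirmed by the server code:

```go
package main

import (
	"errors"
	"fmt"
)

// blob is a hypothetical in-memory stand-in for one stored value:
// buf has the reserved (AllocSize) capacity, length is how much of it is used.
type blob struct {
	buf    []byte
	length int
}

func newBlob(data []byte, allocSize int) *blob {
	if allocSize < len(data) {
		allocSize = len(data)
	}
	b := &blob{buf: make([]byte, allocSize), length: len(data)}
	copy(b.buf, data)
	return b
}

// modify overwrites bytes starting at pos; pos == -1 means
// "append at the current end of the blob".
// Writes that do not fit in the reserved space are rejected (assumption).
func (b *blob) modify(pos int, data []byte) error {
	if pos == -1 {
		pos = b.length
	}
	if pos < 0 || pos+len(data) > len(b.buf) {
		return errors.New("modification does not fit in the allocated space")
	}
	copy(b.buf[pos:], data)
	if end := pos + len(data); end > b.length {
		b.length = end
	}
	return nil
}

func (b *blob) value() []byte { return b.buf[:b.length] }

func main() {
	b := newBlob([]byte("abc"), 10) // AllocSize: 10 leaves room to grow
	_ = b.modify(1, []byte("zz"))
	fmt.Println(string(b.value())) // azz
	_ = b.modify(-1, []byte("!"))
	fmt.Println(string(b.value())) // azz!
}
```

this is also why AllocSize matters at append time: without reserved space there is nowhere for a Pos: -1 write to go.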

in AppendInput you can mix modify and append commands

GET/MULTI GET

fetched, err := r.Get(&GetInput{
	GetPayload: []*Get{{
		Namespace: "example",
		Offset:    offset1,
	}, {
		Namespace: "example",
		Offset:    offset12,
	}},
})

output is GetOutput which is just array of arrays of byte, so fetched[0] is array of bytes holding the first blob and fetched[1] is the second blob

NAMESPACE

you can also pass "namespace" parameter and this will create different directories per namespace, for example

namespace: events_from_20171111 
namespace: events_from_20171112

will create {root_directory}/events_from_20171111/... and {root_directory}/events_from_20171112/...

and then you simply delete the directories you don't need (after closing them)

CLOSE/DELETE

Closes a namespace so it can be deleted (or you can directly delete it with DELETE)

STORAGE FORMAT

header is 20 bytes
D: data length: 4 bytes
R: reserved: 8 bytes
A: allocSize: 4 bytes
C: crc32(length, time): 4 bytes
V: the stored value

DDDDRRRRRRRRAAAACCCCVVVVVVVVVVVVVVVVVVVV...DDDDRRRRRRRRAAAACCCCVVVVVV....

as you can see the value is not included in the checksum, I am checking only the header as my usecase is quite ok with missing/corrupting the data itself, but it is not ok if corrupted header makes us allocate 10gb in output := make([]byte, dataLen)
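a sketch of reading and writing such a header, with the D, R, A and C fields laid out as 4 + 8 + 4 + 4 bytes. two things here are assumptions and not taken from the rochefort source: the integers are little endian, and the checksum is CRC-32 (IEEE) over the 16 bytes that precede it (the text only says crc32(length, time)). the header type and function names are hypothetical:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

const headerLen = 20 // 4 (D) + 8 (R) + 4 (A) + 4 (C)

// header is a hypothetical struct holding the three checksummed fields.
type header struct {
	DataLen   uint32
	Reserved  uint64
	AllocSize uint32
}

// encode lays the fields out as DDDDRRRRRRRRAAAACCCC.
// Assumptions: little-endian integers, CRC-32 (IEEE) over the first 16 bytes.
func (h header) encode() []byte {
	b := make([]byte, headerLen)
	binary.LittleEndian.PutUint32(b[0:4], h.DataLen)
	binary.LittleEndian.PutUint64(b[4:12], h.Reserved)
	binary.LittleEndian.PutUint32(b[12:16], h.AllocSize)
	binary.LittleEndian.PutUint32(b[16:20], crc32.ChecksumIEEE(b[:16]))
	return b
}

// decode verifies the checksum before DataLen is trusted, so a corrupted
// length is rejected instead of being passed to make([]byte, dataLen).
func decode(b []byte) (header, error) {
	if len(b) < headerLen {
		return header{}, errors.New("short header")
	}
	if crc32.ChecksumIEEE(b[:16]) != binary.LittleEndian.Uint32(b[16:20]) {
		return header{}, errors.New("corrupted header")
	}
	return header{
		DataLen:   binary.LittleEndian.Uint32(b[0:4]),
		Reserved:  binary.LittleEndian.Uint64(b[4:12]),
		AllocSize: binary.LittleEndian.Uint32(b[12:16]),
	}, nil
}

func main() {
	raw := header{DataLen: 3, AllocSize: 10}.encode()
	h, err := decode(raw)
	fmt.Println(h.DataLen, h.AllocSize, err)

	raw[0] ^= 0xff // flip bits in the length field
	_, err = decode(raw)
	fmt.Println(err)
}
```

the important part is the order of operations in decode: check first, allocate later.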

SCAN

scans the file

$ curl 'http://localhost:8000/scan?namespace=someStoragePrefix' > dump.txt

the format is [len 4 bytes (little endian)][offset 8 bytes (little endian)]data...[len][offset]data

SEARCH

you can search all tagged blobs, the dsl is fairly simple, post/get json blob to /query

  • basic tag query
{"tag":"xyz"}
  • basic OR query
{"or": [... subqueries ...]}
  • basic AND query
{"and": [... subqueries ...]}

example:

curl -XGET -d '{"and":[{"tag":"c"},{"or":[{"tag":"b"},{"tag":"c"}]}]}' 'http://localhost:8000/query'

it spits out the output in same format as /scan, so the result of the query can be very big but it is streamed

LICENSE

MIT

naming rochefort

Rochefort Trappistes 10 is my favorite beer and I was drinking it while doing the initial implementation on a Sunday night

losing data + NIH

You can lose data on crash and there is no replication, so you have to orchestrate that yourself doing double writes or something.

The super simple architecture allows for all kinds of hacks to do backups/replication/sharding but you have to do those yourself.

My usecase is ok with losing some data, and we don't have money to pay for kafka+zk+monitoring(kafka,zk), nor time to learn how to optimize it for our quite big write and very big multi-read load.

Keep in mind that there is some not-invented-here syndrome involved in making it, but I use the service in production and it works very well :)

non atomic modify

there is a race between reading and modification from the client's perspective

TODO

  • travis-ci
  • perl client
  • make c client that can be used from ruby/perl
  • javadoc for the java client
  • publish the java client on maven central
