hop

Simple archive format designed for quickly reading some files without extracting the entire archive. Possibly will be used in Bun.

25x faster than unzip and 10x faster than tar at reading individual files (uncompressed)

| Format | Random access   | Fast extraction | Fast archiving | Compression | Encryption | Append |
| hop    | ✅              | ✅              | ✅             | ❌          | ❌         | ❌     |
| tar    | ❌              | ✅              | ✅             | ❌          | ❌         | ✅     |
| zip    | ✅ (when small) | ❌              | ❌             | ✅          | ✅         | ✅     |

Features:

  • Faster at printing individual files than tar & zip (compression disabled)
  • Faster extraction than zip, comparable to tar (compression disabled)
  • Faster archiving than zip, comparable to tar (compression disabled)

Anti-features:

  • Single-threaded (but doesn't need to be)
  • I wrote it in about 3 hours and there are no tests
  • No checksums yet. Probably not a good idea to use this for untrusted data until that's fixed.
  • Ignores symlinks
  • Can't be larger than 4 GB
  • Archives are read-only and file names are not normalized across platforms

Usage

Download the binary from /releases

To create an archive:

hop ./path-to-folder

To extract an archive:

hop archive.hop

To print one file from the archive:

hop archive.hop package.json

Why?

Why can't software read many tiny files with similar performance characteristics as individual files?

  • Reading and writing lots of tiny files incurs significant syscall overhead, and (npm) packages often have lots of tiny files. Zip files are unacceptably slow to read from as if they were a directory. Tar files extract quickly, but are slow at non-sequential access.
  • Reading directory entries (ls) in large directory trees is slow

Some benchmarks

On macOS 12 with an M1X

Using tigerbeetle github repo as an example

Extracting:

[benchmark image]

Archiving:

[benchmark image]

On an Ubuntu AMD64 server

Extracting a node_modules folder

[benchmark image]

Why faster?

  • It stores an array of file path hashes, and the list of files is sorted lexicographically. This makes non-sequential access faster than tar, but can make creating new archives slower.
  • Does not store directories, only files
  • .hop files are read-only (more precisely, one could append but would have to rewrite all metadata)
  • Uses copy_file_range, which copies data between files inside the kernel instead of through user space
  • Packed structs make serialization & deserialization very fast because there is almost no encoding/decoding step.
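
The hash array and fixed-size records described in the list above can be sketched roughly like this. It is an illustration in Python, not hop's actual code: the CRC-32 hash, the little-endian `<8I` record layout, the linear scan, and the names `name_hash`, `build_index`, and `find` are all assumptions made for the example.

```python
import struct
import zlib

# Hypothetical fixed-size record: eight little-endian uint32 fields
# (name off/len, name_hash, chmod, mtime, ctime, data off/len).
RECORD = struct.Struct("<8I")


def name_hash(path: str) -> int:
    # Stand-in hash; the hash function hop really uses is not stated.
    return zlib.crc32(path.encode("utf-8"))


def build_index(paths):
    """Sort paths lexicographically and return (sorted paths, parallel hashes)."""
    ordered = sorted(paths)
    return ordered, [name_hash(p) for p in ordered]


def find(ordered, hashes, wanted):
    """Return the position of `wanted` in the sorted list, or -1 if absent."""
    h = name_hash(wanted)
    for i, candidate in enumerate(hashes):
        # Compare cheap integers first, then confirm with the full name so
        # that a hash collision cannot return the wrong file.
        if candidate == h and ordered[i] == wanted:
            return i
    return -1


ordered, hashes = build_index(["src/main.zig", "package.json", "README.md"])
print(find(ordered, hashes, "package.json"))  # 1
print(find(ordered, hashes, "missing.txt"))   # -1

# A fixed-size record round-trips with a single pack/unpack call.
raw = RECORD.pack(0, 12, name_hash("package.json"), 0o644, 0, 0, 0, 120)
print(len(raw), RECORD.unpack(raw)[3] == 0o644)  # 32 True
```

Scanning a flat array of 32-bit integers touches far less memory than comparing path strings, which is one plausible reason a hash array helps with non-sequential access.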

How does it work?

  1. File contents go at the top, file metadata goes at the bottom
  2. This is the metadata it currently stores:
package Hop;

struct StringPointer {
    uint32 off;
    uint32 len;
}

struct File {
    StringPointer name;
    uint32 name_hash;
    uint32 chmod;
    uint32 mtime;
    uint32 ctime;
    StringPointer data;
}

message Archive {
    uint32 version = 1;
    uint32 content_offset = 2;
    File[] files = 3;
    uint32[] name_hashes = 4;
    byte[] metadata = 5;
}
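
To make the "contents at the top, metadata at the bottom" layout concrete, here is a minimal in-memory sketch in Python. It is not hop's real encoding: the `<8I` packing of `File`, the helper names `build` and `read_one`, and passing the metadata offset and file count around as plain arguments are assumptions for illustration. How hop itself locates and encodes the metadata is not described here.

```python
import struct

FILE = struct.Struct("<8I")  # hypothetical encoding of the File struct above


def build(files):
    """Return (archive bytes, metadata offset) for a {name: bytes} mapping."""
    body = bytearray()
    names = bytearray()
    records = []
    for name in sorted(files):
        data = files[name]
        encoded = name.encode("utf-8")
        # name_hash, mtime and ctime are left at 0 in this sketch.
        records.append(
            (len(names), len(encoded), 0, 0o644, 0, 0, len(body), len(data))
        )
        names += encoded
        body += data  # contents are written first, at the top
    meta_off = len(body)
    meta = b"".join(FILE.pack(*r) for r in records) + bytes(names)
    return bytes(body) + meta, meta_off  # metadata goes at the bottom


def read_one(blob, meta_off, count, wanted):
    """Slice a single file out of the archive without touching the others."""
    names_base = meta_off + count * FILE.size
    for i in range(count):
        fields = FILE.unpack_from(blob, meta_off + i * FILE.size)
        n_off, n_len, d_off, d_len = fields[0], fields[1], fields[6], fields[7]
        start = names_base + n_off
        if blob[start : start + n_len].decode("utf-8") == wanted:
            return blob[d_off : d_off + d_len]
    return None


blob, meta_off = build({"package.json": b"{}", "a.txt": b"hello"})
print(read_one(blob, meta_off, 2, "package.json"))  # b'{}'
```

Every offset and length in the schema is a `uint32`, which is consistent with the 4 GB limit listed under Anti-features.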
