• Stars
    star
    123
  • Rank 290,145 (Top 6 %)
  • Language
    C
  • License
    Other
  • Created about 12 years ago
  • Updated about 7 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

A miniature, portable version of bsdiff.

minibsdiff: a miniature, portable version of bsdiff Build Status

Colin Percival's bsdiff is a popular tool for creating and applying patches to binary software. This is a stripped down copy of bsdiff that's designed to be portable and reusable as a library in your own software (if you wanted to say, create your own update system.) Many people end up reusing bsdiff (it's stable, well-known, works great, and has a good license,) but I haven't found a standalone copy of the library somewhere that I could easily reuse, so I made it.

This code is based on bsdiff v4.3.

The main differences:

  • Control and data blocks in the patch output are not bzip2 compressed. You'll have to apply your own compression method. This is a very important part of bsdiff's design, and if you don't apply a compression method at some level when using this library, it won't buy you anything. Please see the 'Usage' section below.

  • Patches produced by this library are incompatible with those produced by the classic bsdiff tool. (The header format has changed appropriately so they are both incompatible with each other.) You're encouraged to change this when using the library yourself - see 'Usage' below.

  • The code has been refactored into a reusable API (documented below) consisting of a few simple functions in bsdiff.h and bspatch.h. It should be easily usable from any programming language. It has zero external dependencies.

  • It works everywhere (even under MSVC.)

  • There's a simple example included that should show you how to get started.

  • Because there are no external dependencies and it's so small, minibsdiff is great place to start if you need to customize bsdiff yourself! You'll inevitably want to do this as time goes on, and most of the work is done for you.

Usage

Building

Copy bsdiff.{c,h}, bspatch.{c,h}, minibsdiff-config.h and {stdbool,stdint}-msvc.h in your source tree and you're ready to go. You shouldn't need any special build settings for it to Just Work(TM).

API

#include "bsdiff.h"
#include "bspatch.h"

/*-
 * Determine the maximum size of a patch between two files. This function
 * should be used to allocate a buffer big enough for `bsdiff` to store
 * its output in.
 */
off_t bsdiff_patchsize_max(off_t oldsize, off_t newsize);

/*-
 * Create a binary patch from the buffers pointed to by oldp and newp (with
 * respective sizes,) and store the result in the buffer pointed to by 'patch'.
 *
 * The input pointer 'patch' must not be NULL, and the size of the buffer must
 * be at least 'bsdiff_patchsize_max(new,old)' in length.
 *
 * Returns -1 if `patch` is NULL, the 'patch' buffer is not large enough, or if
 * memory cannot be allocated.
 * Otherwise, the return value is the size of the patch that was put in the
 * 'patch' buffer.
 *
 * This function is memory-intensive, and requires max(17*n,9*n+m)+O(1) bytes
 * of memory, where n is the size of the new file and m is the size of the old
 * file. It runs in O((n+m) log n) time.
 */
int bsdiff(u_char* oldp, off_t oldsize,
           u_char* newp, off_t newsize,
           u_char* patch, off_t patchsize);

/*-
 * Determine if the buffer pointed to by `patch` of a given `size` is
 * a valid patch.
 */
bool bspatch_valid_header(u_char* patch, ssize_t patchsz);

/*-
 * Determine the size of the new file that will result from applying
 * a patch. Returns -1 if the patch header is invalid, otherwise returns
 * the size of the new file.
 */
ssize_t bspatch_newsize(u_char* patch, ssize_t patchsize);

/*-
 * Apply a patch stored in 'patch' to 'oldp', result in 'newp', and store the
 * result in 'newp'.
 *
 * The input pointers must not be NULL.
 *
 * The size of 'newp', represented by 'newsz', must be at least
 * 'bspatch_newsize(oldsz,patchsz)' bytes in length.
 *
 * Returns -1 if memory can't be allocated, or the input pointers are NULL.
 * Returns -2 if the patch header is invalid. Returns -3 if the patch itself is
 * corrupt.
 * Otherwise, returns 0.
 *
 * This function requires n+m+O(1) bytes of memory, where n is the size of the
 * old file and m is the size of the new file. It does no allocations.
 * It runs in O(n+m) time.
 */
int bspatch(u_char* oldp,  ssize_t oldsz,
            u_char* patch, ssize_t patchsz,
            u_char* newp,  ssize_t newsz);

Building the example program.

For an full example of using the API, see minibsdiff.c, which roughly reimplements the standard bsdiff/bspatch in a single tool (without compression.) To build it:

  • Running make on Linux or OS X. If you have MinGW installed and on your PATH then you can do make MinGW=YES which will build an .exe on Windows.

  • There is a CMakeLists.txt file you can use to generate Ninja, MSVC or MinGW makefile projects for Windows as well. You can of course use cmake on Linux/OS X as well.

Customization notes.

You can change the patch file's magic number by modifying BSDIFF_CONFIG_MAGIC in minibsdiff-config.h. It must be 8 bytes long (anything beyond that will be ignored.) This library by default has the magic number MBSDIF43.


You should really, really, really compress the output in some way. Whether or not you do that directly in the diff/patch routines or on the result you get from calling them is irrelevant. If you don't do this, bsdiff will buy you nothing.

Briefly, bsdiff is based on the concept of finding approximate matches between two executable files, and calculating and storing their bytewise differences in the patch file. The patch format is roughly composed of a control block specifying how to add and insert changes from the new executable into the old one, and a difference block actually composed of the differences.

Binary updates to software packages tend to have disproportionate amounts of binary-level differences from just a few source code changes. The key observation however is that most code is still the same, but relocated in such a way that things like internal pointers are always offset in a predictable manner. For example, if you have a single translation unit with 5 functions, and you fix a small bug in this code and ship it to users, the symbolic representation has not changed all that much, but the change will result in executable differences affecting all 5 functions, such that e.g. relative pointers must all be adjusted properly, across all of them.

But even then, many of these 'relocations' will be small (a byte or two,) and more than that, they will often be very regular, meaning the differences are highly redundant, and thus compressible.

As a result, an uncompressed patch from bsdiff is roughly on par with the new file in size, but compression can reduce it's size dramatically due to repeated data in the differences (by a factor of 10x or 20x.) In fact, without some sort of compression, it practically defeats the purpose of using it in the first place!

Not having compression by default is still a feature, though - it keeps the library simple and portable, and you can layer it in however you want because the source is small and easy to hack. But realistically, you'll always want to compress it at one point or another in the Real World.

Here are some good compression libraries you might be interested in:

In my non-scientific experiments, bzip at compression level 9 gives the best output size out of all the ones listed above. It's obviously worth sacrificing compression time/speed for smaller updates that decompress quickly.

Join in

File bugs in the GitHub issue tracker.

Master git repository:

  • git clone https://github.com/thoughtpolice/minibsdiff.git

There's also a BitBucket mirror:

  • git clone https://bitbucket.org/thoughtpolice/minibsdiff.git

If you're going to submit a pull request or send a patch, sign off on your changes by using git commit -s. I manage the Signed-off-by field like git: by signing off, you acknowledge that the code you are submitting for inclusion abides by the license of the project. An Acked-by field states that someone has reviewed this code, and at the very least it is not completely insane.

Authors

See AUTHORS.txt.

License

2-clause BSD. See LICENSE.txt for terms of copyright and redistribution.

More Repositories

1

heroku-ping

A utility to ping heroku web applications and keep them alive.
Ruby
82
star
2

eris

Serve your /nix/store directory over the internet ✨
Perl
82
star
3

enable_arm_pmu

Enable user-mode access to ARMv7/Linux performance counters
C
76
star
4

buck2-nix

Do not taunt happy fun ball
Starlark
59
star
5

clash-playground

A Clash playground/starter kit, using Nix
Haskell
33
star
6

nixos-zerotier-dns

My ZeroTier DNS modules
Nix
29
star
7

strict-ghc-plugin

A plugin for GHC to turn Haskell into a strict language
Haskell
27
star
8

salt

Fast cryptographic networking for Haskell
Assembly
24
star
9

rv32-sail

32-bit RISC-V Emulator
C
23
star
10

dynasm-example

example of the DynASM library
C
22
star
11

asc

Austin's supercompiler work
Haskell
21
star
12

hs-ed25519

Minimal ed25519 Haskell package, binding to the ref10 SUPERCOP implementation.
C
21
star
13

libfault

Simple C library for basic crash reporting functionality, based on Phusion Passenger.
C
21
star
14

hs-nacl

Modern Haskell Cryptography
C
18
star
15

cryptol-mode

A Cryptol major mode for Emacs.
Emacs Lisp
16
star
16

yosys-bluespec

Yosys plugin for synthesis of Bluespec code
C++
14
star
17

claap

"An Altruistic Processor", implemented in CLaSH (WARNING: incomplete code)
Haskell
13
star
18

nacl-jit

Example of using JIT'd code inside Native Client
C
8
star
19

hs-asai

Delimited, answer-type-polymorphic continuations for Haskell.
Haskell
7
star
20

coq-skeleton

A simple skeleton for Coq projects
Coq
7
star
21

sqlite4_lsm_lz4

SQLite4 Log-structured merge-tree (LSM) compression demo.
C
7
star
22

cse-ghc-plugin

common subexpression elimination plugin for GHC
Haskell
6
star
23

hs-cityhash

Haskell binding to google cityhash
C++
6
star
24

hs-noise

Usable security for the internet, Deux (in Haskell)
Haskell
5
star
25

bootstrap.nix

code to bootstrap my nix projects
Nix
5
star
26

bucktools

Tools for the Buck2 build system
Starlark
5
star
27

dev-randomdotorg

Linux kernel driver interface to random.org, providing an atmosphere-powered random block device
C
5
star
28

haskell-donate

👏 give 👏 us 👏 money
Haskell
4
star
29

narinfo-tools

Simple collection of narinfo tools
Rust
4
star
30

shake-extras

Extras for the Shake build system
Haskell
4
star
31

foundationdb-k8s

Experiments with FoundationDB Simulation on Kubernetes
Shell
4
star
32

clash-docs

(EXPERIMENTAL) Sphinx documentation for Clash
Python
4
star
33

heroku-stripe-donate

Simple Ruby server for Stripe Donations, designed for Heroku.
Ruby
4
star
34

libphutil-yubikey

Yubikey OTP authentication for Phabricator
PHP
4
star
35

homebrew-personal

Personal homebrew tap
Ruby
3
star
36

embedding-server

Stateless HTTP service for computing sentence embeddings
Python
3
star
37

regalloc-bkp

Attempt at a modular register allocator interface
Haskell
3
star
38

bcachefs-tools

http://bcachefs.org
C
3
star
39

bf-pypy

A brainfuck JIT using RPython
Python
2
star
40

pcap-conduit

Conduit <-> libpcap interface
Haskell
2
star
41

hs-curve25519

Minimal curve25519 Haskell package.
C
2
star
42

hs-poly1305

Minimal poly1305 Haskell package.
Haskell
2
star
43

hs-vix

Haskell bindings to VMware's VIX API.
Haskell
2
star
44

dockerfiles

My personal collection of Dockerfiles
Shell
2
star
45

nix-makeself

Create binary packages from Nix closures
Perl
1
star
46

AHA

Slides for my monthly AHA talks
1
star
47

hs-hmac-sha512256

Minimal HMAC-SHA-512-56 Haskell package.
C
1
star
48

ghc-prim-compat

Compatibility shim for GHC primops
Haskell
1
star
49

bors-tests

Nix
1
star
50

tree-sitter-openddl

a tree-sitter grammar, for OpenDDL v2.0
JavaScript
1
star
51

hs-microtimer

A tiny Haskell library for benchmarking IO actions.
Haskell
1
star
52

fdblog2clickhouse

Ingest FoundationDB Logs into ClickHouse
Python
1
star
53

nix-mirror-old

Mirroring system for upstream nixos.org binary caches
JavaScript
1
star
54

network-data-conduit

A conduit for parsing data into TCP/IP/UDP headers and frames
Haskell
1
star
55

hs-siphash2448

Minimal siphash24-MAC/siphash28-MAC Haskell package.
Haskell
1
star