• Stars
    star
    3,742
  • Rank 11,302 (Top 0.3 %)
  • Language
    C
  • License
    Other
  • Created over 6 years ago
  • Updated 26 days ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Wrangling Untrusted File Formats Safely

Wuffs is a memory-safe programming language (and a standard library written in that language) for Wrangling Untrusted File Formats Safely. Wrangling includes parsing, decoding and encoding. Example file formats include images, audio, video, fonts and compressed archives.

It is "ridiculously fast".

Screenshot of a tweet saying "ridiculously fast"

Per its benchmarks and other linked-to blog posts:

Goals and Non-Goals

Wuffs' goal is to produce software libraries that are as safe as Go or Rust, roughly speaking, but as fast as C, and that can be used anywhere C libraries are used. This includes very large C/C++ projects, such as popular web browsers and operating systems (using that term to include desktop and mobile user interfaces, not just the kernel).

Wuffs the Library is available as transpiled C code. Other C/C++ projects can use that library without requiring the Wuffs the Language toolchain. Those projects can use Wuffs the Library like using any other third party C library. It's just not hand-written C.

However, unlike hand-written C, Wuffs the Language is safe with respect to buffer overflows, integer arithmetic overflows and null pointer dereferences. A key difference between Wuffs and other memory-safe languages is that all such checks are done at compile time, not at run time. If it compiles, it is safe, with respect to those three bug classes.

The trade-off in aiming for both safety and speed is that Wuffs programs take longer for a programmer to write, as they have to explicitly annotate their programs with proofs of safety. A statement like x += 1 unsurprisingly means to increment the variable x by 1. However, in Wuffs, such a statement is a compile time error unless the compiler can also prove that x is not the maximal value of x's type (e.g. x is not 255 if x is a base.u8), as the increment would otherwise overflow. Similarly, an integer arithmetic expression like x / y is a compile time error unless the compiler can also prove that y is not zero.

Hermeticity

Wuffs is not a general purpose programming language. It is for writing libraries, not programs. Wuffs code is hermetic and can only compute (e.g. convert "compressed bytes" to "decompressed bytes"). It cannot make any syscalls (e.g. it has no ambient authority to read your files), implying that it cannot allocate or free memory (and is therefore trivially safe against things like memory leaks, use-after-frees and double-frees).

The idea isn't to write your whole program in Wuffs, only the parts that are both performance-conscious and security-conscious. For example, while technically possible, it is unlikely that a Wuffs compiler would be worth writing entirely in Wuffs.

What Does Wuffs Code Look Like?

The /std/lzw/decode_lzw.wuffs file is a good example. The Wuffs the Language document has more information on how it differs from other languages in the C family.

What Does Compile Time Checking Look Like?

For example, making this one-line edit to the LZW codec leads to a compile time error. wuffs gen fails to generate the C code, i.e. fails to compile (transpile) the Wuffs code to C code:

diff --git a/std/lzw/decode_lzw.wuffs b/std/lzw/decode_lzw.wuffs
index f878c5e..f10dcee 100644
--- a/std/lzw/decode_lzw.wuffs
+++ b/std/lzw/decode_lzw.wuffs
@@ -98,7 +98,7 @@ pub func lzw_decoder.decode?(dst ptr buf1, src ptr buf1, src_final bool)() {
                        in.dst.write?(x:s)

                        if use_save_code {
-                               this.suffixes[save_code] = c as u8
+                               this.suffixes[save_code] = (c + 1) as u8
                                this.prefixes[save_code] = prev_code as u16
                        }
$ wuffs gen std/gif
check: expression "(c + 1) as u8" bounds [1 ..= 256] is not within bounds [0 ..= 255] at
/home/n/go/src/github.com/google/wuffs/std/lzw/decode_lzw.wuffs:101. Facts:
    n_bits < 8
    c < 256
    this.stack[s] == (c as u8)
    use_save_code

In comparison, this two-line edit will compile (but the "does it decode GIF correctly" tests then fail):

diff --git a/std/lzw/decode_lzw.wuffs b/std/lzw/decode_lzw.wuffs
index f878c5e..b43443d 100644
--- a/std/lzw/decode_lzw.wuffs
+++ b/std/lzw/decode_lzw.wuffs
@@ -97,8 +97,8 @@ pub func lzw_decoder.decode?(dst ptr buf1, src ptr buf1, src_final bool)() {
                        // type checking, bounds checking and code generation for it).
                        in.dst.write?(x:s)

-                       if use_save_code {
-                               this.suffixes[save_code] = c as u8
+                       if use_save_code and (c < 200) {
+                               this.suffixes[save_code] = (c + 1) as u8
                                this.prefixes[save_code] = prev_code as u16
                        }
$ wuffs gen std/gif
gen wrote:      /home/n/go/src/github.com/google/wuffs/gen/c/gif.c
gen unchanged:  /home/n/go/src/github.com/google/wuffs/gen/h/gif.h
$ wuffs test std/gif
gen unchanged:  /home/n/go/src/github.com/google/wuffs/gen/c/gif.c
gen unchanged:  /home/n/go/src/github.com/google/wuffs/gen/h/gif.h
test:           /home/n/go/src/github.com/google/wuffs/test/c/gif
gif/basic.c     clang   PASS (8 tests run)
gif/basic.c     gcc     PASS (8 tests run)
gif/gif.c       clang   FAIL test_lzw_decode: bufs1_equal: wi: got 19311, want 19200.
contents differ at byte 3 (in hex: 0x000003):
  000000: dcdc dc00 00d9 f5f9 f6df dc5f 393a 3a3a  ..........._9:::
  000010: 3a3b 618e c8e4 e4e4 e5e4 e600 00e4 bbbb  :;a.............
  000020: eded 8f91 9191 9090 9090 9190 9192 9192  ................
  000030: 9191 9292 9191 9293 93f0 f0f0 f1f1 f2f2  ................
excerpts of got (above) versus want (below):
  000000: dcdc dcdc dcd9 f5f9 f6df dc5f 393a 3a3a  ..........._9:::
  000010: 3a3a 618e c8e4 e4e4 e5e4 e6e4 e4e4 bbbb  ::a.............
  000020: eded 8f91 9191 9090 9090 9090 9191 9191  ................
  000030: 9191 9191 9191 9193 93f0 f0f0 f1f1 f2f2  ................

gif/gif.c       gcc     FAIL test_lzw_decode: bufs1_equal: wi: got 19311, want 19200.
contents differ at byte 3 (in hex: 0x000003):
  000000: dcdc dc00 00d9 f5f9 f6df dc5f 393a 3a3a  ..........._9:::
  000010: 3a3b 618e c8e4 e4e4 e5e4 e600 00e4 bbbb  :;a.............
  000020: eded 8f91 9191 9090 9090 9190 9192 9192  ................
  000030: 9191 9292 9191 9293 93f0 f0f0 f1f1 f2f2  ................
excerpts of got (above) versus want (below):
  000000: dcdc dcdc dcd9 f5f9 f6df dc5f 393a 3a3a  ..........._9:::
  000010: 3a3a 618e c8e4 e4e4 e5e4 e6e4 e4e4 bbbb  ::a.............
  000020: eded 8f91 9191 9090 9090 9090 9191 9191  ................
  000030: 9191 9191 9191 9193 93f0 f0f0 f1f1 f2f2  ................

wuffs-test-c: some tests failed
wuffs test: some tests failed

Directory Layout

  • lang holds the Go libraries that implement Wuffs the Language: tokenizer, AST, parser, renderer, etc. The Wuffs tools are written in Go, but as mentioned above, Wuffs transpiles to C code, and Go is not necessarily involved if all you want is to use the C edition of Wuffs.
  • lib holds other Go libraries, not specific to Wuffs the Language per se.
  • internal holds internal implementation details, as per Go's internal packages convention.
  • cmd holds Wuffs the Language' command line tools, also written in Go.
  • std holds Wuffs the Library's code.
  • release holds the releases (e.g. in their C form) of Wuffs the Library.
  • test holds the regular tests for Wuffs the Library.
  • fuzz holds the fuzz tests for Wuffs the Library.
  • script holds miscellaneous utility programs.
  • doc holds documentation.
  • example holds example programs for Wuffs the Library.
  • hello-wuffs-c holds an example program for Wuffs the Language.

Building

See the BUILD instructions.

Documentation

The Note directory also contains various short articles.

Status

Version 0.3. The API and ABI aren't stabilized yet. The compiler undoubtedly has bugs. Assertion checking needs more rigor, especially around side effects and aliasing, and being sufficiently well specified to allow alternative implementations. Lots of detail needs work, but the broad brushstrokes are there.

Nonetheless, Wuffs' GIF decoder has shipped in the Google Chrome web browser since June 2021. See also the "ridiculously fast" tweet already mentioned above.

Discussion

The mailing list is at https://groups.google.com/forum/#!forum/wuffs.

Contributing

The CONTRIBUTING.md file contains instructions on how to file the Contributor License Agreement before sending any pull requests (PRs). Of course, if you're new to the project, it's usually best to discuss any proposals and reach consensus before sending your first PR.

Source code is auto-formatted.

License

Apache 2. See the LICENSE file for details.

Disclaimer

This is not an official Google product, it is just code that happens to be owned by Google.


Updated on June 2023.

More Repositories

1

material-design-icons

Material Design icons by Google (Material Symbols)
49,826
star
2

guava

Google core libraries for Java
Java
48,313
star
3

zx

A tool for writing better scripts
JavaScript
37,928
star
4

styleguide

Style guides for Google-originated open-source projects
HTML
36,606
star
5

leveldb

LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
C++
33,564
star
6

material-design-lite

Material Design Components in HTML/CSS/JS
HTML
32,276
star
7

googletest

GoogleTest - Google Testing and Mocking Framework
C++
32,215
star
8

jax

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Python
28,081
star
9

comprehensive-rust

This is the Rust course used by the Android team at Google. It provides you the material to quickly teach Rust.
Rust
26,228
star
10

python-fire

Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object.
Python
26,112
star
11

mediapipe

Cross-platform, customizable ML solutions for live and streaming media.
C++
25,513
star
12

gson

A Java serialization/deserialization library to convert Java Objects into JSON and back
Java
22,945
star
13

flatbuffers

FlatBuffers: Memory Efficient Serialization Library
C++
22,064
star
14

iosched

The Google I/O Android App
Kotlin
21,790
star
15

ExoPlayer

An extensible media player for Android
Java
21,465
star
16

eng-practices

Google's Engineering Practices documentation
19,741
star
17

web-starter-kit

Web Starter Kit - a workflow for multi-device websites
HTML
18,434
star
18

flexbox-layout

Flexbox for Android
Kotlin
18,141
star
19

fonts

Font files available from Google Fonts, and a public issue tracker for all things Google Fonts
HTML
17,632
star
20

filament

Filament is a real-time physically based rendering engine for Android, iOS, Windows, Linux, macOS, and WebGL2
C++
16,946
star
21

cadvisor

Analyzes resource usage and performance characteristics of running containers.
Go
16,335
star
22

libphonenumber

Google's common Java, C++ and JavaScript library for parsing, formatting, and validating international phone numbers.
C++
15,728
star
23

gvisor

Application Kernel for Containers
Go
15,105
star
24

WebFundamentals

Former git repo for WebFundamentals on developers.google.com
JavaScript
13,842
star
25

yapf

A formatter for Python files
Python
13,648
star
26

tink

Tink is a multi-language, cross-platform, open source library that provides cryptographic APIs that are secure, easy to use correctly, and hard(er) to misuse.
Java
13,318
star
27

deepdream

13,212
star
28

brotli

Brotli compression format
TypeScript
13,154
star
29

guetzli

Perceptual JPEG encoder
C++
12,884
star
30

wire

Compile-time Dependency Injection for Go
Go
12,363
star
31

guice

Guice (pronounced 'juice') is a lightweight dependency injection framework for Java 11 and above, brought to you by Google.
Java
12,342
star
32

blockly

The web-based visual programming editor.
TypeScript
12,136
star
33

sanitizers

AddressSanitizer, ThreadSanitizer, MemorySanitizer
C
10,754
star
34

grumpy

Grumpy is a Python to Go source code transcompiler and runtime.
Go
10,464
star
35

or-tools

Google's Operations Research tools:
C++
10,405
star
36

dopamine

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
Jupyter Notebook
10,367
star
37

auto

A collection of source code generators for Java.
Java
10,234
star
38

go-github

Go library for accessing the GitHub v3 API
Go
10,083
star
39

oss-fuzz

OSS-Fuzz - continuous fuzzing for open source software.
Shell
9,859
star
40

go-cloud

The Go Cloud Development Kit (Go CDK): A library and tools for open cloud development in Go.
Go
9,389
star
41

sentencepiece

Unsupervised text tokenizer for Neural Network-based text generation.
C++
8,657
star
42

re2

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.
C++
8,190
star
43

traceur-compiler

Traceur is a JavaScript.next-to-JavaScript-of-today compiler
JavaScript
8,182
star
44

tsunami-security-scanner

Tsunami is a general purpose network security scanner with an extensible plugin system for detecting high severity vulnerabilities with high confidence.
Java
8,129
star
45

trax

Trax — Deep Learning with Clear Code and Speed
Python
7,943
star
46

skia

Skia is a complete 2D graphic library for drawing Text, Geometries, and Images.
C++
7,874
star
47

benchmark

A microbenchmark support library
C++
7,812
star
48

android-classyshark

Android and Java bytecode viewer
Java
7,468
star
49

pprof

pprof is a tool for visualization and analysis of profiling data
Go
7,408
star
50

closure-compiler

A JavaScript checker and optimizer.
Java
7,253
star
51

accompanist

A collection of extension libraries for Jetpack Compose
Kotlin
7,229
star
52

agera

Reactive Programming for Android
Java
7,227
star
53

magika

Detect file content types with deep learning
Python
7,171
star
54

diff-match-patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
Python
7,132
star
55

flutter-desktop-embedding

Experimental plugins for Flutter for Desktop
C++
7,109
star
56

latexify_py

A library to generate LaTeX expression from Python code.
Python
6,953
star
57

lovefield

Lovefield is a relational database for web apps. Written in JavaScript, works cross-browser. Provides SQL-like APIs that are fast, safe, and easy to use.
JavaScript
6,847
star
58

glog

C++ implementation of the Google logging module
C++
6,797
star
59

jsonnet

Jsonnet - The data templating language
Jsonnet
6,742
star
60

error-prone

Catch common Java mistakes as compile-time errors
Java
6,690
star
61

model-viewer

Easily display interactive 3D models on the web and in AR!
TypeScript
6,473
star
62

gops

A tool to list and diagnose Go processes currently running on your system
Go
6,375
star
63

draco

Draco is a library for compressing and decompressing 3D geometric meshes and point clouds. It is intended to improve the storage and transmission of 3D graphics.
C++
6,188
star
64

automl

Google Brain AutoML
Jupyter Notebook
6,157
star
65

gopacket

Provides packet processing capabilities for Go
Go
6,082
star
66

physical-web

The Physical Web: walk up and use anything
Java
6,017
star
67

grafika

Grafika test app
Java
6,002
star
68

j2objc

A Java to iOS Objective-C translation tool and runtime.
Java
5,976
star
69

snappy

A fast compressor/decompressor
C++
5,940
star
70

osv-scanner

Vulnerability scanner written in Go which uses the data provided by https://osv.dev
Go
5,862
star
71

ios-webkit-debug-proxy

A DevTools proxy (Chrome Remote Debugging Protocol) for iOS devices (Safari Remote Web Inspector).
C
5,848
star
72

seesaw

Seesaw v2 is a Linux Virtual Server (LVS) based load balancing platform.
Go
5,599
star
73

EarlGrey

🍵 iOS UI Automation Test Framework
Objective-C
5,580
star
74

seq2seq

A general-purpose encoder-decoder framework for Tensorflow
Python
5,577
star
75

flax

Flax is a neural network library for JAX that is designed for flexibility.
Python
5,544
star
76

gemma.cpp

lightweight, standalone C++ inference engine for Google's Gemma models.
C++
5,518
star
77

google-java-format

Reformats Java source code to comply with Google Java Style.
Java
5,366
star
78

wireit

Wireit upgrades your npm/pnpm/yarn scripts to make them smarter and more efficient.
TypeScript
5,280
star
79

battery-historian

Battery Historian is a tool to analyze battery consumers using Android "bugreport" files.
Go
5,249
star
80

clusterfuzz

Scalable fuzzing infrastructure.
Python
5,202
star
81

bbr

5,156
star
82

gumbo-parser

An HTML5 parsing library in pure C99
HTML
5,141
star
83

syzkaller

syzkaller is an unsupervised coverage-guided kernel fuzzer
Go
5,136
star
84

git-appraise

Distributed code review system for Git repos
Go
5,090
star
85

google-authenticator

Open source version of Google Authenticator (except the Android app)
Java
5,077
star
86

uuid

Go package for UUIDs based on RFC 4122 and DCE 1.1: Authentication and Security Services.
Go
4,994
star
87

gts

☂️ TypeScript style guide, formatter, and linter.
TypeScript
4,930
star
88

gemma_pytorch

The official PyTorch implementation of Google's Gemma models
Python
4,920
star
89

closure-library

Google's common JavaScript library
JavaScript
4,837
star
90

cameraview

[DEPRECATED] Easily integrate Camera features into your Android app
Java
4,734
star
91

grr

GRR Rapid Response: remote live forensics for incident response
Python
4,641
star
92

liquidfun

2D physics engine for games
C++
4,559
star
93

pytype

A static type analyzer for Python code
Python
4,528
star
94

gxui

An experimental Go cross platform UI library.
Go
4,450
star
95

bloaty

Bloaty: a size profiler for binaries
C++
4,386
star
96

clasp

🔗 Command Line Apps Script Projects
TypeScript
4,336
star
97

ko

Build and deploy Go applications on Kubernetes
Go
4,329
star
98

santa

A binary authorization and monitoring system for macOS
Objective-C
4,318
star
99

google-ctf

Google CTF
Go
4,246
star
100

tamperchrome

Tamper Dev is an extension that allows you to intercept and edit HTTP/HTTPS requests and responses as they happen without the need of a proxy. Works across all operating systems (including Chrome OS).
TypeScript
4,148
star