
Repository Details

A production-ready remote container image format (overlaybd) and snapshotter based on block-device.

Accelerated Container Image

Accelerated Container Image is an open-source implementation of the paper "DADI: Block-Level Image Service for Agile and Elastic Application Deployment" (USENIX ATC'20).

DADI (Data Accelerator for Disaggregated Infrastructure) is a container acceleration solution, covering remote images and other features, that is widely used in Alibaba and Alibaba Cloud. It has already been integrated into Alibaba Cloud Registry (ACR) and Alibaba serverless services (FC (FaaSNet, USENIX ATC'21) / SAE / ECI, etc.), which entered the Forrester leader quadrant.

At the heart of the acceleration is overlaybd, a new remote image format based on block devices. The overlaybd backstore provides a merged view of a sequence of block-based layers in user space and exposes it as a virtual block device through TCMU. It accelerates containers by fetching image data on demand, without downloading and unpacking the whole image before a container runs. With the overlaybd image format, we can cold start a container instantly.

The key features are:

  • High Performance

    It is block-device-based storage for OCI images, which has much lower complexity than filesystem-based implementations. For example, cross-layer hardlinks and non-copy commands like chown are very complex for filesystem-based images to support without copying up, but are natively supported by overlaybd. Overlaybd outperforms filesystem-based solutions; evaluation data is given in the DADI paper.

  • High Reliability

    Overlaybd exposes virtual block devices through TCMU, which is widely used and supported in most operating systems. The overlaybd backstore can recover from failures or crashes, which is difficult for FUSE-based image formats.

  • Native Support for Writable Layers

    Overlaybd can be used as a writable (container) layer. It can serve as the container layer at runtime in place of the overlayfs upper layer, or be used to build overlaybd images.

  • Multiple File Systems Supported

    Overlaybd exposes virtual block devices, which can be formatted with a variety of file systems, so users can choose the one that suits them best.

Accelerated Container Image is a non-core sub-project of containerd.

Components

  • overlaybd

    Overlaybd provides a merged view of a sequence of block-based layers as a virtual block device in user space.

  • overlaybd-snapshotter

    It is a containerd snapshotter plugin for overlaybd images. This snapshotter is also compatible with regular OCI images, in the same way as the overlayfs snapshotter.

  • embedded image-convertor

    We provide a modified CLI tool (ctr) to facilitate image pulls and custom conversion from the traditional OCI tarball format to the overlaybd format.

    The convertor supports layer deduplication, which avoids converting the same layer again for every image conversion.

  • standalone userspace image-convertor (Experimental)

    The standalone userspace image-convertor has similar functionality to the embedded image-convertor but runs in user space. It does not require root privileges and does not depend on tcmu, configfs, the snapshotter, or even containerd, which makes it much more convenient to run in a container.

    What's more, the standalone userspace image-convertor is faster than the embedded image-convertor when used with our customized libext2fs. See USERSPACE_CONVERTOR for more details.

  • buildkit for overlaybd (Experimental)

    It is a customized buildkit for overlaybd images. It fetches the data of base images on demand, without pulling all of the data, and uses the overlaybd writable layer to build new layers.

  • fastoci

    It is an overlaybd-based remote image format that lets an original OCI image be used as a remote image without conversion. It is similar to SOCI, but provides a block device interface, which has advantages over FUSE-based formats in performance and stability.

Getting Started

  • QUICKSTART shows how to quickly run an overlaybd image and covers basic usage.

  • See how to set up the overlaybd backstore at README.

  • See how to build the snapshotter and ctr plugin components at BUILDING.

  • After building or installing, see EXAMPLES for how to run an accelerated container. See EXAMPLES_CRI if you run containers through k8s/CRI.

  • See the PERFORMANCE test about the acceleration.

  • See how to convert OCI image into overlaybd with specified file system at MULTI_FS_SUPPORT.

  • See how to use layer deduplication for image conversion at IMAGE_CONVERTOR.

  • See how to use overlaybd writable layer at WRITABLE.

  • See how to use Prometheus to monitor metrics such as the latency and error count of the snapshotter gRPC APIs at PROMETHEUS.

  • See how to use FastOCI at FASTOCI.

  • Welcome to contribute! CONTRIBUTING

Release Version Support

The manifest of a converted image carries an annotation, containerd.io/snapshot/overlaybd/version, that specifies the format version. The overlaybd release required by each format version is listed below.

  • 0.1.0: supported by all overlaybd release versions so far.

  • 0.1.0-fastoci: overlaybd >= v0.6.10
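
The mapping above can be sketched as a small lookup in Go. This is only an illustration, not code from the project: the function name RequiredOverlaybd and the map-based representation are hypothetical, and it assumes the manifest annotations have already been parsed into a map[string]string. Only the annotation key and the two format versions come from the list above.

```go
package main

import "fmt"

// Annotation key named in the text above.
const versionAnnotation = "containerd.io/snapshot/overlaybd/version"

// minOverlaybd maps a format version to the minimum overlaybd release it
// needs. An empty string means "any release so far". Values are taken from
// the list above; the table itself is a hypothetical representation.
var minOverlaybd = map[string]string{
	"0.1.0":         "",
	"0.1.0-fastoci": "v0.6.10",
}

// RequiredOverlaybd (hypothetical helper) returns the minimum overlaybd
// release required by the format version found in the annotations.
func RequiredOverlaybd(annotations map[string]string) (string, error) {
	v, ok := annotations[versionAnnotation]
	if !ok {
		return "", fmt.Errorf("annotation %s not found", versionAnnotation)
	}
	need, known := minOverlaybd[v]
	if !known {
		return "", fmt.Errorf("unknown format version %q", v)
	}
	return need, nil
}

func main() {
	annotations := map[string]string{versionAnnotation: "0.1.0-fastoci"}
	need, err := RequiredOverlaybd(annotations)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("minimum overlaybd release:", need)
}
```

Treating an unknown format version as an error, rather than silently accepting it, is a design choice of this sketch.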

Overview

With the OCI image spec, an image layer blob is saved as a tarball on the registry, describing the changeset relative to its previous layer. However, a tarball is not designed to be seekable, and random access is not supported. All blobs must therefore be downloaded completely before a container can be brought up.

An overlaybd blob is a collection of modified data blocks underneath the filesystem, corresponding to the files added, modified or deleted by the layer. The overlaybd backstore provides the merged view of the layers as a virtual block device. A filesystem is mounted on top of the device, so an overlaybd blob can be accessed randomly and natively supports on-demand reading.

Figure: image data flow

The raw data of block differences, together with an index to the raw data, constitutes the overlaybd blob. When an overlaybd device is attached and mounted, only the index of each layer is loaded from the remote and stored in memory. To read data, overlaybd performs a range lookup in the index to find out where in the blob to read, and then performs a remote fetch. The blob is in Zfile format.
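
The range lookup described above can be illustrated with a minimal Go sketch. It is not the real overlaybd index: the Segment fields, the LayerIndex type, and the Resolve function are hypothetical, offsets are counted in blocks for simplicity, and layers are assumed to be ordered from bottom (index 0) to top so that upper layers take precedence in the merged view.

```go
package main

import (
	"fmt"
	"sort"
)

// Segment maps a run of virtual-device blocks to a position in a layer blob.
// Field names are hypothetical; the real overlaybd index layout differs.
type Segment struct {
	Offset     uint64 // first logical block covered by this segment
	Length     uint64 // number of blocks covered
	BlobOffset uint64 // block position of the data inside the layer blob
}

// LayerIndex holds non-overlapping segments sorted by Offset.
type LayerIndex []Segment

// Lookup returns the segment that covers the given block, if any,
// using a binary search over the sorted segments.
func (idx LayerIndex) Lookup(block uint64) (Segment, bool) {
	// Find the first segment whose end lies past the requested block.
	i := sort.Search(len(idx), func(i int) bool {
		return idx[i].Offset+idx[i].Length > block
	})
	if i < len(idx) && idx[i].Offset <= block {
		return idx[i], true
	}
	return Segment{}, false
}

// Resolve walks the layers from top to bottom and reports which layer holds
// the block and where in that layer's blob the block is stored.
func Resolve(layers []LayerIndex, block uint64) (layer int, blobOff uint64, ok bool) {
	for l := len(layers) - 1; l >= 0; l-- {
		if seg, found := layers[l].Lookup(block); found {
			return l, seg.BlobOffset + (block - seg.Offset), true
		}
	}
	return 0, 0, false
}

func main() {
	layers := []LayerIndex{
		{{Offset: 0, Length: 100, BlobOffset: 0}}, // base layer
		{{Offset: 10, Length: 5, BlobOffset: 0}, {Offset: 50, Length: 10, BlobOffset: 5}}, // top layer
	}
	for _, b := range []uint64{12, 30, 52, 200} {
		layer, off, ok := Resolve(layers, b)
		fmt.Println(b, layer, off, ok)
	}
}
```

Once the layer and blob offset are known, only that byte range needs to be fetched from the registry, which is what makes on-demand reading possible.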

Zfile is a new compressed file format that supports seekable decompression, which reduces storage and transmission costs. Zfile also stores checksum information to protect against data corruption during on-demand reading. To remain compatible with existing registries and container engines, each Zfile is wrapped in a tar file that contains only that one Zfile.

Figure: io-path

Overlaybd connects with applications through a filesystem mounted on a virtual block device. Overlaybd is agnostic to the choice of filesystem, so users can select the one that best fits their needs. I/O requests go from applications to a regular filesystem such as ext4. From there they go to the loopback device (through TCM_loopback) and then to the user-space overlaybd backstore (through TCMU). Backend read operations always target layer files. Some layer files may already have been downloaded, so reads of those files hit the local filesystem. Other reads are directed to the registry, or hit the registry cache. Write and trim operations are handled by the overlaybd backstore, which writes the data and index files of the writable layer to the local filesystem. For more details, see the paper.

Communication

For async communication and long-running discussions, please use issues and pull requests on the GitHub repo. This is the best place to discuss design and implementation.

For sync communication, catch us in the #overlaybd Slack channels on the Cloud Native Computing Foundation's (CNCF) Slack at cloud-native.slack.com. Everyone is welcome to join and chat. Get an invite to the CNCF Slack.

Licenses

Accelerated Container Image is released under the Apache License, Version 2.0.
