LCFS Storage Driver for Docker

tl;dr: Every time you build, pull, or destroy a Docker container, you are using a storage driver. Current storage drivers like Device Mapper, AUFS, and Overlay2 implement container behavior on top of file systems designed to run a full OS. We are open-sourcing a file system that is purpose-built for the container lifecycle. We call this new file system the Layer Cloning File System (LCFS). Because it is designed only for containers, it is up to 2.5x faster to build an image and nearly 2x faster to pull an image. We look forward to working with the container community to improve and expand this new tool.

Overview

Layer Cloning FileSystem (LCFS) is a new filesystem purpose-built to be a Docker storage driver. All Docker images are constructed of layers using storage drivers (graph drivers) like AUFS, OverlayFS, and Device Mapper. As a design principle, LCFS treats layers as first-class citizens. The LCFS filesystem operates directly on top of block devices, as opposed to merging separate filesystems. It thereby aims to manage container images directly at the layer level, eliminate the overhead of a second filesystem that must then be merged, and optimize for density.
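For context, you can check which storage driver a running Docker daemon uses with the stock Docker CLI; a minimal sketch, with nothing LCFS-specific assumed:

# Print the storage (graph) driver the local Docker daemon is using.
docker info --format '{{.Driver}}'
# A default install typically prints a driver such as: overlay2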

LCFS will also support the snapshot driver interface being defined by containerd.

The future direction is to enhance LCFS with cluster-level operations, provide richer container statistics, and pave the way towards content integrity in container images.

  • cluster operations: image pulls can be satisfied cooperatively by images across a group of servers, instead of being isolated to a single server as they are today.
  • statistics: answering questions such as which layers are the most popular, and so on.
  • content integrity: ensuring container content has not been altered. See OCI scope.

The LCFS filesystem is an open-source project, and we welcome feedback and collaboration. The driver is currently experimental.

Design Principles

Today, running containers on the same server is often limited by side effects that come from mapping container behavior over general-purpose filesystems. This approach impacts the entire lifecycle: building, launching, reading data, and exiting containers.

Historically, filesystems were built with the expectation that content is read-writeable. However, Docker images are constructed from many read-only layers and a single read-write layer. As more containers are launched from the same image (like fedora/apache), reading a file within a container requires traversing (up to) all of the other containers running that image.
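This layered construction is easy to observe with the stock Docker CLI; a minimal sketch (fedora is used purely as an example image):

# List the digests of the read-only layers that make up an image.
docker image inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' fedora
# Show how each build step contributed a layer.
docker history fedora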

The design principles are:

  • layers are managed directly: inherently understand layers and their different states, and directly track and manage them.
  • clone independence: create and run container images as independent entities, from an underlying filesystem perspective. Each new instantiation of the same Docker image is an independent clone, at the read-only layer.
  • containers in clusters: optimize for clustered operations (cooperative 'pull'), optimize for common data patterns (coalescing writes, ephemeral data), and avoid inheriting behavior that overlaps (such as when to use the graph database).

Performance Goals and Architecture

An internal, filesystem-level measure of success is to make layer creation and management independent of the size of the image and the number of layers. An external measure of success is to make the launch and termination of one hundred containers a constant-time operation.

Additional performance considerations:

  • page cache: non-filesystem storage drivers create multiple copies of the same image layers in memory, leaving less host memory for containers. We should not.
  • inodes: some union filesystems create multiple inodes per file, leading to inode exhaustion in build scenarios and at scale (see the sketch after this list). We should not.
  • space management: a lot can be done to improve garbage collection and space management, automatically removing orphaned layers. We should do this.
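Inode pressure of the kind described above is easy to observe with standard tools; a minimal sketch, assuming Docker's default root directory:

# Compare inode usage against inode capacity on the filesystem
# that backs Docker's image and container storage.
df -i /var/lib/docker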

Measured Performance

The current experimental release of LCFS is compared below against several of the top storage drivers. These tests were run against a local registry to remove network variability.

The table below shows how long LCFS takes to complete some common Docker operations compared to other storage drivers, on an Ubuntu 14.04 system with a single SATA disk. Times are measured in seconds; the percentage in parentheses shows how much slower the comparison driver is relative to LCFS.

test                     LCFS        AUFS              Device Mapper     Overlay           Overlay2
docker pull gourao/fio   8.831s      10.413s (18%)     13.520s (53%)     11.301s (28%)     10.523s (19%)
docker pull mysql        13.359s     16.438s (23%)     24.998s (87%)     19.170s (43%)     16.252s (22%)
docker build             221.539s    572.677s (159%)   561.403s (153%)   549.851s (148%)   551.893s (149%)
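A rough way to reproduce the pull rows above, assuming a warmed local registry as in the tests (the image name comes from the table; time is the shell builtin):

# Remove any cached copy so the pull does real work, then time it.
docker rmi mysql > /dev/null 2>&1
time docker pull mysql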

Create / Destroy: The diagram below depicts the time to create and destroy 20, 40, 60, 80, and 100 fedora/apache containers. The image was pulled before the test. The cumulative times measured were: LCFS at 44 seconds, Overlay at 237 seconds, Overlay2 at 246 seconds, AUFS at 285 seconds, Btrfs at 487 seconds, and Device Mapper at 556 seconds. [Figure: cumulative create/destroy time by storage driver]
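A minimal sketch of this create/destroy measurement, assuming the fedora/apache image is already pulled (the container name prefix is illustrative):

#!/bin/bash
# Time the creation and removal of N containers from a pre-pulled image.
N=${1:-20}
start=$(date +%s)
for i in $(seq 1 "$N"); do
  docker create --name "bench-$i" fedora/apache /bin/true > /dev/null
done
for i in $(seq 1 "$N"); do
  docker rm "bench-$i" > /dev/null
done
end=$(date +%s)
echo "create+destroy of $N containers: $((end - start))s"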

Build: The diagram below depicts the time to build the Docker sources using various storage drivers. The individual times measured were: Device Mapper at 1512 seconds, Btrfs at 956 seconds, AUFS at 574 seconds, Overlay at 914 seconds, Overlay2 at 567 seconds, and LCFS at 437 seconds. [Figure: Docker source build time by storage driver]
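The build row can be approximated against any Dockerfile; a hedged sketch (assumes a Dockerfile in the current directory, not the exact Docker-source build measured above):

# Time a clean image build on whichever storage driver is active.
time docker build --no-cache -t lcfs-build-bench .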

Architecture

The LCFS filesystem runs in user space, is written in C, is POSIX-compliant, and integrates with Linux and macOS via FUSE. It does not require any kernel modifications, making it a portable filesystem.

To start explaining the layers-first design, let us compare launching three containers that use Fedora. In the following diagram, the left side shows a snapshot-based storage driver (like Device Mapper or Btrfs). Each container boots from its own read-write (rw) layer and read-only (init) layers. Good.

However, OS filesystems have historically been built to expect content to be read-writeable, often using snapshots of the first container's init layer to create the second and third. One side effect is that almost all operations from the third container must traverse the init layers of the first two containers. This slows down nearly all file operations as more containers are launched, including reading from a file.

[Figure: LCFS vs. snapshot driver]

In the above diagram, the right side shows LCFS presenting a unified view (mount) for three containers running the Fedora image. The design goal is to unchain how containers access their own content. Launching the second container results in a new init clone (not a snapshot). Internally, access to the second container's (init) filesystem does not require tracking backward to an original (snapshot) parent. The net effect is that read and modify operations from successive containers do not depend on prior containers.

Separately, LCFS itself is implemented as a single filesystem: it takes in (hardware) devices and lays one filesystem over those drives (a quick sanity-check sketch follows the list below). For more on the LCFS architecture and future TODOs, please see:

  • layout: how LCFS formats devices, handles inodes, and handles internal data structures.
  • layers: how layers are created, locked, cloned/snapshotted and committed.
  • file operations: how LCFS supports file operations and file locking.
  • caching: how LCFS caches and accesses metadata (inodes, directories, etc) and future work.
  • space management: how LCFS handles allocation, tracking, placement, and I/O coalescing.
  • crash consistency: how LCFS handles abnormal shutdowns (crashes, etc.).
  • cli: commands for various operations.
  • LCFS Design: LCFS design highlights.
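Once LCFS is mounted, standard tools can confirm that a single filesystem backs the Docker root, per the single-filesystem design above; a minimal sketch (paths are illustrative):

# Look for the LCFS (FUSE) mount among the mounted filesystems.
mount | grep -i lcfs
# Confirm which filesystem backs Docker's root directory.
df -h /var/lib/docker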

Installing LCFS

You can install LCFS by executing the script below (assuming your storage device is /dev/sdb):

# curl -fsSL http://lcfs.portworx.com/lcfs-setup.sh | sudo DEV=/dev/sdb bash

For a manual installation, first install the LCFS filesystem and then the LCFS v2 graph driver plugin, as described here.
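After the setup script completes, a hedged way to verify the installation using stock Docker commands (exact plugin naming can differ by release):

# The daemon should report the LCFS graph driver once it is in use.
docker info --format 'Storage Driver: {{.Driver}}'
# The LCFS v2 graph driver plugin should appear here and be enabled.
docker plugin ls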

Licensing

The Layer Cloning Filesystem (LCFS) is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

Contributing

Want to collaborate and contribute? Here are instructions to get started contributing code.
