  • Language
    Go
  • License
    Apache License 2.0
  • Created over 7 years ago
  • Updated about 1 year ago

Repository Details

A test suite to qualify storage providers for stateful containers running in a cluster.

Torpedo

Torpedo is a test suite to qualify storage providers for stateful containers running in a distributed environment. It tests various scenarios that applications encounter when running in Linux containers and deployed via schedulers such as Kubernetes, Marathon or Swarm.

CSI

CSI is a specification for Linux Container Storage Interfaces. It defines the control plane interaction between a cloud native scheduler such as Kubernetes, and a cloud native storage provider. The specification is available here.

The Torpedo test suite natively supports the CSI specification for external volume support into Kubernetes and Mesosphere. It operates as a CSI enabled orchestrator (scheduler) to communicate with external storage providers that support CSI.

Torpedo tests cover the various scheduler-storage integration points that are being addressed by the CSI specification (https://docs.google.com/document/d/1JMNVNP-ZHz8cGlnqckOnpJmHF-DNY7IYP-Di7iuVhQI/edit#) and how external volume providers like Portworx are able to support production level operational scenarios when it comes to storage, server, software or network failures.

Legacy support

Since CSI is still a work in progress, most schedulers currently provide external volume support to Mesosphere or Kubernetes via DVDI or the Kubernetes native driver interface.

Docker volume driver interface (DVDI) provides the control path operations to create, mount, unmount and eventually delete an external volume and is documented here.

In order to support legacy storage drivers, Torpedo can also work with schedulers that still use the Docker volume driver interface.
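
The four control path operations mentioned above (create, mount, unmount, delete) can be pictured as a small Go interface. This is only a sketch: `VolumeDriver`, `memDriver`, the error values and the `/mnt/` path prefix are hypothetical names chosen for illustration, and the code is neither the DVDI wire protocol nor Torpedo's own driver abstraction.

```go
package main

import (
	"errors"
	"fmt"
)

// VolumeDriver is a hypothetical interface covering the four control path
// operations named in the text. It is not the actual DVDI protocol.
type VolumeDriver interface {
	Create(name string, opts map[string]string) error
	Mount(name string) (path string, err error)
	Unmount(name string) error
	Remove(name string) error
}

var (
	ErrExists   = errors.New("volume already exists")
	ErrNotFound = errors.New("volume not found")
	ErrInUse    = errors.New("volume is mounted")
)

// memDriver is an in-memory stand-in for a real storage provider.
type memDriver struct {
	mounted map[string]bool // volume name -> currently mounted?
}

func newMemDriver() *memDriver {
	return &memDriver{mounted: map[string]bool{}}
}

// Create registers a new, unmounted volume. The options are ignored here.
func (d *memDriver) Create(name string, opts map[string]string) error {
	if _, ok := d.mounted[name]; ok {
		return ErrExists
	}
	d.mounted[name] = false
	return nil
}

// Mount marks the volume as mounted and returns an illustrative path.
func (d *memDriver) Mount(name string) (string, error) {
	if _, ok := d.mounted[name]; !ok {
		return "", ErrNotFound
	}
	d.mounted[name] = true
	return "/mnt/" + name, nil
}

// Unmount marks the volume as no longer mounted.
func (d *memDriver) Unmount(name string) error {
	if _, ok := d.mounted[name]; !ok {
		return ErrNotFound
	}
	d.mounted[name] = false
	return nil
}

// Remove deletes the volume, refusing while it is still mounted.
func (d *memDriver) Remove(name string) error {
	isMounted, ok := d.mounted[name]
	if !ok {
		return ErrNotFound
	}
	if isMounted {
		return ErrInUse
	}
	delete(d.mounted, name)
	return nil
}

func main() {
	var drv VolumeDriver = newMemDriver()
	fmt.Println(drv.Create("vol1", map[string]string{"size": "1G"}))
	fmt.Println(drv.Mount("vol1"))
	fmt.Println(drv.Remove("vol1")) // refused: still mounted
	fmt.Println(drv.Unmount("vol1"))
	fmt.Println(drv.Remove("vol1"))
}
```

A real provider would perform these steps against actual storage; the in-memory map only makes the ordering constraints between the operations visible.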

Scenarios to Consider when Deploying Stateful Applications

Deploying ephemeral applications requires less consideration than deploying stateful applications. When running stateful applications in production, administrators should take into account the various runtime scenarios that may occur and choose an external storage provider that is capable of dealing with them. Examples of these scenarios are:

Runtime software hangs and crashes

  • Container runtime engine (or scheduler) software failure, hang or crash: When a daemon like Docker crashes, it can induce errors in an application's connectivity to the external storage. This problem is compounded when the storage provider itself runs as a container. In general, you need to assume that user space code will either hang or crash, and the storage system needs to deal with this gracefully, without data loss, unavailability or corruption.
  • External storage driver software failure, hang or crash: When the storage software itself crashes, the overall solution needs to make sure that there are no lost IOs (data loss), unavailability or corruption.

Network and host issues

  • Network disconnect from the storage provider to the external environment: If a node on which the storage volume driver is running were to become disconnected from the network, the overall solution needs to make sure that the volume can be used on another node, and that there is no data loss or corruption.
  • A node running a stateful application becomes permanently (or for a prolonged period of time) unreachable: In many cases, a node can become permanently unusable. In environments such as AWS, where an EBS volume may be attached to such a node, the overall solution needs to make sure that the volume or the data can still be used on some other node in the cluster.
  • A network partition in the cluster: When the scheduler cluster or the storage cluster gets partitioned in such a way that quorum is lost, the nodes that are still part of the quorum need to be able to use all of the data that was in the original cluster. Otherwise, this would lead to data unavailability.
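
The quorum condition in the partition scenario can be made concrete with a small check. The sketch below assumes the common strict-majority rule (a partition keeps quorum only if it can reach more than half of the cluster members); the text does not say which rule a given scheduler or storage provider uses, and `hasQuorum` is a hypothetical helper.

```go
package main

import "fmt"

// hasQuorum reports whether a partition that can reach `reachable` of the
// cluster's `total` members holds a strict majority.
// Assumption: quorum means more than half of all members.
func hasQuorum(reachable, total int) bool {
	if total <= 0 || reachable < 0 || reachable > total {
		return false
	}
	return 2*reachable > total
}

func main() {
	// A 5-node cluster split 3/2: only the 3-node side keeps quorum.
	fmt.Println(hasQuorum(3, 5), hasQuorum(2, 5))
	// An even 2/2 split of a 4-node cluster leaves neither side with quorum.
	fmt.Println(hasQuorum(2, 4))
}
```

Under this rule, the side that keeps quorum is the one expected to keep serving all of the data that was in the original cluster.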

Scheduler software issues

  • Scheduler software attempts to deploy a stateful container on a node that is not part of the storage cluster: The storage cluster and the scheduler cluster may not consist of the same machines. The overall solution must either prevent a stateful application from being deployed on a non-storage node, or make sure that the application's storage requirements are still fulfilled there. Some approaches to handle this include the use of scheduler constraints and labels.
  • Scheduler software attempts to bring up a new container/pod/task to use a storage volume prior to properly terminating the previous container/pod/task on a different host: Scheduler software, perhaps due to bugs or timing issues, may launch a new application stack on a new set of nodes that refer to a volume currently in use by an application stack being torn down. The overall solution must be capable of dealing with these transition scenarios, without application data loss or corruption.
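
One way a storage layer might protect itself against the last scenario is to track which node currently holds each volume and refuse a second node until the first has let go. The following is a minimal sketch of that idea under stated assumptions: `attachTable` and its methods are hypothetical, and real providers may choose a different policy, such as forcing the volume away from the old node.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var ErrAttachedElsewhere = errors.New("volume is attached to another node")

// attachTable records which node currently holds each volume.
// It is a hypothetical illustration, not any provider's real bookkeeping.
type attachTable struct {
	mu    sync.Mutex
	owner map[string]string // volume name -> node name
}

func newAttachTable() *attachTable {
	return &attachTable{owner: map[string]string{}}
}

// Attach gives node exclusive use of volume. A repeated call from the
// current owner succeeds; a call from any other node is refused.
func (t *attachTable) Attach(volume, node string) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if cur, ok := t.owner[volume]; ok && cur != node {
		return fmt.Errorf("%w: %s is held by %s", ErrAttachedElsewhere, volume, cur)
	}
	t.owner[volume] = node
	return nil
}

// Detach releases volume if node is its owner; otherwise it does nothing.
func (t *attachTable) Detach(volume, node string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if cur, ok := t.owner[volume]; ok && cur == node {
		delete(t.owner, volume)
	}
}

func main() {
	tbl := newAttachTable()
	fmt.Println(tbl.Attach("vol1", "nodeX")) // old container's node
	fmt.Println(tbl.Attach("vol1", "nodeY")) // new container started too early
	tbl.Detach("vol1", "nodeX")              // old container is torn down
	fmt.Println(tbl.Attach("vol1", "nodeY")) // the new container may now proceed
}
```

Refusing the second attach turns a potential corruption into a visible scheduling error that the scheduler can retry once the old container has gone away.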

Test Cases Run by Torpedo

| Test/Scenario | Acceptance vs Runtime Test | Expected Result |
| --- | --- | --- |
| Create dynamic volumes. | Runtime | Expected to be able to create a volume with arbitrary parameters at runtime. |
| Verify that the volume driver can deal with an uneven number of mounts and unmounts and allow the volume to get mounted on another node. | Runtime | Expected to pass. |
| Volume driver plugin is down or unavailable; the client container should not be impacted. | Acceptance | Client container does not get an I/O error. |
| Volume driver plugin is down and the client container gets terminated. There is a lost unmount call in this case, but the container should be able to come up on another system and use the volume. | Acceptance | Expected to pass. |
| A container is using a volume on node X. Node X is now powered off. | Acceptance | The system must be able to create a new container on node Y and use the same volume using pod replace. |
| Storage plugin is down. Scheduler tries to create a container using the provider's volume. | Acceptance | This should fail. The container should not start and the scheduler should receive an error. |
| A container is running on node X. Node X loses network access and is partitioned away. Node Y, which is in the cluster, can use the volume for another container. | Acceptance | When node X re-joins the network and hence joins the cluster, it is expected that the application that is running will get I/O errors since the block volume is attached on another node. |
| A container is running on node X. Node X can only see a subset of the storage cluster. That is, it can see the entire DC/OS cluster, but just the storage cluster gets a network partition. Node Y, which is in the cluster, can use the volume for another container. | Acceptance | When node X re-joins the storage network and hence joins the cluster, it is expected that the application that is running will get I/O errors since the block volume is attached on another node. |
| Docker daemon crashes and live restore is disabled. | Acceptance | The agent detects that the task has died, brings it up on another node, and the task can re-use the volume. |
| Docker daemon crashes and live restore is enabled. This scenario should be a no-op; the container does not crash. | Acceptance | Expected to pass. |
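
The "uneven number of mounts and unmounts" case is easiest to reason about with a concrete bookkeeping policy. The sketch below shows one possible policy, chosen here purely for illustration: mounts are tracked as a set of paths per volume, so a duplicate mount or an extra unmount is a no-op. `mountSet` and its methods are hypothetical and do not describe how Torpedo or any particular driver implements this.

```go
package main

import "fmt"

// mountSet tracks the mount paths active for each volume on one node.
// Hypothetical bookkeeping: volume name -> set of mount paths.
type mountSet map[string]map[string]bool

// Mount records path for volume. Repeating the same mount is a no-op.
func (m mountSet) Mount(volume, path string) {
	if m[volume] == nil {
		m[volume] = map[string]bool{}
	}
	m[volume][path] = true
}

// Unmount forgets path. Unmounting a path that is not mounted is also a
// no-op, so extra unmount calls cannot corrupt the state.
func (m mountSet) Unmount(volume, path string) {
	delete(m[volume], path)
	if len(m[volume]) == 0 {
		delete(m, volume)
	}
}

// InUse reports whether volume still has any mount on this node.
func (m mountSet) InUse(volume string) bool {
	return len(m[volume]) > 0
}

func main() {
	ms := mountSet{}
	ms.Mount("vol1", "/mnt/a")
	ms.Mount("vol1", "/mnt/a")   // duplicate mount
	ms.Unmount("vol1", "/mnt/a") // single unmount releases the path
	ms.Unmount("vol1", "/mnt/a") // extra unmount is ignored
	fmt.Println(ms.InUse("vol1")) // false: free to be mounted on another node
}
```

A reference-counting policy would behave differently after two mounts and one unmount; whichever policy a driver picks, the test only requires that the volume can eventually be mounted on another node.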

Qualified External Storage Providers

To submit an external storage provider, please submit a PR with the output of the Torpedo test program and the specifics of the environment used.

| Provider | Information | Test Coverage | Status |
| --- | --- | --- | --- |

Usage

Build

See How to build.

Run

See How to run.

Contributing

The specification and code are licensed under the Apache 2.0 license found in the LICENSE file of this repository.

See the Style Guide.

Sign your work

The sign-off is a simple line at the end of the explanation for the patch, which certifies that you wrote it or otherwise have the right to pass it on as an open-source patch. The rules are pretty simple: if you can certify the below (from developercertificate.org):

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
660 York Street, Suite 102,
San Francisco, CA 94110 USA

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

then you just add a line to every git commit message:

Signed-off-by: Joe Smith <[email protected]>

using your real name (sorry, no pseudonyms or anonymous contributions.)

You can add the sign off when creating the git commit via git commit -s.
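
For contributors or maintainers who want to check a commit message before pushing, a sign-off line is straightforward to detect. The sketch below is an assumption-laden illustration rather than part of this project's tooling: `hasSignOff` and the regular expression are hypothetical, the email address in `main` is a placeholder, and the pattern only checks the general `Signed-off-by: Name <email>` shape.

```go
package main

import (
	"fmt"
	"regexp"
)

// signOffRe matches a "Signed-off-by: Name <user@host>" line anywhere in a
// commit message. The pattern is deliberately loose (hypothetical check).
var signOffRe = regexp.MustCompile(`(?m)^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>\s*$`)

// hasSignOff reports whether msg contains a sign-off line.
func hasSignOff(msg string) bool {
	return signOffRe.MatchString(msg)
}

func main() {
	signed := "Fix mount race\n\nSigned-off-by: Joe Smith <joe.smith@example.com>\n"
	unsigned := "Fix mount race\n"
	fmt.Println(hasSignOff(signed))
	fmt.Println(hasSignOff(unsigned))
}
```

When committing with `git commit -s`, git appends this line itself using the configured `user.name` and `user.email`, so a check like this mainly catches commits created without the flag.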
