  • Stars
    712
  • Rank 63,132 (Top 2%)
  • Language
    Rust
  • License
    Apache License 2.0
  • Created about 5 years ago
  • Updated 23 days ago

Repository Details

Dynamically provision Stateful Persistent Replicated Cluster-wide Fabric Volumes & Filesystems for Kubernetes, provisioned from an optimized NVMe SPDK backend data storage stack.

Mayastor

Mayastor is a cloud-native, declarative data plane written in Rust. Our goal is to abstract storage resources and their differences through the data plane, such that users only need to supply the "what" and do not have to worry about the "how", so that individual teams stay in control.

We also try to be as unopinionated as possible. In practice, this means we try to work with the existing storage systems you might already have and unify them as abstract resources, whether those resources are local or remote, rather than swapping them out.

Some targeted use cases are:

  • Low latency workloads for converged and segregated storage by leveraging NVMe/NVMe over Fabrics (NVMe-oF)
  • Micro-VM based containers like Firecracker microVMs and Kata Containers by providing storage over vhost-user
  • Programmatic storage access, i.e. writing to block devices from within your application instead of making system calls
  • Storage unification to lift barriers, so that you can start deploying cloud-native apps on your existing storage without the painful data gravity barriers that prevent progress and innovation

User Documentation

The official user documentation for the Mayastor Project is published in GitBook format at mayastor.gitbook.io

Overview

At a high-level, Mayastor consists of two major components.

Control plane:

  • A microservices-patterned control plane, centered around a core agent which publicly exposes a RESTful API. This is extended by a dedicated operator responsible for managing the life cycle of "Mayastor Pools" (an abstraction for devices supplying the cluster with persistent backing storage) and a CSI-compliant external provisioner (controller). Source code for the control plane components is located in its own repository.

  • A per-node mayastor-csi plugin instance, which implements the identity and node gRPC services from the CSI protocol (a minimal sketch of the identity service follows this overview).

Data plane:

  • Each node you wish to use for storage or storage services will have to run a Mayastor daemon set. Mayastor itself has three major components: the Nexus, a local storage component, and the mayastor-csi plugin.
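
To make the mayastor-csi part of the picture a bit more concrete, below is a minimal sketch of a CSI identity service. It is not Mayastor's actual plugin code: it assumes CSI v1 protobuf bindings generated with tonic/prost into a hypothetical csi module, and the driver name shown is illustrative.

// Hedged sketch of a CSI Identity service, not the real mayastor-csi source.
// Assumes tonic/prost-generated bindings for the CSI v1 protobuf living in a
// hypothetical `csi` module.
use tonic::{Request, Response, Status};
use csi::identity_server::Identity;
use csi::{
    GetPluginCapabilitiesRequest, GetPluginCapabilitiesResponse,
    GetPluginInfoRequest, GetPluginInfoResponse, ProbeRequest, ProbeResponse,
};

#[derive(Default)]
struct CsiIdentity;

#[tonic::async_trait]
impl Identity for CsiIdentity {
    // report the driver name and version to the kubelet and CSI sidecars
    async fn get_plugin_info(
        &self,
        _request: Request<GetPluginInfoRequest>,
    ) -> Result<Response<GetPluginInfoResponse>, Status> {
        Ok(Response::new(GetPluginInfoResponse {
            name: "io.openebs.csi-mayastor".to_string(), // illustrative name
            vendor_version: env!("CARGO_PKG_VERSION").to_string(),
            manifest: Default::default(),
        }))
    }

    // a node-only plugin keeps its advertised capabilities minimal
    async fn get_plugin_capabilities(
        &self,
        _request: Request<GetPluginCapabilitiesRequest>,
    ) -> Result<Response<GetPluginCapabilitiesResponse>, Status> {
        Ok(Response::new(GetPluginCapabilitiesResponse {
            capabilities: vec![],
        }))
    }

    // readiness probe used by the kubelet and the CSI sidecars
    async fn probe(
        &self,
        _request: Request<ProbeRequest>,
    ) -> Result<Response<ProbeResponse>, Status> {
        Ok(Response::new(ProbeResponse { ready: Some(true) }))
    }
}

The real plugin additionally implements the CSI Node service (NodeStageVolume, NodePublishVolume, and friends) and registers itself with the kubelet over its plugin socket.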

Nexus

The Nexus is responsible for attaching to your storage resources and making them available to the host that is selected to run your k8s workload. From the Nexus' point of view, we call these resources its "children".

The goal we envision the Nexus to provide here, as it sits between the storage systems and PVCs, is loose coupling.

A practical example: once you are up and running with persistent workloads in a container, you need to move your data because the storage system that stores your PVC goes EOL. You can now control how this impacts your team without getting into storage migration projects, which are always painful and complicated. In reality, the individual storage volumes per team/app are relatively small, but today it is not possible for individual teams to handle their own storage needs. The Nexus provides the abstraction over the resources such that the developer teams stay in control.

The reason we think this can work is that applications have changed, and the way they are built allows us to rethink the way we do things. Moreover, changes in hardware in fact force us to do so.

Based on storage URIs, the Nexus knows how to connect to the resources and will make them available as a single device over a standard protocol. These storage URIs are generated automatically by MOAC, which keeps track of which resources belong to which Nexus instance and, subsequently, to which PVC.

You can also use the Nexus directly from within your application code. For example:

use io_engine::descriptor::{Descriptor, DmaBuf};
use io_engine::bdev::nexus::nexus_bdev::nexus_create;

let children = vec![
      "aio:////disk1.img?blk_size=512".to_string(),
      // it is assumed these hosts are reachable over the network
      "nvmf://fooo/nqn.2019-05.io-openebs:disk0".into(),
      "nvmf://barr/nqn.2019-05.io-openebs:disk0".into()
];

// if no UUID given, one will be generated for you
let uuid = "b6565df-af19-4645-9f98-e6a8b8c13b58".to_string();

// create the nexus using the vector of child devices
let nexus = nexus_create("mynexus", 4096, 131_027, Some(uuid),  &children).await.unwrap();

// open a block descriptor
let bd = Descriptor::open(&nexus, true).unwrap();

// only use DMA buffers to issue IO, as it's a member of the opened device
// alignment is handled implicitly
let mut buf = bd.dma_zmalloc(4096).unwrap();

// fill the buffer with a known value
buf.fill(0xff);

// write out the buffer to the nexus; all child devices will receive the
// same IO. Put differently, a single IO becomes three IOs
bd.write_at(0, &mut buf).await.unwrap();

// fill the buffer with zeros and read back the data
buf.fill(0x00);
bd.read_at(0, &mut buf).await.unwrap();

// verify that the buffer is filled with what we wrote previously
buf.as_slice().iter().for_each(|&b| assert_eq!(b, 0xff));

We think this can help a lot of database projects as well, where all the smarts typically live in the database engine and what is wanted from the storage device is simplicity and speed. For a more elaborate example, see some of the tests in mayastor/tests.

To communicate with the children, the Nexus uses industry standard protocols. The Nexus supports direct access to local storage and remote storage using NVMe-oF TCP. Another advantage of the implementation is that if you were to remove the Nexus from the data path, you would still be able to access your data as if Mayastor was not there.

The Nexus itself does not store any data; in its most simplistic form the Nexus is a proxy towards real storage devices, where the transport may vary. It can however, as mentioned, "transform" the data, which makes it possible to store copies of your data within different cloud systems. One of the other ideas we have is to write a block device on top of an S3 bucket, such that you can create PVCs from MinIO, AWS or any other S3-compatible store. This simplifies the replication model for the Nexus itself somewhat, but adds a bit more work on the buffering side of things. What model fits best for you? You get to decide!


Local storage

If you do not have a storage system and just have local storage, i.e. block devices attached to your system, we can consume these and make a "storage system" out of them, so that you can leverage features like snapshots, clones, thin provisioning, and the like. Our K8s tutorial does that under the hood today. Currently, we are working on exporting your local storage implicitly when needed, such that you can share storage between nodes. This means that your application, when rescheduled, can still connect to your local storage, except that it is no longer local.

Similarly, if you do not want to use anything other than local storage, you can still use Mayastor to provide additional functionality that would otherwise require you to set up kernel-specific features like LVM, for example.

Exporting the Nexus

The primary focus of development is using NVMe as a transport protocol. The Nexus uses NVMe-oF to replicate a volume's data to multiple devices on multiple nodes (if required).
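
Continuing the earlier in-application example, publishing an assembled Nexus over NVMe-oF is conceptually a single call. The sketch below is hedged: nexus_lookup and share_nvmf are illustrative names based on how the share API has looked in past releases, not a documented stable signature, so check the current io-engine sources before relying on it.

// Hedged sketch: `nexus_lookup` and `share_nvmf` are illustrative names, not a
// guaranteed stable API; consult the io-engine sources for the current calls.
use io_engine::bdev::nexus::nexus_bdev::nexus_create;

// assemble a nexus from two replicas exported by other nodes over NVMe-oF
let children = vec![
    "nvmf://node-1/nqn.2019-05.io-openebs:replica0".to_string(),
    "nvmf://node-2/nqn.2019-05.io-openebs:replica0".to_string(),
];
nexus_create("pvc-volume", 4096, 131_072, None, &children).await.unwrap();

// publish the nexus over NVMe-oF TCP so the node that runs the workload can
// connect to it like any other NVMe-oF target; the returned URI is what the
// CSI node plugin would hand to the initiator
let nexus = nexus_lookup("pvc-volume").unwrap();
let uri = nexus.share_nvmf(None).await.unwrap();
println!("volume published at {}", uri);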

Client

Although a client for the gRPC server is not required for the product, it is important for testing and troubleshooting. The client allows you to manage storage pools and replicas; just use the `--help` option if you are not sure how to use it. CSI services are not covered by the client.

The following example of a client session assumes that mayastor has been started and is running:

$ dd if=/dev/zero of=/tmp/disk bs=1024 count=102400
102400+0 records in
102400+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.235195 s, 446 MB/s
$ sudo losetup /dev/loop8 /tmp/disk
$ io-engine-client pool create tpool /dev/loop8
$ io-engine-client pool list
NAME                 STATE        CAPACITY         USED   DISKS
tpool                0            96.0 MiB          0 B   tpool
$ io-engine-client replica create tpool replica1 --size=10
$ io-engine-client replica create tpool replica2 --size=1000 --thin
$ io-engine-client replica list
POOL                 NAME                 THIN           SIZE
tpool                replica1             false       10.0 MiB
tpool                replica2             true         1.0 GiB
$ io-engine-client replica destroy tpool replica1
$ io-engine-client replica destroy tpool replica2
$ io-engine-client replica list
No replicas have been created
$ io-engine-client pool destroy tpool
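
io-engine-client is a thin wrapper around the io-engine gRPC API, so the same operations can be scripted from code. The sketch below is assumption-heavy: it presumes tonic-generated bindings from the io-engine protobuf, and the mayastor module path, MayastorClient stub, message and field names, and the port are illustrative and may differ between API versions.

// Hedged sketch of driving the gRPC API directly; the module path, client
// stub, message/field names and port below are assumptions for illustration.
use mayastor::mayastor_client::MayastorClient;
use mayastor::{CreatePoolRequest, Null};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // connect to the io-engine gRPC endpoint
    let mut client = MayastorClient::connect("http://127.0.0.1:10124").await?;

    // equivalent of: io-engine-client pool create tpool /dev/loop8
    client
        .create_pool(CreatePoolRequest {
            name: "tpool".to_string(),
            disks: vec!["/dev/loop8".to_string()],
        })
        .await?;

    // equivalent of: io-engine-client pool list
    for pool in client.list_pools(Null {}).await?.into_inner().pools {
        println!("{}: {} bytes", pool.name, pool.capacity);
    }
    Ok(())
}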


License

Mayastor is developed under Apache 2.0 license at the project level. Some components of the project are derived from other open source projects and are distributed under their respective licenses.

http://www.apache.org/licenses/LICENSE-2.0

Contributions

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Mayastor by you, as defined in the Apache-2.0 license, shall be licensed as above, without any additional terms or conditions.

More Repositories

1

openebs

Most popular & widely deployed Open Source Container Native Storage platform for Stateful Persistent Applications on Kubernetes.
8,910
star
2

zfs-localpv

Dynamically provision Stateful Persistent Node-Local Volumes & Filesystems for Kubernetes that are integrated with a backend ZFS data storage stack.
Go
411
star
3

lvm-localpv

Dynamically provision Stateful Persistent Node-Local Volumes & Filesystems for Kubernetes that are integrated with a backend LVM2 data storage stack.
Go
243
star
4

maya

Manage Container Attached Storage (CAS) - Data Engines in Kubernetes
Go
187
star
5

node-disk-manager

Kubernetes Storage Device Management
Go
181
star
6

dynamic-nfs-provisioner

Operator for dynamically provisioning an NFS server on any Kubernetes Persistent Volume. Also creates an NFS volume on the dynamically provisioned server for enabling Kubernetes RWX volumes.
Go
150
star
7

jiva

CAS Data Engine - iSCSI Distributed Block Storage Controller built-in Go
Go
139
star
8

dynamic-localpv-provisioner

Dynamically deploy Stateful Persistent Node-Local Volumes & Filesystems for Kubernetes that are provisioned from simple Local-Hostpath /root storage.
Go
137
star
9

charts

OpenEBS Helm Charts and other utilities
Mustache
100
star
10

cstor-operators

Collection of OpenEBS cStor Data Engine Operators
Go
91
star
11

rawfile-localpv

Dynamically deploy Stateful Persistent Node-Local Volumes & Filesystems for Kubernetes that are provisioned from RAW-device file loop mounted Local-Hostpath storage.
Python
67
star
12

velero-plugin

Velero plugin for backup/restore of OpenEBS cStor volumes
Go
61
star
13

spdk-rs

Enables building safer SPDK-based Rust applications
Rust
53
star
14

jiva-operator

Kubernetes Operator for managing Jiva Volumes via custom resource.
Go
47
star
15

openebs-docs

OpenEBS Documentation
JavaScript
37
star
16

mayastor-control-plane

Control plane for OpenEBS Mayastor
Rust
33
star
17

cstor-csi

cStor CSI Driver
Go
32
star
18

monitoring

OpenEBS Monitoring add-on. A set of Grafana, Prometheus, and alert manager plugins.
Jsonnet
30
star
19

openebsctl

`openebsctl` is a kubectl plugin to manage OpenEBS storage components.
Go
28
star
20

device-localpv

CSI Driver for using Local Block Devices
Go
24
star
21

istgt

CAS Data Engine - iSCSI Target for OpenEBS cStor
C
23
star
22

vhost-user

vhost for containerised storage
C
22
star
23

mayastor-extensions

Components and utilities which extend the Mayastor core control & data plane functionality
Rust
18
star
24

libcstor

CAS Data Engine - Library to serve IOs on uZFS with synchronous replication, snapshots and clones
C
18
star
25

performance-benchmark

Performance benchmarking for containerised storage solutions
Shell
17
star
26

elves

Helpers of OpenEBS
Python
16
star
27

website

OpenEBS Website and User Documentation
TypeScript
13
star
28

spdk-sys

Rust bindings for SPDK
Rust
12
star
29

upgrade

contains components that help with OpenEBS data engine upgrades
Go
10
star
30

e2e-tests

E2e tests for OpenEBS. The tests are run on various platforms and results can be seen at https://openebs.ci
Jinja
10
star
31

community-archive

Please refer to https://github.com/openebs/openebs for community updates.
Shell
9
star
32

api

The canonical location of the OpenEBS API definition.
Go
7
star
33

mayastor-docs

Official GitBook-based documentation for OpenEBS Mayastor
JavaScript
6
star
34

helm-operator

Helm Operator for OpenEBS Installation
Makefile
6
star
35

mayastor-dependencies

MayaData Dependencies
Rust
6
star
36

linux-utils

OpenEBS alpine-based docker images with linux utilities used for launching helper jobs.
Makefile
5
star
37

lib-csi

common packages used by OpenEBS CSI Drivers
Go
4
star
38

mayastor-api

Nix
4
star
39

monitor-pv

custom stats collector for OpenEBS persistent volumes
Shell
4
star
40

data-populator

data populator
Go
3
star
41

community

OpenEBS community resources
3
star
42

jiva-csi

CSI Driver for OpenEBS Jiva Volumes
Go
2
star
43

sts-pv-pvc-handler

Go
2
star
44

moac

The moac control plane for OpenEBS Mayastor has been deprecated as of release v1.0
JavaScript
2
star
45

mayastor-charts

Shell
1
star
46

m-exporter

Go
1
star
47

openebs-k8s-provisioner

Kubernetes external provisioner for OpenEBS cStor and Jiva Volume.
Go
1
star
48

.github

Top level project User Experience repo
1
star
49

google-analytics-4

Google analytics version 4 client for OpenEBS engines
Go
1
star