
Diego Architectural Design Musings and Explications

Diego Design Notes

These are design notes intended to convey how the various components of Diego communicate and interrelate. They are not comprehensive and are generally up to date, although not guaranteed to be. If you find something that you suspect is out of date, please open an issue on this repository.

Migrating to Diego

We've put together some guidelines around transitioning applications off of the DEAs and on to Diego. One reason to move your apps to Diego is to try out SSH access to your CF app instances and Diego LRPs.

What does Diego do?

Diego schedules and runs Tasks and Long-Running Processes:

  • A Task is guaranteed to be run at most once.

  • A Long-Running Process (LRP) may have multiple instances. Diego is told of the desired LRPs. Each desired LRP may desire multiple instances, which Diego represents as actual LRPs. Diego attempts to keep the correct number of instances running in the face of network failures and crashes.
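To make the distinction concrete, here is a minimal Python sketch of the two kinds of work. The class and field names are hypothetical and chosen for illustration only; they do not mirror Diego's actual BBS models.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """One-off work that is run at most once (hypothetical model)."""
    guid: str
    has_run: bool = False

    def run(self) -> bool:
        # Refuse a second execution to model the at-most-once guarantee.
        if self.has_run:
            return False
        self.has_run = True
        return True


@dataclass
class ActualLRP:
    """One instance of a desired LRP, identified by its index."""
    process_guid: str
    index: int


@dataclass
class DesiredLRP:
    """What a client asks for: a process and how many instances of it."""
    process_guid: str
    instances: int

    def expected_actuals(self) -> List[ActualLRP]:
        # One ActualLRP per desired instance index.
        return [ActualLRP(self.process_guid, i) for i in range(self.instances)]
```

A desired LRP with three instances therefore maps to three actual LRPs with indices 0, 1, and 2, while a Task refuses to run a second time.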

Clients submit, update, and retrieve Tasks and LRPs through the BBS (Bulletin Board System), which exposes an RPC-style API over HTTP. Diego's Auctioneer optimally distributes Tasks and LRPs to the cluster of Diego Cells via an Auction that queries the Cell Reps and then sends work to them. Once the auction assigns a Task or LRP to a Cell, the Executor creates a Garden container and executes the work encoded in the Task/LRP. This work is encoded as a generic, platform-independent recipe of composable actions.
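The idea of a recipe built from composable actions can be sketched as a small composite structure. This is an illustration only: the class names (Run, Serial, Try) and the simulated success flag are assumptions made for this example, not Diego's action definitions.

```python
from dataclasses import dataclass
from typing import List


class Action:
    """Base type: an action appends to a log and reports success."""

    def execute(self, log: List[str]) -> bool:
        raise NotImplementedError


@dataclass
class Run(Action):
    """Leaf action: stands in for running a command in a container."""
    path: str
    succeeds: bool = True  # simulated outcome; a real action would run a process

    def execute(self, log: List[str]) -> bool:
        log.append(f"run {self.path}")
        return self.succeeds


@dataclass
class Serial(Action):
    """Runs child actions in order and stops at the first failure."""
    actions: List[Action]

    def execute(self, log: List[str]) -> bool:
        for action in self.actions:
            if not action.execute(log):
                return False
        return True


@dataclass
class Try(Action):
    """Runs one child action and ignores its failure."""
    action: Action

    def execute(self, log: List[str]) -> bool:
        self.action.execute(log)
        return True
```

Because every combinator is itself an Action, recipes nest freely: a Serial can contain a Try that wraps a Run, and the component executing the recipe needs no knowledge of what the recipe is for.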

The BBS also provides a real-time representation of the state of the Diego cluster (including all desired LRPs, running LRP instances, and in-flight Tasks). The Converger periodically analyzes snapshots of this representation and corrects discrepancies, ensuring that Diego is eventually consistent.

Diego sends real-time streaming logs for Tasks/LRPs to the Loggregator system. Diego also registers its running LRP instances with the Gorouter to route external web traffic to them.

Diego is the next-generation runtime powering Cloud Foundry (CF), but Diego is abstracted away from CF: CF simply acts as another Diego client via the BBS API. For now, there is a translation layer called the CC-Bridge that converts the Cloud Controller's domain-specific requests to stage and run applications into requests for Tasks and LRPs. Eventually Cloud Controller will be modified to communicate directly with the BBS. The process of staging and running a CF application is complex and filled with platform-specific and implementation-specific details. A collection of binaries known as the App Lifecycle encapsulates these concerns. The Tasks and LRPs produced by the CC-Bridge download the App Lifecycle binaries and execute them to stage, run, and health-check CF applications.

CF Summit Talks on Diego

What are all these repos and what do they do?

Below is a diagrammatic overview of the major repositories and components in Diego and CF (also available as a PDF and as a clickable map).

Diego Overview

Components in the blue region are part of the Diego core and handle the running and monitoring of Tasks and LRPs. These components all come from the Diego BOSH release.

Components in the yellow region provide infrastructure support to Diego and CF components. At the moment, this primarily includes Consul for DNS-based dynamic service discovery and a consistent key-value store for distributed locks and component discovery.

Components in the orange region support routing HTTP traffic to Diego containers. This includes the Route-Emitter from Diego and the Gorouter from CF.

Components in the red region support log and metric aggregation from Diego containers and CF and Diego components.

The green region brings in Cloud Controller and the CC-Bridge. As the diagram shows, the CC-Bridge merely interfaces with the BBS, translating app-specific messages from the CC to the more generic language of Tasks and LRPs.

The following summarizes the roles and responsibilities of the various components in this diagram.

"User-facing" Components

These "user-facing" components all live in cf-release:

  • Cloud Controller (CC):
    • provides an API for staging and running apps and provisioning and binding services to them,
    • organizes apps and services into a hierarchy with role-based access control suitable for a multi-tenant platform.
  • Loggregator:
    • Doppler aggregates app logs and component and container metrics relayed through the local Metron agents.
    • Traffic Controller retrieves logs and metrics from Doppler for end users, enforcing access based on CC roles.
  • Gorouter:
    • routes incoming HTTP traffic to processes within the CF/Diego deployment,
    • this includes routing traffic to both developer apps running within Garden containers and CF components such as CC.

Developers typically interact with CC and the logging system through a client such as the CF CLI.

CC-Bridge Components

The CC-Bridge components interact with the Cloud Controller. They serve primarily to translate app-specific notions into the more general notions of LRPs and Tasks:

  • Stager:
    • receives staging requests from CC, translates them into Diego Tasks, and submits those Tasks to the BBS
    • sends a response to CC when a staging Task is completed, successfully or otherwise.
  • CC-Uploader:
    • mediates staging uploads from the Executor to CC, translating the Executor's simple HTTP POST into the complex multipart-form upload CC requires.
  • Nsync splits its responsibilities between two independent processes:
    • The nsync-listener listens for desired app requests and updates/creates the desired LRPs via the BBS.
    • The nsync-bulker periodically polls CC for all desired apps to ensure the desired state known to Diego is up-to-date.
  • TPS also splits its responsibilities between two independent processes:
    • The tps-listener provides the CC with information about running LRP instances for cf apps and cf app X requests.
    • The tps-watcher monitors ActualLRP activity for crashes and reports them to CC.

Many of the CC-Bridge components are inherently stateless and will eventually be consolidated into Cloud Controller itself.

Components on the Database VMs

The Database VMs provide Diego's core components and clients a consistent API to the shared state and operations that manage Tasks and LRPs, as well as the data store for that shared state.

  • BBS:
    • provides an RPC-style API over HTTP to both core Diego components (rep, auctioneer, converger) and external clients (CC-Bridge, route emitter, SSH proxy),
    • encapsulates access to the backing database and manages data migrations, encoding, and encryption,
    • performs LRP convergence periodically, comparing DesiredLRPs and their ActualLRPs and taking action to enforce the desired state:
      • if an instance is missing or unclaimed for too long, a new auction is requested.
      • if an extra instance is identified, a stop message is sent to the Rep on the Cell hosting the instance.
    • performs Task convergence periodically, resending auction requests for Tasks that have been pending for too long and completion callbacks for Tasks that have remained completed for too long,
    • periodically sends aggregate metrics about DesiredLRPs, ActualLRPs, and Tasks to Loggregator,
    • maintains a lock in consul to ensure only one BBS handles requests, migrations, and convergence at a time.
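The LRP convergence rule described above (start what is missing, stop what is extra) can be sketched as a comparison of two maps. This is a simplified illustration under stated assumptions: the function name and data shapes are hypothetical, and the timing aspect ("unclaimed for too long") is left out.

```python
from typing import Dict, List, Set, Tuple

Instance = Tuple[str, int]  # (process_guid, index)


def converge(desired: Dict[str, int],
             actual: Dict[str, Set[int]]) -> Tuple[List[Instance], List[Instance]]:
    """Return (to_start, to_stop) for the given desired and actual state."""
    to_start: List[Instance] = []
    to_stop: List[Instance] = []

    for guid, count in desired.items():
        running = actual.get(guid, set())
        wanted = set(range(count))
        # Each missing instance would trigger a new auction request.
        to_start.extend((guid, i) for i in sorted(wanted - running))
        # Each extra instance would get a stop message sent to its Rep.
        to_stop.extend((guid, i) for i in sorted(running - wanted))

    # Instances whose DesiredLRP no longer exists are all extra.
    for guid, running in actual.items():
        if guid not in desired:
            to_stop.extend((guid, i) for i in sorted(running))

    return to_start, to_stop
```

Running this comparison periodically against a snapshot, rather than reacting only to individual events, is what lets the system recover from missed messages and remain eventually consistent.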

The BBS requires a backing persistent data store. MySQL and PostgreSQL are supported on current versions, and historically etcd was supported through Diego v1.0.

Components on the Cell

These Diego components run and monitor Tasks and LRPs in Garden containers:

  • Rep:
    • maintains a presence record for the Cell in the BBS,
    • participates in auctions to accept new Tasks and LRP instances,
    • runs Tasks and LRPs by telling its in-process Executor to create a container and then to run actions in it,
    • reacts to container events coming from the Executor,
    • periodically ensures its set of Tasks and ActualLRPs in the BBS is in sync with the containers actually present on the Cell.
  • Executor (now a logical process running inside the Rep):
    • manages container allocations against resource constraints on the Cell, such as memory and disk space,
    • implements the actions detailed in the API documentation,
    • streams stdout and stderr from container processes to the metron-agent running on the Cell, which in turn forwards to the Loggregator system,
    • periodically collects container metrics and emits them to Loggregator.
  • Garden
    • provides a platform-independent server and client to manage garden containers,
    • defines an interface to be implemented by container-runners, such as guardian and garden-windows.
  • Metron
    • forwards application logs and application and component metrics to Doppler.
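As an illustration of how the Executor might track container allocations against a Cell's memory and disk capacity, consider the following sketch. The class, its fields, and the use of megabytes are assumptions made for this example, not the Executor's real data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CellResources:
    """Tracks container allocations against a Cell's fixed capacity."""
    total_memory_mb: int
    total_disk_mb: int
    # container guid -> (memory_mb, disk_mb) reserved for it
    allocations: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def remaining(self) -> Tuple[int, int]:
        """Return (free_memory_mb, free_disk_mb)."""
        used_memory = sum(m for m, _ in self.allocations.values())
        used_disk = sum(d for _, d in self.allocations.values())
        return self.total_memory_mb - used_memory, self.total_disk_mb - used_disk

    def allocate(self, guid: str, memory_mb: int, disk_mb: int) -> bool:
        """Reserve resources for a container; refuse if they do not fit."""
        free_memory, free_disk = self.remaining()
        if guid in self.allocations or memory_mb > free_memory or disk_mb > free_disk:
            return False
        self.allocations[guid] = (memory_mb, disk_mb)
        return True

    def release(self, guid: str) -> None:
        """Free the reservation when the container is deleted."""
        self.allocations.pop(guid, None)
```

Reserving resources before a container is created keeps the Cell from accepting more work than it can host.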

Note that there is a specificity gradient across the Rep, the Executor, and Garden. The Rep is concerned with Tasks and LRPs and knows details about their lifecycles. The Executor knows only how to manage a collection of containers and to run actions in these containers. Garden knows nothing about actions and simply provides a concrete implementation of a platform-specific containerization technology that can run arbitrary commands in containers.

Components on the Brain

  • Auctioneer
    • holds auctions for Tasks and LRP instances.
    • runs auctions using the auction package. Auction communication goes over HTTP and is between the Auctioneer and the Cell Reps.
    • maintains a lock in consul to ensure only one auctioneer handles auctions at a time.
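The query-then-place shape of an auction can be sketched in a few lines. The scoring rule below (prefer cells with fewer instances of the same process, then cells with more free memory) is an assumption chosen for illustration; it is not the algorithm implemented in the auction package, and the CellState fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CellState:
    """What a Cell Rep might report when the Auctioneer queries it."""
    cell_id: str
    free_memory_mb: int
    # process_guid -> number of instances already on this cell
    instances: Dict[str, int] = field(default_factory=dict)


def pick_cell(cells: List[CellState], process_guid: str, memory_mb: int) -> Optional[str]:
    """Choose a cell for one LRP instance, or None if nothing fits."""
    candidates = [c for c in cells if c.free_memory_mb >= memory_mb]
    if not candidates:
        return None
    # Prefer fewer instances of this process, then more free memory.
    winner = min(
        candidates,
        key=lambda c: (c.instances.get(process_guid, 0), -c.free_memory_mb, c.cell_id),
    )
    # Record the placement so later picks in the same batch see it.
    winner.free_memory_mb -= memory_mb
    winner.instances[process_guid] = winner.instances.get(process_guid, 0) + 1
    return winner.cell_id
```

Placing two instances of the same process on two cells with enough room spreads them across both cells, which illustrates why an auction takes the reported state of every cell into account rather than filling one cell first.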

Components on the Access VMs

  • File-Server
    • serves static assets used by our various components, such as the App Lifecycle binaries (see below).
  • SSH Proxy
    • brokers connections between SSH clients and SSH servers running inside instance containers,
    • authorizes access to CF app instances based on Cloud Controller roles.

Routing Translation Components

  • Route-Emitter
    • monitors DesiredLRP state and ActualLRP state via the BBS. When a change is detected, the Route-Emitter emits route registration and unregistration messages to the gorouter via the NATS message bus,
    • periodically emits the entire routing table to the router,
    • maintains a lock in consul to ensure only one route-emitter handles route registration at a time.
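The Route-Emitter's job can be pictured as building a routing table from desired routes and running instances, then emitting only the differences when that table changes. The sketch below assumes hypothetical data shapes (hostnames keyed by process GUID, endpoints as host and port pairs) and does not reproduce the real message format sent over NATS.

```python
from typing import Dict, List, Set, Tuple

Endpoint = Tuple[str, int]               # (host, port) of a running instance
RoutingTable = Dict[str, Set[Endpoint]]  # hostname -> endpoints
Message = Tuple[str, Endpoint]           # (hostname, endpoint)


def build_table(routes: Dict[str, List[str]],
                endpoints: Dict[str, Set[Endpoint]]) -> RoutingTable:
    """Join desired routes with running instances, keyed by process GUID."""
    table: RoutingTable = {}
    for guid, hostnames in routes.items():
        for hostname in hostnames:
            table.setdefault(hostname, set()).update(endpoints.get(guid, set()))
    return table


def diff_tables(old: RoutingTable,
                new: RoutingTable) -> Tuple[List[Message], List[Message]]:
    """Return (register, unregister) messages needed to move from old to new."""
    register = sorted(
        (h, e) for h, eps in new.items() for e in eps - old.get(h, set())
    )
    unregister = sorted(
        (h, e) for h, eps in old.items() for e in eps - new.get(h, set())
    )
    return register, unregister
```

Emitting the entire table periodically, in addition to these incremental messages, gives the router a way to recover from any registration it missed.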

Service Registration and Component Coordination

  • Consul:
    • provides dynamic service registration and load-balancing via DNS resolution,
    • provides a consistent key-value store for maintenance of distributed locks and component presence.
  • Locket:
    • provides abstractions for locks and service registration that encapsulate interactions with consul.
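Several components above hold a lock so that only one instance is active at a time. The following sketch shows the general idea with an expiring lock in an in-memory store. It is not Consul's or Locket's API: the class, method names, and the explicit `now` argument (used here to keep the example deterministic) are all assumptions.

```python
from typing import Dict, Optional, Tuple


class LockStore:
    """In-memory stand-in for a consistent key-value store with expiring locks."""

    def __init__(self) -> None:
        # key -> (owner, expiry time in seconds)
        self._locks: Dict[str, Tuple[str, float]] = {}

    def acquire(self, key: str, owner: str, ttl: float, now: float) -> bool:
        """Take or renew the lock; fail if another owner holds a live lock."""
        current = self._locks.get(key)
        if current is not None:
            holder, expires_at = current
            if holder != owner and now < expires_at:
                return False
        self._locks[key] = (owner, now + ttl)
        return True

    def holder(self, key: str, now: float) -> Optional[str]:
        """Return the current owner, or None if the lock is free or expired."""
        current = self._locks.get(key)
        if current is None or now >= current[1]:
            return None
        return current[0]
```

The active instance keeps renewing the lock before it expires. If that instance dies, the lock lapses after the TTL and a standby instance can take over.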

Platform-Specific Components

Diego is largely platform-agnostic. All platform-specific concerns are delegated to two types of components: the garden backends and the app lifecycles.

Garden Backends

Garden contains a set of interfaces each platform-specific backend must implement. These interfaces contain methods to perform the following actions:

  • create/delete containers
  • apply resource limits to containers
  • open and attach network ports to containers
  • copy files into/out of containers
  • run processes within containers, streaming back stdout and stderr data
  • annotate containers with arbitrary metadata
  • snapshot containers for redeploys without downtime
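A backend interface of this kind could look roughly like the following sketch, which covers a small subset of the actions listed above. The method names and signatures are hypothetical and do not reproduce Garden's actual Go interfaces; the in-memory fake exists only to show how a platform-specific implementation would plug in.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class Backend(ABC):
    """Hypothetical subset of what a platform-specific backend implements."""

    @abstractmethod
    def create(self, handle: str, properties: Dict[str, str]) -> None:
        """Create a container annotated with arbitrary metadata."""

    @abstractmethod
    def destroy(self, handle: str) -> None:
        """Delete a container."""

    @abstractmethod
    def limit_memory(self, handle: str, limit_bytes: int) -> None:
        """Apply a memory limit to a container."""

    @abstractmethod
    def run(self, handle: str, path: str, args: List[str]) -> Tuple[int, str, str]:
        """Run a process and return (exit_code, stdout, stderr)."""


class FakeBackend(Backend):
    """In-memory backend for tests; it records calls and runs nothing."""

    def __init__(self) -> None:
        self.containers: Dict[str, Dict[str, object]] = {}

    def create(self, handle, properties):
        self.containers[handle] = {"properties": dict(properties), "memory": None}

    def destroy(self, handle):
        del self.containers[handle]

    def limit_memory(self, handle, limit_bytes):
        self.containers[handle]["memory"] = limit_bytes

    def run(self, handle, path, args):
        if handle not in self.containers:
            return 1, "", f"unknown container {handle}"
        # Echo the command line instead of executing anything.
        return 0, " ".join([path, *args]), ""
```

Callers depend only on the abstract interface, so the same code can drive a Linux backend, a Windows backend, or a fake used in tests.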

Current implementations:

App Lifecycles

Each App Lifecycle provides a set of binaries that manage a Cloud Foundry-specific application lifecycle. There are three binaries:

  • The Builder stages a CF application. The CC-Bridge runs the Builder as a Task on every staging request. The Builder performs static analysis on the application code and does any necessary pre-processing before the application is first run.
  • The Launcher runs a CF application. The CC-Bridge sets the Launcher as the Action on the CF application's DesiredLRP. The Launcher executes the user's start command with the correct system context (working directory, environment variables, etc.).
  • The Healthcheck performs a status check of a running CF application from inside the container. The CC-Bridge sets the Healthcheck as the Monitor action on the CF application's DesiredLRP.
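To show how the Launcher and Healthcheck might be wired into a DesiredLRP, here is a sketch that builds a plain dictionary. The field names, the lifecycle directory, and the command-line arguments are assumptions for illustration and are not taken from the CC-Bridge source or the BBS API.

```python
from typing import Any, Dict


def desired_lrp_for_app(app_guid: str, start_command: str, instances: int,
                        lifecycle_dir: str = "/tmp/lifecycle") -> Dict[str, Any]:
    """Build a dict describing an LRP that launches and monitors one CF app."""
    return {
        "process_guid": app_guid,
        "instances": instances,
        # The Launcher runs the user's start command with the right context.
        "action": {
            "run": {
                "path": f"{lifecycle_dir}/launcher",
                "args": ["app", start_command],
            }
        },
        # The Healthcheck runs inside the container as the Monitor action.
        "monitor": {
            "run": {
                "path": f"{lifecycle_dir}/healthcheck",
                "args": ["-port=8080"],
            }
        },
    }
```

Diego itself sees only generic run actions here. Everything specific to CF applications lives in the binaries those actions point to.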

Current implementations:

Bringing it all together

CF and Diego consist of many disparate components. Ensuring that these components work together correctly is a challenge addressed by these entities:

  • Inigo:
    • is an integration test suite that launches the various Diego components and exercises them through various test cases. As such, Inigo validates that a given set of component versions are mutually compatible.
    • in addition to exercising various ordinary test cases, Inigo can exercise exceptional cases, such as when a component fails or is unavailable for a period, that would be more difficult to orchestrate against a BOSH-deployed Diego cluster.
  • Vizzini:
    • is a suite of acceptance-level tests that run against a deployment of Diego with consul and routing components from CF,
    • interacts directly with the BBS API to run the tests,
    • ensures that Diego executes work and recovers from failure quickly by placing stringent timing requirements on many of the tests.
  • CF Acceptance Tests:
    • is a suite of acceptance-level tests that run against CF and Diego deployed together,
    • uses the CF CLI to run the tests.
  • Auction:
    • encodes the behavioral details around the auction.
    • includes a simulation test suite that validates the correctness and performance of the auction algorithm. The simulation can be run for different algorithms, at different scales. The simulation can either be run in-process (for quick feedback loops) or across multiple processes (to understand the role of communication in the auction) or even across multiple machines in a cloud-like infrastructure (to understand the impact of latency on the auction).
    • the auctioneer and rep use the auction package to participate in the auction.

The BOSH Release

Diego-Release packages Diego as a BOSH release. Its README includes detailed instructions for deploying CF and Diego to a local BOSH-Lite.

Diego-Release is also the canonical GOPATH for Diego. All Diego development takes place inside the Diego-Release directory.
