Omicron: Oxide control plane

Oxide Control Plane

This repo houses the work-in-progress Oxide Rack control plane.

Omicron is open-source. But we’re pretty focused on our own goals for the foreseeable future and not able to help external contributors. Please see CONTRIBUTING.md for more information.

Documentation

Docs are automatically generated for the public (externally-facing) API based on the OpenAPI spec that itself is automatically generated from the server implementation. You can generate your own docs for either the public API or any of the internal APIs by feeding the corresponding OpenAPI specs (in ./openapi) into an OpenAPI doc generator.

There are some internal design docs in the ./docs directory.

For more design documentation and internal Rust API docs, see the generated Rust documentation. You can generate this yourself with:

$ cargo doc --document-private-items

Note that --document-private-items is configured by default, so you can actually just use cargo doc.

Folks with access to Oxide RFDs may find RFD 48 ("Control Plane Requirements") and other control plane RFDs relevant. These are not currently publicly available.

Build and run

Omicron has two modes of operation: "simulated" and "non-simulated".

The simulated version of Omicron allows the high-level control plane logic to run without actually managing any sled-local resources. This version can be executed on Linux, Mac, and illumos. This mode of operation is provided for development and testing only.

To build and run the simulated version of Omicron, see: docs/how-to-run-simulated.adoc.

The non-simulated version of Omicron actually manages sled-local resources, and may only be executed on hosts running Helios. This mode of operation will be used in production.

To build and run the non-simulated version of Omicron, see: docs/how-to-run.adoc.

Run tests with nextest

The supported way to run tests is via cargo-nextest.

Note
cargo test may work, but that can’t be guaranteed as cargo test isn’t run in CI.

If you don’t already have nextest installed, get started by downloading a pre-built binary or installing nextest via your package manager. Nextest has pre-built binaries for Linux, macOS and illumos.

Then, run tests with:

$ cargo nextest run

Nextest does not support doctests. Run doctests separately with cargo test --doc.

rustfmt and clippy

You can format the code using cargo fmt. Make sure to run this before pushing changes. The CI checks that the code is correctly formatted.

You can run the Clippy linter using cargo xtask clippy. CI checks that code is clippy-clean.

Working in Omicron

Omicron is a pretty large repo containing a bunch of related components. (Why? See docs/repo.adoc.) If you just build the whole thing with cargo build or cargo nextest run, it can take a while, even for incremental builds. Since most people are only working on a few of these components at a time, it’s helpful to know about Cargo’s tools for working with individual packages in a workspace.

Note
This section assumes you’re already familiar with the prerequisites and environment setup needed to do any work on Omicron. See docs/how-to-run-simulated.adoc or docs/how-to-run.adoc for more on that.

Key tips

  • Use cargo check when you just want to know if your code compiles. It’s much faster than cargo build or cargo nextest run.

  • When using Cargo’s check/build/test/clippy commands, you can use the -p PACKAGE flag to only operate on a specific package. This often saves a lot of time for incremental builds.

  • When using Cargo’s check/build/clippy commands, use --all-targets to make sure you’re checking or building the test code, too.

These are explained a bit more below, along with some common pitfalls.

Here’s an example workflow. Suppose you’re working on some changes to the Nexus database model (nexus-db-model package, located at nexus/db-model from the root). While you’re actively writing and checking code, you might run:

cargo check --all-targets

without any -p flag. Running this incrementally is pretty fast even on the whole workspace. This also uncovers places where your changes have broken code that uses this package. (If you’re making big changes, you might not want that right away. In that case, you might choose to use -p nexus-db-model here.)

When you’re ready to test the changes you’ve made, start with building and running tests for the most specific package you’ve changed:

cargo nextest run -p nexus-db-model

Once that works, check the tests for the next package up:

cargo nextest run -p omicron-nexus

When you’re happy with things and want to make sure you haven’t missed something, test everything:

cargo nextest run

Rust packages in Omicron

Note
The term "package" is overloaded: most programming languages and operating systems have their own definitions of a package. On top of that, Omicron bundles up components into our own kind of "package" that gets delivered via the install and update systems. These are described in the package-manifest.toml file in the root of the repo. In this section, we’re just concerned with Rust packages.
Note
There’s also confusion in the Rust world about the terms "packages" and "crates". Packages are the things that have a Cargo.toml file. (Workspaces like Omicron itself have Cargo.toml files, too.) Packages are also the things that you publish to crates.io (confusingly). One package might have a library, a standalone executable binary, several examples, integration tests, etc. that are all compiled individually and produce separate artifacts. These are what Rust calls crates. We’re generally just concerned with packages here, not crates.

Here are some of the big components in the control plane that live in this repo:

Main Rust package (Component): Description

  • omicron-nexus (Nexus): Service responsible for handling external API requests and orchestrating the rest of the control plane.

  • omicron-sled-agent (Sled Agent): Service that runs on each compute sled (server) to manage resources on that sled.

  • dns-server (Internal DNS server, External DNS server): DNS server component used for both internal service discovery and external DNS.

  • omicron-gateway (Management Gateway Service): Connects Nexus (and other control plane services) to services on the rack management network (e.g., service processors).

  • oximeter/oximeter (Oximeter): Collects telemetry from other services and stores it into ClickHouse.

  • wicket/wicketd (Wicket): CLI interface made available to operators on the rack technician port for rack setup and recovery.

For those with access to Oxide RFDs, RFD 61 discusses the organization principles and key components in more detail.

Many of these components themselves are made up of other packages (e.g., nexus-db-model is under omicron-nexus). There are also many more top-level packages than what’s mentioned above. These are used for common code, clients, tools, etc. For more, see the Rustdoc for each module. (Where docs are missing or incomplete, please contribute!)

Use Cargo’s -p PACKAGE to check/build/test only the package you’re working on. Since people are usually only working on one or two components at a time, you can usually iterate faster this way.

Why is Cargo rebuilding stuff all the time?

People are often surprised to find Cargo rebuilding things that it seems to have just built, even when the relevant source files haven’t changed.

  • Say you’re iterating on code, running cargo build -p nexus-db-model to build just that package. Great, it works. Let’s run tests: cargo nextest run -p nexus-db-model. Now it’s rebuilding some dependency of nexus-db-model again?!

  • Say you’ve just run cargo nextest run -p nexus-db-model. Now you go run cargo nextest run -p omicron-nexus, which uses nexus-db-model. You see Cargo building nexus-db-model again?!

This usually has to do with the way Cargo selects package features. These are essentially tags that are used at build time to include specific code or dependencies. For example, the serde crate defines a feature called "derive" that controls whether the Serialize/Deserialize derive macros will be included. Let’s look at how this affects builds.

Tip
You can use cargo tree to inspect a package’s dependencies, including features. This is useful for debugging feature-related build issues.

Feature selection differs when building tests

When you run cargo build -p nexus-db-model, Cargo looks at all the packages in the dependency tree of nexus-db-model and figures out what features it needs for each one. Let’s take the uuid package as an example. Cargo takes the union of the features required by any of the packages that depend on uuid in the whole dependency tree of nexus-db-model. Let’s say that’s just the "v4" feature. Simple enough.

When you then run cargo nextest run -p nexus-db-model, it does the same thing. Only this time, it’s looking at the dev-dependencies tree. nexus-db-model’s dev-dependencies might include some other package that depends on uuid and requires the "v5" feature. Now, Cargo has to rebuild uuid, and anything else that depends on it.

This is why when using Cargo’s check/build/clippy commands, we suggest using --all-targets. When you use cargo build --all-targets, it builds the tests as well. It’s usually not much more time and avoids extra rebuilds when switching back and forth between the default targets and the targets with tests included.
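To make the union rule concrete, here is a minimal sketch of feature unification over a toy dependency graph. It is not Cargo’s resolver. The Dep type, the unify_features function, and the test-helper package are hypothetical names invented for illustration; only nexus-db-model, uuid, and the "v4"/"v5" features come from the text above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One edge in a toy dependency graph: `from` depends on `to` and requests
/// `features` of it. `dev_only` marks a dev-dependency.
struct Dep {
    from: &'static str,
    to: &'static str,
    features: &'static [&'static str],
    dev_only: bool,
}

/// Walks the graph from `root` and returns, for every reachable package,
/// the union of all features requested of it. Dev-dependencies are followed
/// only for the root package, and only when `include_dev` is true (which
/// models building the tests).
fn unify_features(
    deps: &[Dep],
    root: &'static str,
    include_dev: bool,
) -> BTreeMap<&'static str, BTreeSet<&'static str>> {
    let mut unified: BTreeMap<&'static str, BTreeSet<&'static str>> = BTreeMap::new();
    let mut visited = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(pkg) = stack.pop() {
        if !visited.insert(pkg) {
            continue; // outgoing edges of this package were already processed
        }
        for dep in deps.iter().filter(|d| d.from == pkg) {
            if dep.dev_only && !(include_dev && pkg == root) {
                continue;
            }
            unified
                .entry(dep.to)
                .or_default()
                .extend(dep.features.iter().copied());
            stack.push(dep.to);
        }
    }
    unified
}

/// Hypothetical graph: "test-helper" is an invented dev-dependency.
fn example_graph() -> Vec<Dep> {
    vec![
        Dep { from: "nexus-db-model", to: "uuid", features: &["v4"], dev_only: false },
        Dep { from: "nexus-db-model", to: "test-helper", features: &[], dev_only: true },
        Dep { from: "test-helper", to: "uuid", features: &["v5"], dev_only: false },
    ]
}
```

Calling unify_features(&example_graph(), "nexus-db-model", false) gives uuid only "v4", while passing true gives it both "v4" and "v5". In this model, that difference in the feature set is what forces uuid and its dependents to be rebuilt.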

Feature selection differs when building different packages

People run into a similar problem when switching packages within Omicron. Once you’ve got cargo nextest run -p nexus-db-model working, you may run cargo nextest run -p omicron-nexus, which uses nexus-db-model. And you may be surprised to see Cargo rebuilding some common dependency like uuid. It’s the same as above: we’re building a different package now. It has a different (larger) dependency tree. That may result in some crate deep in the dependency tree needing some new feature, causing it and all of its dependents to be rebuilt.

Note
There is interest in changing the way feature selection works in workspaces like Omicron for exactly this reason. It’s been suggested to have an option for Cargo to always look at the features required for all packages in the workspace, rather than just the one you’ve selected. This could eliminate this particular problem. In the meantime, we mitigate this with heavy use of workspace dependencies, which helps make sure that different packages within Omicron depend on the same set of features for a given dependency.

Why am I getting compile errors after I thought I’d already built everything?

Say you’re iterating on code, running cargo build -p nexus-db-model to build just that package. You work through lots of compiler errors until finally it works. Now you run tests: cargo nextest run -p nexus-db-model. Now you see a bunch of compiler errors again! What gives?

By default, Cargo does not operate on the tests. Cargo’s check/build/clippy commands ignore them. This is another reason we suggest using --all-targets most of the time.
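As a small illustration of why this happens, consider a library file shaped like the sketch below. The row_count function and the test are hypothetical and not taken from nexus-db-model. The point is only that the module marked #[cfg(test)] is left out of a plain cargo build or cargo check, so mistakes inside it surface later.

```rust
/// Part of the library target: compiled by `cargo build` and `cargo check`.
pub fn row_count(rows: &[&str]) -> usize {
    rows.len()
}

// Compiled only when tests are built, for example by `cargo nextest run`
// or `cargo check --all-targets`. A compile error in this module would go
// unnoticed by a plain `cargo build`.
#[cfg(test)]
mod tests {
    use super::row_count;

    #[test]
    fn counts_rows() {
        assert_eq!(row_count(&["a", "b"]), 2);
    }
}
```

If you changed the signature of row_count without updating the test, cargo build would still succeed and the error would only appear once the test target is compiled.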

Generated Service Clients and Updating

Each service is a Dropshot server that presents an HTTP API. The description of that API is serialized as an OpenAPI document which we store in omicron/openapi and check in to this repo. In order to ensure that changes to those APIs are made intentionally, each service contains a test that validates that the current API matches. This allows us (1) to catch accidental changes as test failures and (2) to explicitly observe API changes during code review (and in the git history).
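The idea behind such a test can be sketched with the standard library alone. This is a simplified stand-in, not the code Omicron or the expectorate crate actually uses. The function names are hypothetical, and the only detail taken from this document is that setting EXPECTORATE=overwrite updates the checked-in file instead of failing.

```rust
use std::{env, fs, path::Path};

/// Compares `actual` with the checked-in file at `path`, or rewrites the
/// file when the EXPECTORATE environment variable is set to "overwrite".
fn check_against_checked_in(path: &Path, actual: &str) -> Result<(), String> {
    let overwrite = env::var("EXPECTORATE")
        .map(|value| value == "overwrite")
        .unwrap_or(false);
    check_contents(path, actual, overwrite)
}

/// Same check with the overwrite decision passed in explicitly.
fn check_contents(path: &Path, actual: &str, overwrite: bool) -> Result<(), String> {
    if overwrite {
        // Intentional API change: record the new document.
        return fs::write(path, actual)
            .map_err(|e| format!("failed to write {}: {e}", path.display()));
    }
    let expected = fs::read_to_string(path)
        .map_err(|e| format!("failed to read {}: {e}", path.display()))?;
    if expected == actual {
        Ok(())
    } else {
        // Accidental API change: fail so it is noticed before review.
        Err(format!(
            "{} is out of date; rerun with EXPECTORATE=overwrite to update it",
            path.display()
        ))
    }
}
```

A service test would generate its current OpenAPI document as a string and pass it to check_against_checked_in. Because the overwritten file is committed, the change then shows up as a diff in code review.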

We also use these OpenAPI documents as the source for the clients we generate using Progenitor. Clients are automatically updated when the corresponding OpenAPI document is modified.

Note that Omicron contains a nominally circular dependency:

  • Nexus depends on the Sled Agent client

  • The Sled Agent client is derived from the OpenAPI document emitted by Sled Agent

  • Sled Agent depends on the Nexus client

  • The Nexus client is derived from the OpenAPI document emitted by Nexus

We effectively "break" this circular dependency by virtue of the OpenAPI documents being checked in.
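One way to see why checking in the documents breaks the cycle is to model the build order as a graph. The sketch below is illustrative only: the node names (including the nexus.json and sled-agent.json file names) are hypothetical labels, and the edges are a simplified reading of the four bullets above.

```rust
use std::collections::{BTreeMap, BTreeSet};

type Graph = BTreeMap<&'static str, Vec<&'static str>>;

/// Build-order edges: each pair means "left needs right to exist first".
fn build_edges(docs_checked_in: bool) -> Vec<(&'static str, &'static str)> {
    let mut edges = vec![
        ("nexus", "sled-agent-client"),
        ("sled-agent-client", "sled-agent.json"),
        ("sled-agent", "nexus-client"),
        ("nexus-client", "nexus.json"),
    ];
    if !docs_checked_in {
        // If the documents were produced during the build, each one would
        // need its service to be built first.
        edges.push(("sled-agent.json", "sled-agent"));
        edges.push(("nexus.json", "nexus"));
    }
    edges
}

/// Returns true if the edges contain a cycle (depth-first search).
fn has_cycle(edges: &[(&'static str, &'static str)]) -> bool {
    let mut graph = Graph::new();
    for &(from, to) in edges {
        graph.entry(from).or_default().push(to);
    }
    let mut in_progress = BTreeSet::new();
    let mut done = BTreeSet::new();
    let starts: Vec<&'static str> = graph.keys().copied().collect();
    starts
        .into_iter()
        .any(|start| visit(start, &graph, &mut in_progress, &mut done))
}

fn visit(
    node: &'static str,
    graph: &Graph,
    in_progress: &mut BTreeSet<&'static str>,
    done: &mut BTreeSet<&'static str>,
) -> bool {
    if done.contains(node) {
        return false;
    }
    if !in_progress.insert(node) {
        return true; // reached a node that is still on the current path
    }
    let mut found = false;
    if let Some(next) = graph.get(node) {
        for &n in next {
            if visit(n, graph, in_progress, done) {
                found = true;
                break;
            }
        }
    }
    in_progress.remove(node);
    done.insert(node);
    found
}
```

With build_edges(false) the graph has a cycle, and with build_edges(true) it does not, because the checked-in documents are plain source files with no build-time dependency on the services that emit them.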

In general, changes to any service API require the following set of build steps:

  1. Make changes to the service API.

  2. Update the OpenAPI document by running the relevant test with overwrite set: EXPECTORATE=overwrite cargo nextest run -p <package> -- test_nexus_openapi_internal (changing the package name and test name as necessary). It’s important to do this before the next step.

  3. This will cause the generated client to be updated which may break the build for dependent consumers.

  4. Modify any dependent services to fix calls to the generated client.

Note that if you make changes to both Nexus and Sled Agent simultaneously, you may end up in a spot where neither can build and therefore neither OpenAPI document can be generated. In this case, revert or comment out changes in one so that the OpenAPI document can be generated.

This is a particular problem if you find yourself resolving merge conflicts in the generated files. You have basically two options for this:

  • Resolve the merge conflicts by hand. This is usually not too bad in practice.

  • Take the upstream copy of the file, back out your client-side changes (git stash and its -p option can be helpful for this), follow the steps above to regenerate the file using the automated test, and finally re-apply your changes to the client side. This essentially gets you back to step 1 above, after which you follow the same procedure.

Resolving merge conflicts in Cargo.lock

When pulling in new changes from upstream "main", you may find conflicts in Cargo.lock. The easiest way to deal with these is usually to take the upstream changes as-is, then trigger any Cargo operation that updates the lockfile. cargo metadata is a quick one. Here’s an example:

# Pull in changes from upstream "main"
$ git fetch
$ git merge origin/main

# Oh no!  We've got conflicts in Cargo.lock.  First, let's just take what's upstream:
$ git show origin/main:Cargo.lock > Cargo.lock

# Now, run any command that causes Cargo to update the lock file as needed.
$ cargo metadata > /dev/null

When you do this, Cargo makes only the changes to Cargo.lock that are necessary based on the various Cargo.toml files in the workspace and dependencies.

Here are things you don’t want to do to resolve this conflict:

  • Run cargo generate-lockfile to generate a new lock file from scratch.

  • Remove Cargo.lock and let Cargo regenerate it from scratch.

Both of these will cause Cargo to make many more changes (relative to "main") than necessary because it’s choosing the latest version of all dependencies in the whole tree. You’ll be inadvertently updating all of Omicron’s transitive dependencies. (You might conceivably want that. But usually we update dependencies either as-needed for a particular change or via individual PRs via dependabot, not all at once because someone had to merge Cargo.lock.)

You can also resolve conflicts by hand. It’s tedious and error-prone.

Configuring ClickHouse

The ClickHouse binary uses several sources for its configuration. The binary expects an XML config file, usually named config.xml, to be available, or one may be specified with the -C command-line flag. The binary also includes a minimal configuration embedded within it, which will be used if no configuration file is given or present in the current directory. The server also accepts command-line flags for overriding the values of the configuration parameters.
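The precedence described above can be modeled in a few lines. This is only a sketch of the ordering, not how ClickHouse is implemented. It assumes a flat map of parameters (real configs are nested XML) and assumes that a config file, when present, replaces the embedded configuration entirely. The parameter names and values in the usage note are placeholders.

```rust
use std::collections::BTreeMap;

/// Flat parameter map used for illustration only.
type Settings = BTreeMap<String, String>;

/// Helper that builds a `Settings` map from string pairs.
fn settings(pairs: &[(&str, &str)]) -> Settings {
    pairs
        .iter()
        .map(|(name, value)| (name.to_string(), value.to_string()))
        .collect()
}

/// Picks the base configuration (the file if one was found, otherwise the
/// embedded defaults) and then applies command-line overrides on top.
fn effective_settings(
    embedded: &Settings,
    config_file: Option<&Settings>,
    cli_overrides: &Settings,
) -> Settings {
    let mut result = config_file.unwrap_or(embedded).clone();
    for (name, value) in cli_overrides {
        result.insert(name.clone(), value.clone());
    }
    result
}
```

For example, with embedded settings of http_port=8123, a file that also sets tcp_port=9000, and a command-line override of http_port=8124, the result in this model is http_port=8124 and tcp_port=9000. Without the file, tcp_port is absent.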

The packages downloaded by ci_download_clickhouse include a config.xml file with them. You should probably run ClickHouse via the omicron-dev tool, but if you decide to run it manually, you can start the server with:

$ /path/to/clickhouse server --config-file /path/to/config.xml

The configuration file contains a large number of parameters, but most of them are described with comments in the included config.xml, or you may learn more about them here and here. Parameters may be updated in the config.xml, and the server will automatically reload them. You may also specify many of them on the command-line with:

$ /path/to/clickhouse server --config-file /path/to/config.xml -- --param_name param_value ...
