• Stars
    star
    191
  • Rank 195,477 (Top 4 %)
  • Language
    Swift
  • License
    Apache License 2.0
  • Created over 3 years ago
  • Updated 11 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Distributed Membership Protocol implementations in Swift

Swift Cluster Membership

This library aims to help Swift make ground in a new space: clustered multi-node distributed systems.

With this library we provide reusable runtime agnostic membership protocol implementations which can be adopted in various clustering use-cases.

Background

Cluster membership protocols are a crucial building block for distributed systems, such as computation intensive clusters, schedulers, databases, key-value stores and more. With the announcement of this package, we aim to make building such systems simpler, as they no longer need to rely on external services to handle service membership for them. We would also like to invite the community to collaborate on and develop additional membership protocols.

At their core, membership protocols need to provide an answer for the question "Who are my (live) peers?". This seemingly simple task turns out to be not so simple at all in a distributed system where delayed or lost messages, network partitions, and unresponsive but still "alive" nodes are the daily bread and butter. Providing a predictable, reliable answer to this question is what cluster membership protocols do.

There are various trade-offs one can take while implementing a membership protocol, and it continues to be an interesting area of research and continued refinement. As such, the cluster-membership package intends to focus not on a single implementation, but serve as a collaboration space for various distributed algorithms in this space.

🏊🏾‍♀️🏊🏻‍♀️🏊🏾‍♂️🏊🏼‍♂️ SWIMming with Swift

High-level Protocol Description

For a more in-depth discussion of the protocol and modifications in this implementation we suggest reading the SWIM API Documentation, as well as the associated papers linked below.

The Scalable Weakly-consistent Infection-style process group Membership algorithm (also known as "SWIM"), along with a few notable protocol extensions as documented in the 2018 Lifeguard: Local Health Awareness for More Accurate Failure Detection paper.

SWIM is a gossip protocol in which peers periodically exchange bits of information about their observations of other nodes’ statuses, eventually spreading the information to all other members in a cluster. This category of distributed algorithms are very resilient against arbitrary message loss, network partitions and similar issues.

At a high level, SWIM works like this:

  • A member periodically pings a "randomly" selected peer it is aware of. It does so by sending a .ping message to that peer, expecting an .ack to be sent back. See how A probes B initially in the diagram below.
    • The exchanged messages also carry a gossip payload, which is (partial) information about what other peers the sender of the message is aware of, along with their membership status (.alive, .suspect, etc.)
  • If it receives an .ack, the peer is considered still .alive. Otherwise, the target peer might have terminated/crashed or is unresponsive for other reasons.
    • In order to double check if the peer really is dead, the origin asks a few other peers about the state of the unresponsive peer by sending .pingRequest messages to a configured number of other peers, which then issue direct pings to that peer (probing peer E in the diagram below).
  • If those pings fail, due to lack of .acks resulting in the peer being marked as .suspect,
    • Our protocol implementation will also use additional .nack ("negative acknowledgement") messages in the situation to inform the ping request origin that the intermediary did receive those .pingRequest messages, however the target seems to not have responded. We use this information to adjust a Local Health Multiplier, which affects how timeouts are calculated. To learn more about this refer to the API docs and the Lifeguard paper.

SWIM: Messages Examples

The above mechanism, serves not only as a failure detection mechanism, but also as a gossip mechanism, which carries information about known members of the cluster. This way members eventually learn about the status of their peers, even without having them all listed upfront. It is worth pointing out however that this membership view is weakly-consistent, which means there is no guarantee (or way to know, without additional information) if all members have the same exact view on the membership at any given point in time. However, it is an excellent building block for higher-level tools and systems to build their stronger guarantees on top.

Once the failure detection mechanism detects an unresponsive node, it eventually is marked as .dead resulting in its irrevocable removal from the cluster. Our implementation offers an optional extension, adding an .unreachable state to the possible states, however most users will not find it necessary and it is disabled by default. For details and rules rules about legal status transitions refer to SWIM.Status or the following diagram:

SWIM: Lifecycle Diagram

The way Swift Cluster Membership implements protocols, is by offering "Instances" of them. For example, the SWIM implementation is encapsulated in the runtime agnostic SWIM.Instance which needs to be “driven” or “interpreted” by some glue code between a networking runtime and the instance itself. We call those glue pieces of an implementation "Shells", and the library ships with a SWIMNIOShell implemented using SwiftNIO’s DatagramChannel that performs all messaging asynchronously over UDP. Alternative implementations can use completely different transports, or piggy back SWIM messages on some other existing gossip system etc.

The SWIM instance also has built-in support for emitting metrics (using swift-metrics) and can be configured to log details about internal details by passing a swift-log Logger.

Example: Reusing the SWIM protocol logic implementation

The primary purpose of this library is to share the SWIM.Instance implementation across various implementations which need some form of in-process membership service. Implementing a custom runtime is documented in depth in the project’s README (https://github.com/apple/swift-cluster-membership/), so please have a look there if you are interested in implementing SWIM over some different transport.

Implementing a new transport boils down a “fill in the blanks” exercise:

First, one has to implement the Peer protocols (https://github.com/apple/swift-cluster-membership/blob/main/Sources/SWIM/Peer.swift) using one’s target transport:

/// SWIM peer which can be initiated contact with, by sending ping or ping request messages.
public protocol SWIMPeer: SWIMAddressablePeer {
    /// Perform a probe of this peer by sending a `ping` message.
    /// 
    /// <... more docs here - please refer to the API docs for the latest version ...>
    func ping(
        payload: SWIM.GossipPayload,
        from origin: SWIMPingOriginPeer,
        timeout: DispatchTimeInterval,
        sequenceNumber: SWIM.SequenceNumber
    ) async throws -> SWIM.PingResponse
    
    // ... 
}

Which usually means wrapping some connection, channel, or other identity with the ability to send messages and invoke the appropriate callbacks when applicable.

Then, on the receiving end of a peer, one has to implement receiving those messages and invoke all the corresponding on<SomeMessage>(...) callbacks defined on the SWIM.Instance (grouped under SWIMProtocol).

A piece of the SWIMProtocol is listed below to give you an idea about it:

public protocol SWIMProtocol {

    /// MUST be invoked periodically, in intervals of `self.swim.dynamicLHMProtocolInterval`.
    ///
    /// MUST NOT be scheduled using a "repeated" task/timer", as the interval is dynamic and may change as the algorithm proceeds.
    /// Implementations should schedule each next tick by handling the returned directive's `scheduleNextTick` case,
    /// which includes the appropriate delay to use for the next protocol tick.
    ///
    /// This is the heart of the protocol, as each tick corresponds to a "protocol period" in which:
    /// - suspect members are checked if they're overdue and should become `.unreachable` or `.dead`,
    /// - decisions are made to `.ping` a random peer for fault detection,
    /// - and some internal house keeping is performed.
    ///
    /// Note: This means that effectively all decisions are made in interval sof protocol periods.
    /// It would be possible to have a secondary periodic or more ad-hoc interval to speed up
    /// some operations, however this is currently not implemented and the protocol follows the fairly
    /// standard mode of simply carrying payloads in periodic ping messages.
    ///
    /// - Returns: `SWIM.Instance.PeriodicPingTickDirective` which must be interpreted by a shell implementation
    mutating func onPeriodicPingTick() -> [SWIM.Instance.PeriodicPingTickDirective]

    mutating func onPing( ... ) -> [SWIM.Instance.PingDirective]

    mutating func onPingRequest( ... ) -> [SWIM.Instance.PingRequestDirective]

    mutating func onPingResponse( ... ) -> [SWIM.Instance.PingResponseDirective]

    // ... 
}

These calls perform all SWIM protocol specific tasks internally, and return directives which are simple to interpret “commands” to an implementation about how it should react to the message. For example, upon receiving a .pingRequest message, the returned directive may instruct a shell to send a ping to some nodes. The directive prepares all apropriate target, timeout and additional information that makes it simpler to simply follow its instruction and implement the call correctly, e.g. like this:

self.swim.onPingRequest(
    target: target,
    pingRequestOrigin: pingRequestOrigin,            
    payload: payload,
    sequenceNumber: sequenceNumber
).forEach { directive in
    switch directive {
    case .gossipProcessed(let gossipDirective):
        self.handleGossipPayloadProcessedDirective(gossipDirective)

    case .sendPing(let target, let payload, let pingRequestOriginPeer, let pingRequestSequenceNumber, let timeout, let sequenceNumber):
        self.sendPing(
            to: target,
            payload: payload,
            pingRequestOrigin: pingRequestOriginPeer,
            pingRequestSequenceNumber: pingRequestSequenceNumber,
            timeout: timeout,
            sequenceNumber: sequenceNumber
        )
    }
}

In general this allows for all the tricky "what to do when" to be encapsulated within the protocol instance, and a Shell only has to follow instructions implementing them. The actual implementations will often need to perform some more involved concurrency and networking thasks, like awaiting for a sequence of responses, and handling them in a specific way etc, however the general outline of the protocol is orchestrated by the instance's directives.

For detailed documentation about each of the callbacks, when to invoke them, and how all this fits together, please refer to the API Documentation.

Example: SWIMming with Swift NIO

The repository contains an end-to-end example and an example implementation called SWIMNIOExample which makes use of the SWIM.Instance to enable a simple UDP based peer monitoring system. This allows peers to gossip and notify each other about node failures using the SWIM protocol by sending datagrams driven by SwiftNIO.

📘 The SWIMNIOExample implementation is offered only as an example, and has not been implemented with production use in mind, however with some amount of effort it could definitely do well for some use-cases. If you are interested in learning more about cluster membership algorithms, scalability benchmarking and using SwiftNIO itself, this is a great module to get your feet wet, and perhaps once the module is mature enough we could consider making it not only an example, but a reusable component for Swift NIO based clustered applications.

In it’s simplest form, combining the provided SWIM instance and NIO shell to build a simple server, one can embedd the provided handlers like shown below, in a typical NIO channel pipeline:

let bootstrap = DatagramBootstrap(group: group)
    .channelOption(ChannelOptions.socketOption(.so_reuseaddr), value: 1)
    .channelInitializer { channel in
        channel.pipeline
            // first install the SWIM handler, which contains the SWIMNIOShell:
            .addHandler(SWIMNIOHandler(settings: settings)).flatMap {
                // then install some user handler, it will receive SWIM events:
                channel.pipeline.addHandler(SWIMNIOExampleHandler())
        }
    }

bootstrap.bind(host: host, port: port)

The example handler can then receive and handle SWIM cluster membership change events:

final class SWIMNIOExampleHandler: ChannelInboundHandler {
    public typealias InboundIn = SWIM.MemberStatusChangedEvent
    
    let log = Logger(label: "SWIMNIOExampleHandler")
    
    public func channelRead(context: ChannelHandlerContext, data: NIOAny) {
        let change: SWIM.MemberStatusChangedEvent = self.unwrapInboundIn(data)

        self.log.info("Membership status changed: [\(change.member.node)] is now [\(change.status)]", metadata: [    
            "swim/member": "\(change.member.node)",
            "swim/member/status": "\(change.status)",
        ])
    }
}

If you are interested in contributing and polishing up the SWIMNIO implementation please head over to the issues and pick up a task or propose an improvement yourself!

Additional Membership Protocol Implementations

We are generally interested in fostering discussions and implementations of additional membership implementations using a similar "Instance" style.

If you are interested in such algorithms, and have a favourite protocol that you'd like to see implemented, please do not hesitate to reach out heve via issues or the Swift forums.

Contributing

Experience reports, feedback, improvement ideas and contributions are greatly encouraged! We look forward to hear from you.

Please refer to CONTRIBUTING guide to learn about the process of submitting pull requests, and refer to the HANDBOOK for terminology and other useful tips for working with this library.

More Repositories

1

swift

The Swift Programming Language
C++
65,899
star
2

ml-stable-diffusion

Stable Diffusion with Core ML on Apple Silicon
Python
16,002
star
3

swift-evolution

This maintains proposals for changes and user-visible enhancements to the Swift Programming Language.
Markdown
15,013
star
4

foundationdb

FoundationDB - the open source, distributed, transactional key-value store
C++
13,947
star
5

turicreate

Turi Create simplifies the development of custom machine learning models.
C++
11,153
star
6

darwin-xnu

The Darwin Kernel (mirror). This repository is a pure mirror and contributions are currently not accepted via pull-requests, please submit your contributions via https://developer.apple.com/bug-reporting/
C
10,558
star
7

swift-package-manager

The Package Manager for the Swift Programming Language
Swift
9,592
star
8

ml-ferret

Python
7,576
star
9

swift-nio

Event-driven network application framework for high performance protocol servers & clients, non-blocking.
Swift
7,274
star
10

swift-algorithms

Commonly used sequence and collection algorithms for Swift
Swift
5,622
star
11

swift-corelibs-foundation

The Foundation Project, providing core utilities, internationalization, and OS independence
Swift
5,189
star
12

swift-protobuf

Plugin and runtime library for using protobuf with Swift
Swift
4,446
star
13

password-manager-resources

A place for creators and users of password managers to collaborate on resources to make password management better.
JavaScript
4,010
star
14

coremltools

Core ML tools contain supporting tools for Core ML model conversion, editing, and validation.
Python
3,974
star
15

ml-mgie

Python
3,682
star
16

tensorflow_macos

TensorFlow for macOS 11.0+ accelerated using Apple's ML Compute framework.
Shell
3,643
star
17

swift-collections

Commonly used data structures for Swift
Swift
3,434
star
18

pkl

A configuration as code language with rich validation and tooling.
Java
3,360
star
19

swift-argument-parser

Straightforward, type-safe argument parsing for Swift
Swift
3,163
star
20

sourcekit-lsp

Language Server Protocol implementation for Swift and C-based languages
Swift
3,110
star
21

swift-log

A Logging API for Swift
Swift
2,931
star
22

swift-syntax

A set of Swift libraries for parsing, inspecting, generating, and transforming Swift source code.
Swift
2,887
star
23

swift-async-algorithms

Async Algorithms for Swift
Swift
2,695
star
24

swift-markdown

A Swift package for parsing, building, editing, and analyzing Markdown documents.
Swift
2,586
star
25

HomeKitADK

C
2,456
star
26

ml-ane-transformers

Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)
Python
2,431
star
27

swift-corelibs-libdispatch

The libdispatch Project, (a.k.a. Grand Central Dispatch), for concurrency on multicore hardware
C
2,420
star
28

swift-format

Formatting technology for Swift source code
Swift
2,261
star
29

homebrew-apple

Ruby
2,227
star
30

swift-foundation

The Foundation project
Swift
2,088
star
31

cups

Apple CUPS Sources
C
1,828
star
32

sample-food-truck

SwiftUI sample code from WWDC22
Swift
1,695
star
33

ml-fastvit

This repository contains the official implementation of the research paper, "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization" ICCV 2023
Python
1,693
star
34

ml-cvnets

CVNets: A library for training computer vision networks
Python
1,664
star
35

swift-book

The Swift Programming Language book
Markdown
1,616
star
36

swift-numerics

Advanced mathematical types and functions for Swift
Swift
1,602
star
37

ml-hypersim

Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding
Python
1,495
star
38

swift-crypto

Open-source implementation of a substantial portion of the API of Apple CryptoKit suitable for use on Linux platforms.
C
1,400
star
39

swift-docker

Docker Official Image packaging for Swift
Dockerfile
1,331
star
40

ml-neuman

Official repository of NeuMan: Neural Human Radiance Field from a Single Video (ECCV 2022)
Python
1,233
star
41

swift-openapi-generator

Generate Swift client and server code from an OpenAPI document.
Swift
1,142
star
42

swift-system

Low-level system calls and types for Swift
Swift
1,137
star
43

swift-corelibs-xctest

The XCTest Project, A Swift core library for providing unit test support
Swift
1,120
star
44

swift-docc

Documentation compiler that produces rich API reference documentation and interactive tutorials for your Swift framework or package.
Swift
1,093
star
45

swift-llbuild

A low-level build system, used by Xcode and the Swift Package Manager
C++
1,067
star
46

swift-atomics

Low-level atomic operations for Swift
Swift
1,004
star
47

swift-testing

Swift
981
star
48

servicetalk

A networking framework that evolves with your application
Java
881
star
49

swift-http-types

Version-independent HTTP currency types for Swift
Swift
815
star
50

swift-llvm

LLVM
815
star
51

swift-driver

Swift compiler driver reimplementation in Swift
Swift
764
star
52

swift-protobuf-plugin

Moved to apple/swift-protobuf
757
star
53

swift-lldb

This is the version of LLDB that supports the Swift programming language & REPL.
C++
673
star
54

swift-clang

C++
673
star
55

unityplugins

C#
645
star
56

ml-mobileone

This repository contains the official implementation of the research paper, "An Improved One millisecond Mobile Backbone".
Swift
641
star
57

ml-gaudi

602
star
58

ml-aim

This repository provides the code and model checkpoints of the research paper: Scalable Pre-training of Large Autoregressive Image Models
Python
602
star
59

swift-metrics

Metrics API for Swift
Swift
602
star
60

axlearn

Python
564
star
61

swift-distributed-actors

Peer-to-peer cluster implementation for Swift Distributed Actors
Swift
562
star
62

ARKitScenes

This repo accompanies the research paper, ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data and contains the data, scripts to visualize and process assets, and training code described in our paper.
Python
552
star
63

sample-backyard-birds

Swift
506
star
64

device-management

Device management schema data for MDM.
506
star
65

ccs-calendarserver

The Calendar and Contacts Server.
Python
470
star
66

ml-facelit

Official repository of FaceLit: Neural 3D Relightable Faces (CVPR 2023)
Python
457
star
67

swift-3-api-guidelines-review

Swift
452
star
68

swift-org-website

Swift.org website
SCSS
438
star
69

GCGC

Jupyter Notebook
436
star
70

swift-nio-http2

HTTP/2 support for SwiftNIO
Swift
405
star
71

swift-tools-support-core

Contains common infrastructural code for both SwiftPM and llbuild.
Swift
390
star
72

swift-nio-ssh

SwiftNIO SSH is a programmatic implementation of SSH using SwiftNIO
Swift
364
star
73

swift-nio-ssl

TLS Support for SwiftNIO, based on BoringSSL.
C
345
star
74

ml-gmpi

Official PyTorch implementation of GMPI (ECCV 2022, Oral Presentation)
Python
329
star
75

example-package-dealer

Example package for use with the Swift Package Manager
Swift
319
star
76

swift-collections-benchmark

A benchmarking tool for Swift Collection algorithms
Swift
316
star
77

example-package-playingcard

Example package for use with the Swift Package Manager
Swift
308
star
78

swift-docc-render

Web renderer for Swift-DocC documentation.
JavaScript
300
star
79

indexstore-db

Index database library for use with sourcekit-lsp
C++
299
star
80

swift-playdate-examples

A technical demonstration of Embedded Swift running on Playdate by Panic
Swift
295
star
81

swift-docc-plugin

Swift Package Manager command plugin for Swift-DocC
Swift
295
star
82

ml-hierarchical-confusion-matrix

Neo: Hierarchical Confusion Matrix Visualization (CHI 2022)
TypeScript
292
star
83

ml-gsn

Python
284
star
84

swift-llbuild2

A fresh take on a low-level build system API.
Swift
280
star
85

swift-source-compat-suite

The infrastructure and project index comprising the Swift source compatibility suite.
Python
278
star
86

sample-cloudkit-sharing

Swift
275
star
87

swift-xcode-playground-support

Logging and communication to allow Swift toolchains to communicate with Xcode.
Swift
270
star
88

swift-experimental-string-processing

An early experimental general-purpose pattern matching engine for Swift.
Swift
263
star
89

ml-sigma-reparam

Python
255
star
90

swift-standard-library-preview

Swift
253
star
91

swift-nio-transport-services

Extensions for SwiftNIO to support Apple platforms as first-class citizens.
Swift
252
star
92

swift-stress-tester

Stress testing utilities for Swift's tooling
Swift
207
star
93

swift-service-discovery

A service discovery API for Swift.
Swift
203
star
94

swift-certificates

An implementation of X.509 for Swift
Swift
195
star
95

swift-nio-examples

examples of how to use swift-nio
Swift
195
star
96

swift-aoc-starter-example

Swift starter project for solving Advent of Code challenges.
Swift
189
star
97

sample-cloudkit-coredatasync

Swift
187
star
98

swift-distributed-tracing

Instrumentation library for Swift server applications
Swift
186
star
99

pfl-research

Simulation framework for accelerating research in Private Federated Learning
Python
186
star
100

swift-internals

HTML
182
star