Distributed Systems Engineering notes (6.824, Spring 2015)
Lectures
Lecture notes from 6.824, taught by Prof. Robert T. Morris. These lecture notes are slightly modified from the ones posted on the 6.824 course website.
- Lecture 1: Introduction: distributed system definition, motivations, architecture, implementation, performance, fault-tolerance, consistency, MapReduce
- Lecture 2: Remote Procedure Calls (RPCs): RPC overview, marshalling, binding, threads, "at-least-once", "at-most-once", "exactly once", Go's RPC, thread synchronization
- Lecture 3: Fault tolerance: primary-backup replication, state transfer, "split-brain", Remus (NSDI 2008)
- Lecture 4: Flat datacenter storage: flat datacenter storage, bisection bandwidth, striping
- Lecture 5: Paxos: Paxos, consensus algorithms
- Lecture 6: Raft: Raft, a more understandable consensus algorithm
- Lecture 7: Google Go guest lecture by Russ Cox
- Lecture 8: Harp: distributed file system, "the UPS trick", witnesses
- Lecture 9: IVY: distributed shared memory, sequential consistency
- Lecture 10: TreadMarks: userspace distributed shared memory system, vector timestamps, release consistency (lazy/eager), false sharing, write amplification
- Lecture 11: Ficus: optimistic concurrency control, vector timestamps, conflict resolution
- Lecture 12: Bayou: disconnected operation, eventual consistency, Bayou
- Lecture 13: MapReduce: MapReduce, scalability, performance
- Lecture 14: Spark guest lecture by Matei Zaharia: Resilient Distributed Datasets, Spark
- Lecture 15: Spanner guest lecture by Wilson Hsieh, Google: Spanner, distributed database, clock skew
- Lecture 16: Memcache at Facebook: web app scalability, look-aside caches, Memcache
- Lecture 17: PNUTS Yahoo!: distributed key-value store, atomic writes
- Lecture 18: Dynamo: distributed key-value store, eventual consistency
- Lecture 19: HubSpot guest lecture
- Lecture 20: Two-phase commit (2PC): two-phase commit, Argus
- Lecture 21: Optimistic concurrency control
- Lecture 22: Peer-to-peer, trackerless BitTorrent and DHTs: Chord, routing
- Lecture 23: Bitcoin: verifiable public ledgers, proof-of-work, double spending
Lectures from other years
Labs
- Lab 1: MapReduce, [assign]
- Lab 2: A fault-tolerant key/value service, [assign], [notes]
- Lab 3: Paxos-based Key/Value Service, [assign], [notes]
- Lab 4: Sharded Key/Value Service, [assign], [notes]
- Lab 5: Persistent Key/Value Service, [assign]
Papers
Papers we read in 6.824 (directory here):
- MapReduce
- Remus
- Flat datacenter storage
- Paxos
- Raft
- Harp
- Shared virtual memory
- TreadMarks
- Ficus
- Bayou
- Spark
- Spanner
- Memcached at Facebook
- PNUTS
- Dynamo
- Akamai
- Argus, Guardians and actions
- Kademlia
- Bitcoin
- AnalogicFS
Other papers:
- Impossibility of Distributed Consensus with One Faulty Process
  - See page 5, slide 10 here to understand Lemma 1 (commutativity) faster
  - See this article here for an alternative explanation.
- Practical Byzantine Fault Tolerance (PBFT)
Stumbled upon
- A brief history of consensus, 2PC and transaction commit
- Distributed systems theory for the distributed systems engineer
- Distributed systems: for fun and profit
- You can't choose CA out of CAP, or "You can't sacrifice partition tolerance"
- Notes on distributed systems for young bloods
- Paxos Explained From Scratch
Quizzes
Prep for quiz 1 here