Distributed Systems Engineering notes (6.824, Spring 2015)

Lectures

Lecture notes from 6.824, taught by Prof. Robert T. Morris. These lecture notes are slightly modified from the ones posted on the 6.824 course website.

Lecture 1: Introduction: distributed system definition, motivations, architecture, implementation, performance, fault-tolerance, consistency, MapReduce
Lecture 2: Remote Procedure Calls (RPCs): RPC overview, marshalling, binding, threads, "at-least-once", "at-most-once", "exactly-once", Go's RPC, thread synchronization
Lecture 3: Fault tolerance: primary-backup replication, state transfer, "split-brain", Remus (NSDI 2008)
Lecture 4: Flat datacenter storage: flat datacenter storage, bisection bandwidth, striping
Lecture 5: Paxos: Paxos, consensus algorithms
- Paxos algorithm description
Lecture 6: Raft: Raft, a more understandable consensus algorithm
Lecture 7: Google Go guest lecture by Russ Cox
Lecture 8: Harp: distributed file system, "the UPS trick", witnesses
Lecture 9: IVY: distributed shared memory, sequential consistency
Lecture 10: TreadMarks: userspace distributed shared memory system, vector timestamps, release consistency (lazy/eager), false sharing, write amplification
Lecture 11: Ficus: optimistic concurrency control, vector timestamps, conflict resolution
Lecture 12: Bayou: disconnected operation, eventual consistency, Bayou
Lecture 13: MapReduce: MapReduce, scalability, performance
Lecture 14: Spark guest lecture by Matei Zaharia: Resilient Distributed Datasets, Spark
Lecture 15: Spanner guest lecture by Wilson Hsieh, Google: Spanner, distributed database, clock skew
Lecture 16: Memcache at Facebook: web app scalability, look-aside caches, Memcache
Lecture 17: PNUTS Yahoo!: distributed key-value store, atomic writes
Lecture 18: Dynamo: distributed key-value store, eventual consistency
Lecture 19: HubSpot guest lecture
Lecture 20: Two-phase commit (2PC): two-phase commit, Argus
Lecture 21: Optimistic concurrency control
Lecture 22: Peer-to-peer, trackerless BitTorrent and DHTs: Chord, routing
Lecture 23: Bitcoin: verifiable public ledgers, proof-of-work, double spending

Lectures from other years

Practical Byzantine Fault Tolerance (PBFT)
- Other years: [2012], [2011], [2010], [2009], [2001], [PPT]

Labs

Lab 1: MapReduce, [assign]
Lab 2: A fault-tolerant key/value service, [assign], [notes]
Lab 3: Paxos-based Key/Value Service, [assign], [notes]
Lab 4: Sharded Key/Value Service, [assign], [notes]
Lab 5: Persistent Key/Value Service, [assign]

Papers

Papers we read in 6.824 (directory here):

Other papers:

Impossibility of Distributed Consensus with One Faulty Process
- See page 5, slide 10 here to understand Lemma 1 (commutativity) faster
- See this article here for an alternative explanation.
Practical Byzantine Fault Tolerance (PBFT)
- See discussion here on PBFT.

Stumbled upon

A brief history of consensus, 2PC and transaction commit
Distributed systems theory for the distributed systems engineer
Distributed Systems: For Fun and Profit
You can't choose CA out of CAP, or "You can't sacrifice partition tolerance"
Notes on distributed systems for young bloods
Paxos Explained From Scratch

Quizzes

Prep for quiz 1 here

alinush/6.824-lecture-notes
