  • Stars: 416
  • Rank: 104,068 (top 3%)
  • License: MIT License
  • Created about 1 year ago
  • Updated 8 months ago

Repository Details

An Opinionated Roadmap to Become an SRE (Concepts > Tools)

SRE Roadmap

Distributed systems

  • Concepts
    • Fallacies of distributed computing
    • Synchronous vs. asynchronous
    • Event log vs. message queue
    • Exactly-once delivery
    • Different types of message failure
    • Orchestration vs. choreography
    • Causality
    • CDN
    • Hashing
      • Consistent hashing
      • Geohashing
      • Perfect hashing
    • Read-heavy vs. write-heavy impacts
    • Federation
    • Latency
      • Latency, throughput, goodput
      • Latency numbers every programmer should know
      • How to prevent latency variability
      • Tail latency
    • How to reduce sharing
    • Idempotency
    • Load balancer
      • Concepts
      • Layer 4 vs. layer 7 load balancer
    • Liveness vs. safety properties
    • Microservices: pros and cons
    • REST
    • gRPC
    • Service mesh
    • Source of truth
    • Stateful vs. stateless
    • Total vs. partial order
    • Why can't we rely on the system clock in distributed systems
    • Vector clock
  • Cache
    • When to use a cache
    • Cache-aside vs. read-through
    • Eviction policy
    • Refresh-ahead
    • Write-through vs. write-back
    • Distributed cache
    • Performance cache vs. capacity cache
  • Databases
    • Different types of databases
      • NoSQL vs. SQL databases
      • Relational vs. document
      • Column-oriented databases
      • Graph databases
      • Vector databases
      • Object-based storage
    • ACID
    • Partitioning
      • Criteria
      • Methods
      • Replication vs. partition
    • Hotspot
    • CALM theorem
    • CAP theorem
    • PACELC theorem
    • Cardinality
    • Chain replication
    • Consensus
    • Concurrency control
    • Consistency models
    • Isolation levels
    • Serializability
    • Linearizability
    • CRDT
    • Indexes
      • Tradeoff
      • Primary vs. secondary indexes
    • Denormalization
    • View & materialized view
    • Transaction
    • Downsides of distributed transactions
    • Strategies to handle rebalancing
    • Leader election
    • MVCC
    • N+1 select problem
    • Quorum
    • Raft
    • Read repair
    • Single-leader, multi-leader, leaderless replication
    • Split-brain
    • 2PC
    • 3PC
    • WAL
    • Write and read amplification
  • Data structures
    • Probabilistic data structures
      • Bloom filter
      • Count-min sketch
      • HyperLogLog
    • Storage
      • LSM tree
      • B-tree
      • SSTable
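
As an illustration of the consistent hashing item listed under Hashing above, here is a minimal sketch of a hash ring with virtual nodes. The class name HashRing, the vnodes parameter, and the choice of MD5 as the hash are assumptions made for this example, not something the roadmap prescribes.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit value derived from MD5 (an arbitrary choice for this
    # sketch; any well-distributed hash function would work).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._hashes = []  # sorted positions on the ring
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed at several positions to even out the load.
        for i in range(self.vnodes):
            h = _hash(f"{node}#{i}")
            self._owner[h] = node
            bisect.insort(self._hashes, h)

    def remove(self, node):
        self._hashes = [h for h in self._hashes if self._owner[h] != node]
        self._owner = {h: n for h, n in self._owner.items() if n != node}

    def lookup(self, key):
        if not self._hashes:
            raise LookupError("ring is empty")
        # First position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._hashes)
        return self._owner[self._hashes[idx]]
```

The property worth noticing is that removing a node only remaps the keys that node owned; keys owned by the other nodes stay where they were, which is what distinguishes this scheme from hashing modulo the node count.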

Reliability

  • Concepts
    • Difference between availability, resiliency, robustness, fault-tolerance, and reliability
    • Why is it wrong to target 100% availability
    • Blast radius
    • Failure domain
    • Cascading failures
    • Hard vs. soft dependencies
    • Scalability
      • Concepts
      • Knee point
      • Ceiling
    • Number one source of outages
    • Tail tolerance
    • Toil
  • Patterns/Anti-patterns
    • Bulkhead pattern
    • Circuit breaker
    • Exponential backoff
    • Jitter
    • Graceful degradation
    • Load shedding
    • Retry amplification
    • Backpressure
    • Rate limiting
    • Request hedging
  • Practices
    • Chaos engineering
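
The exponential backoff and jitter items listed under Patterns/Anti-patterns above are often combined. Below is a minimal sketch of capped exponential backoff with "full jitter", where each delay is drawn uniformly between zero and the current ceiling. The function names and the base, cap, and attempts parameters are hypothetical, chosen only for this example.

```python
import random
import time


def backoff_delays(base=0.1, cap=5.0, attempts=5, rng=random.random):
    """Yield one randomized delay (in seconds) per retry."""
    for attempt in range(attempts):
        # The ceiling doubles on every attempt until it reaches the cap.
        ceiling = min(cap, base * 2 ** attempt)
        # Full jitter: pick a point uniformly in [0, ceiling).
        yield rng() * ceiling


def retry(operation, attempts=5, base=0.1, cap=5.0, sleep=time.sleep):
    """Call operation up to `attempts` times, sleeping between failures."""
    delays = backoff_delays(base, cap, attempts - 1)
    while True:
        try:
            return operation()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the last error
            sleep(delay)
```

Randomizing the delay spreads retries from many clients over time instead of letting them arrive in synchronized waves, which is one way to limit retry amplification. A real client would usually retry only errors known to be transient rather than every exception.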

Observability

  • Concepts
    • What's the difference between monitoring and observability
    • Trace vs. metric vs. log
    • Golden signals
    • Observer effect
    • Percentile
    • Streetlight anti-method
    • Time-series based monitoring lies
    • USE method
    • Main metrics for cache
    • Why should we be careful about average performance metrics
  • Alerting
    • Alerting strategy
    • Alert fatigue
    • Characteristics of a good alert
    • Slow vs. fast burn alert
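
To illustrate the percentile item and the caution about average performance metrics listed under Concepts above, here is a small sketch comparing the mean of a latency sample with its percentiles. The percentile helper uses the nearest-rank definition, which is only one of several common definitions, and the sample data is made up for the example.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that at
    least p percent of the samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up sample: 98 fast requests and 2 very slow ones (milliseconds).
latencies_ms = [10] * 98 + [2000] * 2

mean_ms = statistics.mean(latencies_ms)  # 49.8
p50_ms = percentile(latencies_ms, 50)    # 10
p99_ms = percentile(latencies_ms, 99)    # 2000
```

The mean of 49.8 ms describes no request that actually happened: the typical request took 10 ms, and the slowest 2% took 2 seconds. Percentiles keep that tail visible, while the average hides it.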

Rollout

  • Concepts
    • Bake time
    • Feature flag
    • Feature freeze
    • Rollout supervision
  • Rollout types
    • Blue-green rollout
    • Canary rollout
    • Progressive rollout
    • Shadow rollout

SLI/SLO/SLA

  • Concepts
    • SLI vs. SLO vs. SLA
    • Error budget
  • SLO
    • Difference between KPIs and SLOs
    • Benefits of having alerts based on SLOs
    • Why is exceeding an SLO not necessarily a good thing
    • SLO for data (freshness, completeness, consistency, etc.)
    • SLO for mobile
    • SLO for services
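
The error budget item listed above comes down to simple arithmetic, sketched here under the assumption of an availability SLO measured over a fixed window. The function names are hypothetical, and the burn rate is defined as the observed error rate divided by the error rate the SLO allows.

```python
def error_budget(slo, total_events):
    """Number of bad events the SLO tolerates over the window."""
    return (1 - slo) * total_events


def budget_remaining(slo, total_events, bad_events):
    """Fraction of the error budget still unspent (negative if overspent)."""
    return 1 - bad_events / error_budget(slo, total_events)


def burn_rate(slo, total_events, bad_events):
    """Observed error rate relative to the allowed error rate.

    A value of 1 spends the budget exactly over the full window;
    a value above 1 exhausts it early.
    """
    return (bad_events / total_events) / (1 - slo)


# A 99.9% SLO over 30 days allows about 43.2 minutes of downtime.
minutes_in_window = 30 * 24 * 60
allowed_downtime_min = error_budget(0.999, minutes_in_window)
```

For example, with a 99.9% SLO and 1,000,000 requests in the window, the budget is about 1,000 failed requests. After 250 failures, roughly 75% of the budget remains, and the burn rate over that window is about 0.25.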

Container

  • Container
  • Container orchestration

Linux

  • Scripting
  • Filesystem
  • Memory
  • Processes
  • Resource utilization
  • Network

Network

  • ARP
  • Bandwidth
  • BGP
  • CoDel
  • CORS
  • DNS
  • Ping vs. heartbeat
  • TCP
    • TCP vs. UDP
    • Congestion control
    • Connection backlog
    • Flow control
    • Handshake
  • HTTP
  • HTTP/2
  • Head of line blocking
  • Health checks: passive vs. active
  • Internet model
  • NTP
  • OSI model
  • Routers
  • Switches
  • Network topologies
  • What happens when you type google.com into your browser

Security

  • Authentication
  • Certificate
  • Certificate authority
  • Cipher
  • Confidentiality
  • Encryption
  • TLS
  • PKI
  • Signature

Analysis

  • Core analysis loop
  • Correlation vs. causation
  • First principles
  • Five whys technique
  • Incident management
    • How to address an incident (assess, mitigate, resolve)
    • Incident roles
    • How to write a postmortem
    • 3C principles (Coordinate, Communicate, maintain Control)

Other

  • SRE role
  • Version control

Soft skills

  • Communication
    • Writing
    • Oral
    • Presentation
    • The XY problem
  • Collaboration
  • Problem solving
  • Curiosity
  • Navigating ambiguity
  • Staying humble
