Awesome Architecture
This document starts with a list of concepts and foundations, followed by jobs-to-be-done.
Concepts
- Application Lifecycle Management (ALM)
- Architecturally significant requirements criteria: business value/risk, stakeholder concern, quality level, external dependencies, cross-cutting, first-of-a-kind, source of problems on past projects
- Architectural decision records (ADRs): Records that support team alignment, document strategic directions for a project or product, and reduce recurring and time-consuming decision-making efforts
- Continuous Configuration: values usually fall into two groups, those that modify the operational behavior of an application (such as throttling, limits, connection limits, or logging verbosity) and those that govern feature access control (FAC), including feature flags, A/B testing, and user allow/deny lists
- Coupling: Coupling describes the independent variability of connected systems, i.e., whether a change in System A has an effect on System B. If it does, A and B are coupled
- Coupling facets: 1/ Technology (Java vs. C++, Kubernetes, PostgreSQL) 2/ Location (IP addresses, DNS) 3/ Data Format (Binary, XML, JSON, protobuf, Avro) 4/ Data Type (int16, int32, string, UTF-8, null, empty) 5/ Semantic (Name, Middlename, ZIP) 6/ Temporal (sync, async) 7/ Interaction Style (messaging, RPC, query, GraphQL) 8/ Conversation (pagination, caching, retries)
- Declarative provisioning not equal to Declarative language (video, slides)
- Event-Driven Architecture patterns: 1/ Event Notification 2/ Event-carried State Transfer 3/ Event Sourcing 4/ Command and Query Responsibility Segregation
- Feature Flags
- GitOps: 1/ Declarative 2/ Versioned and Immutable 3/ Pulled Automatically 4/ Continuously Reconciled
- Platform: a set of standardized elements that provide value but do not presuppose all problems
- SaaS Architecture Fundamentals
- Software Boundaries or "Fracture Planes": 1/ Business Domain Bounded Context 2/ Regulatory Compliance 3/ Change Cadence 4/ Team Location 5/ Risk 6/ Performance Isolation 7/ Technology 8/ User Personas
- Software delivery performance four key metrics: 1/ Cycle Time (Change Lead Time) 2/ Deployment Frequency 3/ Change Failure Rate (CFR) 4/ Mean Time to Recovery (MTTR)
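The two groups of Continuous Configuration values can be sketched in Python as below. This is a minimal illustration under stated assumptions, not any product's API: the class names, the fields, the evaluation order (kill switch, then deny list, then allow list, then rollout), and the SHA-256 bucketing for percentage rollouts are all hypothetical choices.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OperationalConfig:
    """Hypothetical operational values: tune how the application behaves at runtime."""
    requests_per_second_limit: int = 100  # throttling
    max_db_connections: int = 20          # connection limit
    log_level: str = "INFO"               # logging verbosity


@dataclass(frozen=True)
class FeatureAccessControl:
    """Hypothetical FAC values: decide who gets to see a feature."""
    enabled: bool = False                 # feature flag / kill switch
    rollout_percent: int = 0              # share of users (0-100) in the treatment group
    allow_list: frozenset = field(default_factory=frozenset)
    deny_list: frozenset = field(default_factory=frozenset)


def bucket(feature: str, user_id: str) -> int:
    """Map a user to a stable bucket 0-99, so the same user always gets the same answer."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_feature_enabled(feature: str, fac: FeatureAccessControl, user_id: str) -> bool:
    if not fac.enabled:               # kill switch: off means off for everyone
        return False
    if user_id in fac.deny_list:      # deny list beats allow list and rollout
        return False
    if user_id in fac.allow_list:     # allow list bypasses the percentage rollout
        return True
    return bucket(feature, user_id) < fac.rollout_percent  # stable A/B split


ops = OperationalConfig(log_level="DEBUG")
new_checkout = FeatureAccessControl(
    enabled=True,
    rollout_percent=10,
    allow_list=frozenset({"beta-tester"}),
    deny_list=frozenset({"blocked-user"}),
)
print(is_feature_enabled("new-checkout", new_checkout, "beta-tester"))   # True
print(is_feature_enabled("new-checkout", new_checkout, "blocked-user"))  # False
```

Keeping the two groups in separate types makes it easier to give them different owners and change processes, for example operators tuning limits and product teams managing flags.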
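To contrast the first two Event-Driven Architecture patterns, here is a minimal Python sketch with hypothetical event and consumer names. With Event Notification the event only says that something changed, so the consumer calls back to the source for the new state, which adds temporal coupling; with Event-carried State Transfer the event carries the state and the consumer keeps its own copy.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CustomerAddressChanged:
    """Event Notification (hypothetical): carries only an identifier."""
    customer_id: str


@dataclass(frozen=True)
class CustomerAddressUpdated:
    """Event-carried State Transfer (hypothetical): carries the new state itself."""
    customer_id: str
    address: str


class NotificationConsumer:
    """Calls back to the source system to learn the new state (runtime dependency)."""

    def __init__(self, fetch_address: Callable[[str], str]):
        self.fetch_address = fetch_address  # e.g. a synchronous call to the customer service
        self.addresses: Dict[str, str] = {}

    def handle(self, event: CustomerAddressChanged) -> None:
        self.addresses[event.customer_id] = self.fetch_address(event.customer_id)


class StateTransferConsumer:
    """Maintains a local replica from event contents; no call back to the source."""

    def __init__(self) -> None:
        self.addresses: Dict[str, str] = {}

    def handle(self, event: CustomerAddressUpdated) -> None:
        self.addresses[event.customer_id] = event.address


source_of_truth = {"c1": "1 Main St"}
notified = NotificationConsumer(lambda cid: source_of_truth[cid])
notified.handle(CustomerAddressChanged("c1"))

replica = StateTransferConsumer()
replica.handle(CustomerAddressUpdated("c1", "2 Side St"))
```

The trade-off: the state-transfer consumer keeps working when the source is down, at the cost of larger events and a possibly stale local copy.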
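The "Continuously Reconciled" GitOps principle boils down to a loop that compares the declared (desired) state with the actual state and converges the two. The sketch below is a toy model under stated assumptions: state is a plain dict mapping resource names to specs, and `plan`, `reconcile`, and `run` are hypothetical helpers, not the API of Flux, Argo CD, or any other GitOps tool.

```python
import time
from typing import Callable, Dict, List, Tuple

State = Dict[str, dict]      # {resource_name: spec}
Action = Tuple[str, str]     # (verb, resource_name)


def plan(desired: State, actual: State) -> List[Action]:
    """Compute the actions needed to move `actual` towards `desired`."""
    actions: List[Action] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != desired[name]:
            actions.append(("update", name))       # drift or a new version in Git
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name))       # resource not declared in Git
    return actions


def reconcile(desired: State, actual: State) -> List[Action]:
    """Apply the plan to `actual` in place and return the actions taken."""
    actions = plan(desired, actual)
    for verb, name in actions:
        if verb == "delete":
            del actual[name]
        else:
            actual[name] = dict(desired[name])     # shallow copy of the declared spec
    return actions


def run(fetch_desired: Callable[[], State], actual: State,
        interval_s: float, rounds: int) -> None:
    """Pull the desired state and reconcile on a fixed interval (bounded so it terminates)."""
    for _ in range(rounds):
        reconcile(fetch_desired(), actual)
        time.sleep(interval_s)


desired = {"web": {"image": "web:1.2", "replicas": 3}}
actual = {"web": {"image": "web:1.1", "replicas": 3}, "debug-pod": {"image": "busybox"}}
print(reconcile(desired, actual))  # [('update', 'web'), ('delete', 'debug-pod')]
print(reconcile(desired, actual))  # [] because the second pass finds nothing to do
```

Because `reconcile` is idempotent, running it repeatedly is safe, and any manual change to `actual` is undone on the next pass.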
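The four key metrics can be computed from a log of production deployments. Definitions vary between teams, so treat this as one possible reading rather than the canonical formula: the `Deployment` record is hypothetical, change lead time is the median commit-to-production time, deployment frequency is deployments per day over the period, change failure rate is failed deployments over all deployments, and MTTR is measured from the failing deployment to restoration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    commit_time: datetime                      # when the change was committed
    deploy_time: datetime                      # when it reached production
    failed: bool = False                       # did it cause a production failure?
    restored_time: Optional[datetime] = None   # when service was restored, if it failed


def four_key_metrics(deploys: List[Deployment], period_days: int) -> dict:
    if not deploys:
        raise ValueError("need at least one deployment")
    lead_times = [d.deploy_time - d.commit_time for d in deploys]
    failures = [d for d in deploys if d.failed]
    restore_times = [d.restored_time - d.deploy_time for d in failures if d.restored_time]
    return {
        "change_lead_time": median(lead_times),
        "deployment_frequency_per_day": len(deploys) / period_days,
        "change_failure_rate": len(failures) / len(deploys),
        # None when no failure has been restored yet in this period
        "mean_time_to_recovery": (sum(restore_times, timedelta()) / len(restore_times)
                                  if restore_times else None),
    }


t0 = datetime(2024, 1, 1, 9, 0)
log = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=4), failed=True, restored_time=t0 + timedelta(hours=5)),
    Deployment(t0, t0 + timedelta(hours=6)),
    Deployment(t0, t0 + timedelta(hours=8)),
]
metrics = four_key_metrics(log, period_days=2)
# lead time 5:00:00, 2.0 deploys/day, CFR 0.25, MTTR 1:00:00
```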
Foundations
Organizational culture, structure, and processes
- 7 tell-tale signs of fake DevOps
- DevOps at Amazon: A Look at Our Tools and Processes
- DevOps Topologies
- Fireside Chat: DevOps at Amazon with Ken Exner, GM of AWS Developer Tools - AWS Online Tech Talks
- Leadership Session: Developer Tools on AWS (video, slides)
- Linking Modular Architecture to Development Teams
- Pattern-based process for making design decisions
- Seven Shipping Principles
- Software Architecture: the Hard Parts
- Team Interaction Modeling with Team Topologies
- The Away Team Model at Amazon
- The problems with MVPs in legacy replacement (Part 1, Part 2)
- Two-pizza teams: Organizing for innovation (video, slides)
- Would you like architects with your architecture?
Working backwards
- Amazon’s Not So Secret Weapon - The magic of Working Backwards: a real-world case study
- HEY Bubble Up: From kickoff to launch
Product-market fit
- Box’s Aaron Levie on navigating SaaS’ several stages of growth
- Managing growth and value creation in SaaS: An interview with a software leader
Profitable growth
Business metrics
- B2B SaaS benchmarks: What metrics do VCs look at for signs of product-market fit?
- Product metrics that matter the most: A flywheel framework for cloud business leaders
- SaaS and the Rule of 40: Keys to the critical value creation metric
Business and technology alignment
- Modernizing Technology and Mindset with ‘Enabling Teams’
- SaaS Cost Attribution: How to Align Technology with Business
- Strategies for investment in Tech Debt vs Product Debt when building new software products
- Using domain analysis to model microservices
- Why I Never Want to Build Another MVP
Product
- How Detailed Should a User Story Be?
- Product Backlog Building Canvas
- Product requirements: User/actor, Functional, Non-Functional, Technical (not usually in the story)
- Product Requirements Document
- Product requirements documents, downsized
- Shape Up: Mapping the Scopes
- Story types:
  - User Story – “As a [type of user] I [want this thing] so that [I can accomplish this goal]”. Example: “As a site visitor, I want to see new content when I come to the site, so I come back more often”.
  - Job Story – “When [situation], I want to [motivation], So I can [expected outcome]”. Example: “When it’s dinner time tonight, I want to have pizza so I can easily feed my friends”.
  - Feature-Driven Development (FDD) – “[action] the [result] [by|for|of|to] a(n) [object]”. Example: “Generate a unique identifier for a transaction”.
- Why the Three-Part User Story Template Works So Well
Compliance
Technology landscape
Frameworks
- Application Design Framework (ADF)
- AWS Well-Architected Framework pillars: 1/ Operational excellence 2/ Security 3/ Reliability 4/ Performance efficiency 5/ Cost optimization
- Operational Readiness Reviews (ORR)
- SaaS Lens for the AWS Well-Architected Framework
Cross-cutting concerns
- Aligning SaaS and Service Planes Definitions
- Architect multitenant solutions on Azure
- Are you integrating or building distributed applications? (video, slides)
- Building ClickHouse Cloud From Scratch in a Year
- Cloud Automation à la DDD: From stringly typed to affordances
- Choreography vs Orchestration in the land of serverless
- Failing successfully: The AWS approach to resilient design
- How we ended up with microservices
- Introducing the Journey to SaaS Guide to Help You Build, Launch, and Operate SaaS Solutions on AWS
- Kubernetes as a platform vs. Kubernetes as an API
- Minimizing Design Time Coupling in a Microservice Architecture
- Modern cloud applications: Do they lock you in? (video, slides)
- Monoliths are not dinosaurs
- On Designing and Deploying Internet-Scale Services
- Serverless or Kubernetes on AWS
- Takeaways of building a business-critical low-latency microservice at scale
- You Want Modules, Not Microservices
Landing zone
Platform
- Building Infrastructure Platforms
- Integrating Backstage at DAZN
- The Magic of Platforms • Gregor Hohpe • PlatformCon 2022
Jobs-to-be-done
Access control and isolation
- A Rails Multi-Tenant Strategy That's ~30 Lines and "Just Works"
- Building Multi-Tenant Solutions with Amazon OpenSearch Service
- How to implement SaaS tenant isolation with ABAC and AWS IAM
- Implementing SaaS Tenant Isolation Using Amazon SageMaker Endpoints and IAM
- Performance isolation in a multi-tenant database environment
- Secure data movement across Amazon S3 and Amazon Redshift using role chaining and ASSUMEROLE
- Securing Multi-Tenant Kubernetes Clusters at Scale
- Solving large-scale data access challenges with Amazon S3 (video, slides)
API
- Architecture patterns for consuming private APIs cross-account
- Best practices for working with the Apache Velocity Template Language in Amazon API Gateway
- How Netflix Scales its API with GraphQL Federation (Part 1, Part 2)
Authentication and authorization
- Edge Authentication and Token-Agnostic Identity Propagation
- Enhancing Amazon DynamoDB single-table design with AWS AppSync access and security features
- Entitlements: Architecting Authorization
- How to Persist JWT Tokens for Your SaaS Application
- Amazon DocumentDB (with MongoDB compatibility) user-defined roles for access control
- JSON Web Token (JWT) Profile for OAuth 2.0 Access Tokens
- On The Nature of OAuth2’s Scopes
Configuration
Deployment
- Amazon CI/CD Practices for Software Development Teams (video, slides)
- Amazon's approach to high-availability deployment (video, slides)
- Automate rollbacks for Amazon ECS rolling deployments with CloudWatch alarms
- Automating safe, hands-off deployments (video, article, podcast)
- Best practices for CI/CD using AWS Fargate and Amazon ECS (video, slides)
- Best practices for CI/CD with AWS Lambda and Amazon API Gateway
- Building a Continuous Integration Workflow with Step Functions and AWS CodeBuild
- Building a cross-account continuous delivery pipeline for database migrations
- Building and testing polyglot applications using AWS CodeBuild
- CDK Pipelines: Continuous delivery for AWS CDK applications
- Continuous Delivery: Anatomy of the Deployment Pipeline
- Continuous Delivery of Amazon EKS Clusters Using AWS CDK and CDK Pipelines
- Deploying GitOps with Weave Flux and Amazon EKS
- Deployment Pipelines Reference Architecture and Reference Implementations
- Ensuring rollback safety during deployments
- Migrating Critical Traffic At Scale with No Downtime (Part 1, Part 2)
- My CI/CD pipeline is my release captain
- Overview of Deployment Options on AWS
- Parallel and dynamic SaaS deployments with AWS CDK Pipelines
- Practicing Continuous Integration and Continuous Delivery on AWS
- Releasing Mission-Critical Software at Amazon (video, slides)
- Rolling Forward and other Deployment Myths
- Seamless branch deploys with Kubernetes
- Serverless CI/CD for the Enterprise on AWS
- Using AWS Step Functions State Machines to Handle Workflow-Driven AWS CodePipeline Actions
- Validating AWS CodeCommit Pull Requests with AWS CodeBuild and AWS Lambda
Development
- Applying the Twelve-Factor App Methodology to Serverless Applications
- Branch by Abstraction for major changes that take time
- Building production-ready prototypes (video, slides)
- Deploy AWS Organizations resources by using CloudFormation
- Include CloudFormation templates in the CDK
- Managing resources using AWS CloudFormation Resource Types
- Running bash commands in AWS CloudFormation templates
- The Twelve-Factor App
- This is why you should keep stateful and stateless resources together
- Trunk-Based Development
Encryption
Extensibility
Frontend
Hybrid architecture
Integration patterns
- Architecture patterns for consuming private APIs cross-account
- Starbucks Does Not Use Two-Phase Commit
Internet of Things (IoT)
Machine learning
Migrations
Multi-region
- Implement Multi-Region Serverless (and Functionless) WebSocket Pub/Sub APIs with AWS AppSync and Amazon EventBridge
- Ten tips for multi-tenant, multi-Region object replication in Amazon S3
Networking
- Addressing latency and data transfer costs on EKS using Istio
- Building the Next Evolution of Cloud Networks at Slack – A Retrospective
- Designing hyperscale Amazon VPC networks
- How FactSet handles networking for 1000+ AWS accounts
- VPC sharing: key considerations and best practices
Observability
- Amazon CloudWatch Now Includes Contributor Insights - in Preview
- AWS X-Ray (see also Integrating AWS X-Ray with Other AWS Services)
- AWS X-Ray Now Supports Amazon API Gateway and New Sampling Rules API
- Container monitoring for Amazon ECS, EKS, and Kubernetes is now available in Amazon CloudWatch
- Debugging with Amazon CloudWatch Synthetics and AWS X-Ray
- One observability workshop
- Using Prometheus Metrics in Amazon CloudWatch
- Visualize and Monitor Highly Distributed Applications with Amazon CloudWatch ServiceLens
Operations
- Accounting for the Basecamp 3 outage on June 27, 2022
- Amazon’s approach to failing successfully (video, slides)
- Building dashboards for operational visibility
- Changing the Wheels on a Moving Bus — Spotify’s Event Delivery Migration
- Kubernetes cluster upgrade: the blue-green deployment strategy
- Resolve IT Incidents Faster with Incident Manager, a New Capability of AWS Systems Manager
- Towards Operational Excellence blog post series:
- ZEN and the art of Reliability
Sharding and partitioning data
- Decomposing the GitLab backend database:
- E-Commerce at Scale: Inside Shopify's Tech Stack - Stackshare.io
- Herding elephants: Lessons learned from sharding Postgres at Notion
- Improve performance and manageability of large PostgreSQL tables by migrating to partitioned tables on Amazon Aurora and Amazon RDS
- Partitioning GitHub’s relational databases to handle scale
- Scaling Datastores at Slack with Vitess
- Scaling Etsy Payments with Vitess:
Tenant costs
- Calculating Tenant Costs in SaaS Environments
- Calculating SaaS Cost Per Tenant: A PoC Implementation in an AWS Kubernetes Environment