Our guidelines for building new applications and managing legacy systems

Zalando's Engineering and Architecture Principles

ℹ️ This repository is deprecated. The below principles were introduced in 2015 and no longer reflect the current reality (2020) in all aspects. To learn more about our current practices, please follow our Engineering Blog on https://engineering.zalando.com/

The Principles, Briefly

In March 2015, we adopted this set of principles for tech and architecture:

  • microservices
  • API First
  • REST
  • Cloud
  • Software as a Service (SaaS)

This document focuses primarily on services, as the principles for interoperating services are quite mature and stable. Note: A (micro-) service is an application, but not all applications are services. For example, a frontend is not a service. Its requirements are fundamentally harder to meet because of aesthetic and user experience concerns, and the fast-moving set of technologies around the browser brings less maturity and more complexity.

Starting from Scratch

We strive to build applications that are:

  • resilient
  • extensible
  • maintainable
  • high-quality by design, and
  • scalable to adjust to demand.

These properties lead to architectural principles that guide the choices we have to make.

Architecture

We prefer loosely coupled services. They are more resilient when it comes to remote dependency failures. We aim to develop autonomous isolated services that can be independently deployed and that are centered around defined business capabilities.

How to build a loosely-coupled system

Asynchronous Communication

Synchronous calls to remote systems can leave threads in a waiting state until the call times out. This can completely paralyze a system, as more and more threads move into that state until the system can no longer react to new requests. Furthermore, synchronous calls block the calling thread and prevent it from doing anything else.

We reduce the impact of remote failures by communicating asynchronously via events, where possible. Microservices publish streams of events to an event broker, which interested microservices consume asynchronously. Communication with other systems becomes non-blocking: even when some functionality is affected temporarily, the system continues working.
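
The publish/consume pattern can be sketched in a few lines. This is a minimal in-process illustration using Python's asyncio, not our actual event infrastructure: the `EventBroker` class, the `order-created` topic and the sentinel-based shutdown are all hypothetical, and a real system would use an external event broker.

```python
import asyncio
from collections import defaultdict


class EventBroker:
    """Toy in-memory broker: one asyncio.Queue per subscriber (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of subscriber queues

    def subscribe(self, topic):
        queue = asyncio.Queue()
        self._subscribers[topic].append(queue)
        return queue

    def publish(self, topic, event):
        # put_nowait returns immediately: the publisher never waits for consumers.
        for queue in self._subscribers[topic]:
            queue.put_nowait(event)


async def consume(queue, handled):
    """Consumer loop: handle events as they arrive; a None event stops the loop."""
    while True:
        event = await queue.get()
        if event is None:
            break
        handled.append(event)


async def demo():
    broker = EventBroker()
    queue = broker.subscribe("order-created")
    handled = []
    consumer = asyncio.create_task(consume(queue, handled))
    broker.publish("order-created", {"order_id": "o-1"})
    broker.publish("order-created", {"order_id": "o-2"})
    broker.publish("order-created", None)  # sentinel that ends the demo
    await consumer
    return handled
```

Running `asyncio.run(demo())` returns the two published events in order. The publisher finishes its work regardless of how quickly, or whether, the consumer processes them.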

Service Degradation

In synchronous calls, we react to remote dependency failures by degrading a service until the remote dependency is working again. This means that your system must be aware of remote dependency health. It needs to detect failure and also notice when an external dependency becomes healthy again.
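
One common way to track dependency health is a circuit breaker. The sketch below is a simplified illustration under assumed names and thresholds (`CircuitBreaker`, `max_failures`, `retry_after` are hypothetical), not a description of a specific library we use.

```python
import time


class CircuitBreaker:
    """Tracks the health of one remote dependency (simplified sketch)."""

    def __init__(self, max_failures=3, retry_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.retry_after = retry_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the dependency is considered healthy

    def call(self, remote_call, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.retry_after:
                return fallback()  # degraded mode: do not touch the remote system
            # The retry window has elapsed: let one probe call through.
        try:
            result = remote_call()
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            return fallback()
        # Success: the dependency is healthy again, so leave degraded mode.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote request, for example `breaker.call(fetch_recommendations, lambda: [])`, so that the service keeps answering with a reduced result while the dependency is down.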

In asynchronous processing scenarios, you usually have more flexibility in delivery time. Hence, you can tolerate temporary remote failures and delay processing with retries. However, service degradation is also an option here if the processing delays caused by remote failures are not acceptable.
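
Delaying work with retries might look like the following sketch. The function name, the number of attempts and the exponential backoff schedule are assumptions chosen for illustration; catching every `Exception` is also a simplification.

```python
import time


def process_with_retries(task, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run task(); after a failure wait base_delay * 2**attempt, then try again."""
    for attempt in range(attempts):
        try:
            return task()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller degrade or alert
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so that the delay schedule can be tested without actually waiting.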

Use Low-Tech Coupling

Low-tech coupling reduces issues resulting from changes in communicating systems, and can reduce complexity and dependencies. An example of low-tech coupling is service discovery via DNS. Communication should be done over interoperable protocols like HTTP instead of, for example, RMI.
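
As an illustration of DNS-based discovery, a client needs nothing beyond the standard resolver to find the instances of a service. The function name and the hostname in the usage note are hypothetical; `socket.getaddrinfo` is the standard-library call.

```python
import socket


def resolve_service(hostname, port, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated (ip, port) pairs that DNS lists for a service."""
    infos = resolver(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr starts with the IP address and the port.
    return sorted({(info[4][0], info[4][1]) for info in infos})
```

A call such as `resolve_service("orders.example.internal", 443)` would return whatever addresses the DNS record currently holds, so instances can be added or replaced without changing any client.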

RESTful APIs with JSON payload

We prefer REST-based APIs with JSON payloads to SOAP. Distributed SOAs following the REST style have a looser coupling between client and server implementations and come with less rigid client/server contracts that do not break if either side makes certain changes. Hence it is easier to build interoperating distributed systems that can be evolved in parallel by different teams while continuing to work. REST-like APIs with JSON payloads are the most widely accepted and used service interfacing style in the internet web service industry.

API First

The API Guild provides structure around the details of our API strategy. In April 2016, Guild members released a comprehensive model set of RESTful API guidelines, which define standards to successfully establish “consistent API look and feel” quality.

We also adopted "API First" and "API as a Product" as key engineering principles. In a nutshell, API First encompasses a set of quality-related standards (including the API guidelines and tooling) and fosters a peer review culture; it requires two aspects:

  • define APIs outside the code first using a standard specification language (OpenAPI 2.0)
  • get early review feedback from peers and client developers (following a lightweight API review procedure)
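
To give a feel for what such a definition contains, here is a small OpenAPI 2.0 document for a hypothetical "orders" API, mirrored as a Python dictionary. In practice the definition would live in a standalone YAML or JSON file outside the service code; the API, its path and its fields are invented for this example.

```python
import json

# Hypothetical "orders" API, described before any service code exists.
ORDER_API = {
    "swagger": "2.0",
    "info": {"title": "Order Service (example)", "version": "1.0.0"},
    "produces": ["application/json"],
    "paths": {
        "/orders/{order_id}": {
            "get": {
                "summary": "Fetch a single order",
                "parameters": [
                    {"name": "order_id", "in": "path", "required": True, "type": "string"}
                ],
                "responses": {
                    "200": {
                        "description": "The order",
                        "schema": {"$ref": "#/definitions/Order"},
                    },
                    "404": {"description": "Order not found"},
                },
            }
        }
    },
    "definitions": {
        "Order": {
            "type": "object",
            "required": ["id", "status"],
            "properties": {
                "id": {"type": "string"},
                "status": {"type": "string"},
            },
        }
    },
}


def render_spec(spec):
    """Serialize the definition so it can be shared for peer review."""
    return json.dumps(spec, indent=2, sort_keys=True)
```

The output of `render_spec(ORDER_API)` is the kind of artifact that peers and client developers can review before a single line of the service is written.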

Microservice Size

Build services based on the domain model and around business entities with state and behavior -- for example “orders”, “payments” or “prices”. In REST terms, these are “resources”.

A service should be big enough to offer a valid business capability, but small enough to be handled by a team that can be fed by two pizzas (Amazon’s Two-Pizza Team rule) -- from two to 12 people. In practice, a Two-Pizza Team may be able to own and run a large number of small services, or a smaller number of larger services.

All things considered, we prefer smaller services written in expressive programming languages with minimal code whenever possible.

Technology Selection

A service typically includes several layers of the tech stack -- entrypoint, business logic and data storage -- and offers a clean API as an integration point. Teams have a lot of freedom to choose the best technologies for each layer. To balance innovation with economies of scale, we maintain the Zalando Tech Radar as an internal tool for decision support and knowledge sharing.

Autonomy

A service:

  • should be as autonomous as possible.
  • should run in its own process and be independently deployable.
  • should start up and be resilient when its dependencies are not available.
  • should not share its data storage or code repository with any other service, so that changes do not affect other systems.
  • should not share libraries with other services, unless those libraries are open-source or inner-source and are actively maintained by a community. Shared dependencies may lead to large-scale complexity over time.
  • should not provide a client library containing business logic. The core API and its data model are expressed as REST and JSON.

APIs

Our APIs form the purest expression of what our systems do. But API design is hard work and takes time. We prefer peer-reviewed APIs which are designed in an "API First" way and developed outside code (using OpenAPI, for example), to avoid the complexity and cost of making big changes. We prefer ongoing documentation to be generated from the code itself.

Our APIs need to last for a long time, so they must evolve in certain ways. Our APIs should all be similar in tone; we establish and agree to standards for how to do this. We will host API documentation for all our APIs in a central, searchable place. Documentation should always provide examples.

Our APIs should obey Postel's Law, a.k.a. "the Robustness Principle": Be conservative in what you send, be liberal in what you accept. APIs must be evolved without breaking any consumers.
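
Applied to JSON payloads, the Robustness Principle might look like the sketch below. The order fields and both function names are hypothetical; the point is that the reader ignores fields it does not know, while the writer emits only documented fields.

```python
import json


def parse_order(payload):
    """Liberal in what we accept: read known fields, ignore unknown ones."""
    data = json.loads(payload)
    return {
        "id": str(data["id"]),                    # required; numbers and strings both accepted
        "status": data.get("status", "unknown"),  # tolerate a missing optional field
    }


def render_order(order):
    """Conservative in what we send: emit only the documented fields."""
    return json.dumps({"id": order["id"], "status": order["status"]}, sort_keys=True)
```

Because `parse_order` skips unknown fields, a producer can add a new field to its payload without breaking this consumer.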

SaaS

Build your services so that it’s possible to offer them as a SaaS solution to third parties. In fact, consider any other system a third party with regard to API structure, resilience and service level. This is easier to do than it was a few years ago: AWS pushes us this way, the Internet model scales, and our security model is geared toward allowing our services to be on the open Internet.

We want to offer services in ways we never imagined or expected. This is part of being a platform. In some cases, this means being multi-tenant from the start.

Security

Always use SSL (nowadays TLS) and make sure the caller of your service is authenticated and authorized. In practice, this means "HTTPS everywhere, not HTTP."

General guidelines

Stateless

When possible, be stateless. If you can’t, persist state outside the address space of the application, for example in a database.

Immutable

Strive for immutability whenever possible. An object is immutable if its state cannot be modified. Immutable things are automatically thread-safe, without requiring synchronization. Overall, immutability tends to result in fewer bugs and makes it easier to prove a program correct.
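
In Python, for example, a frozen dataclass gives a simple immutable value object. The `Price` type and the `apply_discount` helper are invented for this sketch.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Price:
    amount_cents: int
    currency: str


def apply_discount(price, percent):
    """Return a new Price; the original object is left untouched."""
    return replace(price, amount_cents=price.amount_cents * (100 - percent) // 100)
```

Assigning to a field of a `Price` instance raises `FrozenInstanceError`, so "changes" always produce a new object that can be shared between threads without locking.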

Idempotent

Whenever possible and reasonable, make service endpoints idempotent, so that an operation produces the same result even when it’s executed multiple times. This allows clients to safely retry operations in case of timeouts due to service processing or network failures.
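
One way to make a non-idempotent operation safe to retry is a client-supplied idempotency key. The sketch below is a toy illustration: `PaymentService` and its fields are hypothetical, and the in-memory dictionary stands in for storage that a real service would keep outside the process.

```python
class PaymentService:
    """Toy service: the same idempotency key never charges twice."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored response
        self.charges = []   # side effects actually performed

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # replay: no new side effect
        self.charges.append(amount_cents)
        response = {"charge_number": len(self.charges), "amount_cents": amount_cents}
        self._results[idempotency_key] = response
        return response
```

A client that times out can repeat the call with the same key and receive the original response instead of triggering a second charge.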

Development

Some general guidelines for how we think a development team should work.

Agile > Process

We don't mind which agile collaboration process (Scrum, Kanban etc.) you follow. Don’t focus too much on the process, focus on the outcome! Unfortunately, a defined process is required to satisfy our company audit requirements, but we like to keep it as minimal as possible.

Projects

We prefer that (almost) all work is done around some kind of conceptual “project.”

A project should have a clear purpose or goal. If it’s customer-facing, it should have some minimal business justification for why we are doing it. Assembling this information is typically the role of a product owner, but sometimes engineers need to do this themselves.

Having a first-class, cross-team notion of “project” is nice for a lot of reasons. It ultimately helps us to build automation that minimizes auditing and controlling overhead. It also helps us to report what we do for tax purposes -- and getting this right can save a lot of money.

No Micromanagement

If you feel like you’re being micromanaged, push back. We don’t do that here. On the other hand, it’s fine to ask for detailed support -- but such support should never be imposed on you. The team, not the Delivery Lead, decides who builds what and how it’s done.

Peer Review

Don’t wait until you’re done to ask for code review: It’s the best way to catch defects early. Create a pull request at the start of your work, not at the end. This pulls people into an ongoing conversation about your code, from Day One.

Code review is expensive in some ways, so get the most out of it. Reviewing code is a great way to learn about style, get help with idioms, and grow as a programmer and reviewer.

Code review can be hard when the culture around it isn’t supportive and constructive. It takes practice to learn how to accept code reviews without getting defensive, and to review code without focusing on trivial things. Don’t bike shed.

Peer review gets easier when you have a good attitude about it. Everybody around you is smart, and you are smart. We’re all smart in different ways.

Depending on the team and its codebases, it might be required that at least one person reviews code before it goes live. This is especially true for systems that touch customer or financial data. In general, though, we don’t want to focus on when code review is or isn’t required: The system works best when people decide on their own that code review is valuable, and seek it out.

Architectural decisions should be made as a team, and the team should ask for help if it’s unsure. Ask your Delivery Lead, People Lead, and/or Engineering Head, or even experts from other teams (if it makes sense). Embrace open discussions and alternate opinions.

Quality

Quality is related to mindset, and it’s part of engineering. Systems that support multi-billion-Euro companies must be engineered for high quality. Usually this means:

  • writing unit tests early on
  • mocking external systems so you can test against them while they’re not running, and also so that you can simulate various failure scenarios from the service and the network in between
  • striving for automation
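
Mocking an external system and simulating a network failure can be done with Python's standard `unittest.mock`. The pricing client, its `get_price` method and the `fetch_price` function are hypothetical names used only for this sketch.

```python
from unittest import mock


def fetch_price(client, sku):
    """Ask a remote pricing client for a price; return None if it is unreachable."""
    try:
        return client.get_price(sku)
    except ConnectionError:
        return None


def test_returns_price_when_service_is_up():
    client = mock.Mock()
    client.get_price.return_value = 1999
    assert fetch_price(client, "sku-1") == 1999
    client.get_price.assert_called_once_with("sku-1")


def test_returns_none_when_network_fails():
    client = mock.Mock()
    client.get_price.side_effect = ConnectionError("simulated outage")
    assert fetch_price(client, "sku-1") is None
```

Neither test needs the real pricing service to be running, and the second one exercises a failure path that would be hard to trigger on demand against a live system.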

Automate testing whenever possible. It’s not always possible, but life is almost always better if you invest in automated tests of your code. (See Martin Fowler's Testing Strategies in a Microservice Architecture.)

We’re not going to require you to test your code, but expect your peers to challenge you if you don’t. For the most part, a dedicated QA team is a thing of the past. You and your team are responsible for your code’s behavior: There’s no other safety net.

Years ago, we didn’t build systems this way. Now we must. Fortunately, the tooling is pretty amazing.

Continuous Delivery

Strive for very short release cycles, optimally deploying daily; automating the delivery pipeline makes this possible. Small releases tend to have fewer bugs. Use canary testing for your new deployments to identify problems early.

Best practices for Continuous Delivery and CI/CD-as-a-Service are available for all teams.

Source Code Management

We support GitHub as the SCM for checking in your code. You might want to use local git hooks for checks, for example to verify that commit messages reference a specification.
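
A local `commit-msg` hook for such a check could be sketched as follows. The reference format (`#123` or `SPEC-123`) is an assumption made for this example, not a convention defined in this document; adapt the pattern to whatever your team uses.

```python
import re
import sys

# Assumed convention: a commit references a specification as "#123" or "SPEC-123".
SPEC_REFERENCE = re.compile(r"(#\d+|\bSPEC-\d+\b)")


def has_spec_reference(message):
    """Return True if any non-comment line of the message contains a reference."""
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    return any(SPEC_REFERENCE.search(line) for line in lines)


def main(argv):
    # Git passes the path of the commit message file as the first argument.
    with open(argv[1], encoding="utf-8") as handle:
        message = handle.read()
    if has_spec_reference(message):
        return 0
    print("Commit message must reference a specification, e.g. #123", file=sys.stderr)
    return 1  # a non-zero exit status makes git abort the commit
```

To use it, save the script as an executable `.git/hooks/commit-msg` file and end it with `sys.exit(main(sys.argv))`.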

Documentation

Document the architecture of your APIs and applications. Make it clear, concise, and current. Use inline documentation for more complex code fragments.

Open Source

Zalando's strategy has evolved from “Open Source All The Things!” to a focus on mastery, and a commitment to not just releasing code but building communities around open source. Open Source is a way of working and thinking that pushes forward not just great code but a great set of values and ways of collaborating that we believe are enormously engaging and powerful, and we strive to use them for the benefit of all. Here is a detailed guide to open-sourcing projects at Zalando.

Deployment

Cloud vs. On-Premise

AWS is our default choice for new projects, so that we can take full advantage of the flexibility and scalability of the cloud and its rich set of integrated services. We continue to run dedicated hardware (both on premise and in partner data centers) for some special or legacy use cases.

Docker

We favour containerised application development and our current deployment tool of choice is Docker. We use our Continuous Deployment Platform (CDP) in combination with our hosted Kubernetes service.

Monitoring and Logging

We use both Scalyr and ZMON, our open-source in-house monitoring solution, to track business KPIs and other metrics. We use distributed tracing to analyze incidents and performance issues in our microservice infrastructure.

Closing Words

The Joy of Programming

The authors love code. Building simple systems that work efficiently and quickly brings us joy. Seeing these systems interoperate cleanly and harmoniously gives us pleasure. We do this because we love it. If we didn’t have to work, we’d probably still do this. And we know we’re not alone.

Building software systems can produce substantial existential pleasure. When the conditions are just right, programming is a reliable path to Flow: a state almost beyond pleasure. We want to get there, and stay there, and we want you to join us there. We hope these principles help.

License

We have published these guidelines under the Creative Commons Attribution 4.0 (CC-BY) license.
