This repository was archived on 08/Apr/2022.

A horizontally scalable NGINX caching cluster

NGINX is a proxy server that makes HTTP caching simple. Run it in front of an app, set the right HTTP caching headers, and it does its job. If you want to build a basic CDN, you can fire up NGINX in multiple cities, route people to the nearest instance, and apply a little magic.

This is a Docker based NGINX cluster that works kind of like a CDN. It's designed to run on Fly.io with persistent volumes and private networking (6PN). It also runs locally so you can fiddle around with it.

Speed-run

  1. flyctl launch (don't deploy yet)
  2. Add volumes with flyctl volumes create
  3. flyctl deploy
  4. flyctl scale count <x> to scale horizontally
  5. Profit

Consistent hashing

This cluster uses consistent hashing to ensure that identical requests are sent to the same NGINX instance each time. This maximizes cache hits, and relies on NGINX's built-in hash ... consistent setting for upstreams: a load-balancing method that uses a consistent hash ring to pick the same backend node for the same key.
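The core idea can be sketched in a few lines of shell: hash the request key deterministically, then map the hash to a node, so the same URL always lands on the same instance. (This toy uses simple modulo mapping; the consistent variant NGINX uses additionally minimizes how many keys move when nodes are added or removed. The URL and node count below are illustrative.)

```shell
# Toy sketch of deterministic request routing: the same URL always maps
# to the same node index. Node count and URL are illustrative.
url="https://example.fly.dev/media/7twIWElrcmnzW/source.gif"
nodes=3

# Hash the cache key and reduce it to a node index.
h=$(printf '%s' "$url" | md5sum | cut -c1-8)
node=$(( 0x$h % nodes ))
echo "route $url -> node $node"
```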

Let's build a Giphy cache

Giphy has a bunch of great GIFs, but what if they slow down? GIFs are mission critical for some apps; it would be nice to keep the ones we care about fast. Send people GIFs in a jiffy.

First up, we need to tell NGINX where to get its GIFs (otherwise known as the origin). We can do that with proxy_pass, instructing NGINX to pass requests on to media.giphy.com and relay what it says.

location / {
    proxy_pass https://media.giphy.com/;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_cache http_cache;
}

The proxy_cache line in this block tells NGINX to cache requests (when it can) using a cache named http_cache.

proxy_cache_path /data/nginx-cache levels=1:2 keys_zone=http_cache:128m max_size=500g inactive=3d use_temp_path=off;

This gives us a 500GB cache named http_cache, and the files are stored at /data/nginx-cache. As long as we have a 500GB disk for NGINX to use, this is all we need – it'll evict files when storage gets tight.
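A couple of optional directives round this out. The header name below mirrors the fly-cache-status header shown in the cURL output later; the specific values are assumptions for illustration, not taken from the repo's config:

```nginx
# Optional extras for the same location block (values are illustrative):
proxy_cache_valid 200 301 3d;                       # cache good responses for 3 days
add_header fly-cache-status $upstream_cache_status; # surface HIT/MISS to clients
```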

Load balancers all the way down

We can turn this into a GIF cache cluster by running extra nodes, including them in a load balancer pool, defining the consistent hash key, and then checking to make sure the config has all the necessary semicolons (a special feature of NGINX configurations is how hard it is to debug a missing semicolon).

upstream nginx-nodes {
    hash "$scheme://$host$request_uri" consistent;
    keepalive 100;

    server node1:8081 max_fails=5 fail_timeout=1s;
    server node2:8081 max_fails=5 fail_timeout=1s;
}

This tells NGINX to use the full URL (including scheme and host) to hash consistently, and send requests to port 8081 on node1 and node2. And it says to consider those nodes bad if they fail 5 times in one second, which means we can retry the request on another.

Since we're deploying a cluster of nodes, we're instructing NGINX to load balance across the other nodes in the cluster.
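To complete the picture, each node also needs a location block that forwards incoming requests through that pool. A minimal sketch (the retry conditions are assumptions, not taken from the repo):

```nginx
# Sketch: forward requests through the hashed pool, retrying on the
# next node if the chosen one errors out or times out.
location / {
    proxy_pass http://nginx-nodes;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_next_upstream error timeout http_502 http_504;
}
```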

Discovering nodes

Making this cluster topology work properly is a game of service discovery. The Fly DNS service always has a current list of IPv6 Private Network (6PN) addresses for VMs in a given app. The dig utility can query the DNS service for all running VMs in a given application:

dig aaaa $FLY_APP_NAME.internal @fdaa::3 +short
fdaa:0:1:a7b:7b:0:5f1:2
fdaa:0:1:a7b:7f:0:5f2:2

It can also query by region:

dig aaaa dfw.$FLY_APP_NAME.internal @fdaa::3 +short

We can use these to keep nginx.conf updated with a list of servers. This happens in two places:

  1. start.sh preps the nginx.conf file, calls check-nodes.sh, and then boots NGINX
  2. check-nodes.sh uses dig to find the list of servers in the same region and writes an upstream block with the known 6PN addresses. This script runs periodically to keep things fresh.
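The generation step can be sketched like this. The output path and sample addresses are stand-ins so the script runs anywhere; a real script would populate the node list from the dig query above:

```shell
# Sketch of check-nodes.sh's core logic: render an NGINX upstream block
# from a list of 6PN addresses. A real script would populate $nodes via
# dig against Fly's internal DNS.
nodes="fdaa:0:1:a7b:7b:0:5f1:2
fdaa:0:1:a7b:7f:0:5f2:2"

{
  echo 'upstream nginx-nodes {'
  echo '    hash "$scheme://$host$request_uri" consistent;'
  echo '    keepalive 100;'
  for ip in $nodes; do
    echo "    server [$ip]:8081 max_fails=5 fail_timeout=1s;"
  done
  echo '}'
} > /tmp/upstream.conf

cat /tmp/upstream.conf
```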

This is how a basic CDN works: multiple cache nodes in each region, each requesting files from the origin as needed and caching them for later. Each region still needs to warm its own cache before things get snappy.

Deploying a CDN

The NGINX features we're using have been around for ages; they were built long before 2020. But deploying a CDN has, historically, been beyond the scope of a single developer. This is part of the reason we built Fly: we think ops can be automated, and individual developers should be able to ship their own CDNs.

So here's how to deploy a shiny, horizontally scalable CDN in about 5 minutes.

Create a Fly app

The quickest way to create an app is to import the fly.source.toml file we created for you:

fly launch --name <app-name>

Replace <app-name> with your preferred app name, or omit the --name flag to have Fly generate a name for you. You may be prompted for which organization you want the app to run in.

When it asks you if you want to deploy, say no.

NGINX needs disks, so go ahead and create one or more volumes (you'll need one volume per node when you scale out):

flyctl volumes create nginx_data --region dfw --size 500

This creates a 500GB volume named nginx_data in Dallas. You can add more volumes in Dallas, or put them in other regions, just make sure they're all named nginx_data.

To deploy the app, run:

fly deploy

Congrats! You now have a single-server GIF cache running, with global Anycast IPs routing your traffic (run flyctl info to see them).

Scaling is just a matter of adding volumes for your next VMs. Add 'em in the regions you want, put multiples in the regions you want to shard, and then scale your app out:

flyctl scale count 3

Now you have three total NGINX servers running, each with its own disk. Requests with the same URL route to the same server.

See it in action with cURL

Fire up your terminal and run this command to make a request to our example GIF caching service, and print the headers out:

curl -D - -o /dev/null -sS https://nginx-cluster-example.fly.dev/media/7twIWElrcmnzW/source.gif
HTTP/2 200
server: Fly/004c36a8 (2020-12-08)
date: Wed, 23 Dec 2020 23:49:39 GMT
content-type: image/gif
content-length: 2085393
accept-ranges: bytes
last-modified: Sat, 13 Jul 2019 04:40:21 GMT
etag: "00e2a6744ab9aea25e6e3ca20e0fe46f"
via: 1.1 varnish, 1.1 varnish, 2 fly.io
access-control-allow-origin: *
cross-origin-resource-policy: cross-origin
age: 0
x-served-by: cache-bwi5134-BWI, cache-iah17254-IAH
x-cache: HIT, MISS
x-cache-hits: 1, 0
x-timer: S1608767381.985906,VS0,VE30
strict-transport-security: max-age=86400
cache-control: max-age=86400
fly-cache-status: MISS
x-instance: d12f720f

There are some special headers here:

  • x-instance – specifies the ID of the server that sent the response. This should be the same each time you run cURL with that URL
  • fly-cache-status – indicates if a request was served from the cache or not.

If we run the same curl again, the x-instance remains unchanged, and the fly-cache-status shows a HIT.
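This check is easy to script. The snippet below only parses a header dump (the sample headers stand in for live curl -D output), extracting x-instance so the values from two runs can be compared:

```shell
# Extract the x-instance header from a curl -D style header dump.
# Comparing the value across two requests to the same URL confirms
# consistent routing. Sample headers stand in for live curl output.
headers='HTTP/2 200
fly-cache-status: HIT
x-instance: d12f720f'

instance=$(printf '%s\n' "$headers" \
  | awk 'tolower($1) == "x-instance:" { print $2 }' \
  | tr -d '\r')
echo "served by instance: $instance"
```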

But if we try a different URL:

curl -D - -o /dev/null -sS "https://nginx-cluster-example.fly.dev/media/l1KVcBV7rstepCYhi/giphy.gif"
HTTP/2 200
server: Fly/004c36a8 (2020-12-08)
date: Wed, 23 Dec 2020 21:05:02 GMT
content-type: image/gif
content-length: 3946398
accept-ranges: bytes
last-modified: Wed, 12 Apr 2017 19:14:41 GMT
etag: "81630bf6b606ff600f90dc91a9dbd0a1"
via: 1.1 varnish, 1.1 varnish, 2 fly.io
access-control-allow-origin: *
cross-origin-resource-policy: cross-origin
age: 50421
x-served-by: cache-bwi5135-BWI, cache-iah17222-IAH
x-cache: HIT, HIT
x-cache-hits: 117, 1
x-timer: S1608753636.146957,VS0,VE1
strict-transport-security: max-age=86400
cache-control: max-age=86400
fly-cache-status: HIT
x-instance: 3d727da8

The x-instance header indicates it came from a different server.


Where to go from here

HTTP caching is simple, but global cache expiration is hard. Users will want to clear the cache when their app changes, or to delete stale data for other reasons, and "immediate cache expiration" is a spiffy feature to offer. If we were going to build that, we'd build a little worker that runs alongside each NGINX server and listens for purge events from NATS.
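As a starting point for such a worker: NGINX names cache files by the MD5 of the cache key, nested according to the levels=1:2 layout from proxy_cache_path. A hypothetical helper (the function name and key format are ours, not from the repo) can compute the on-disk path a purge event would need to delete:

```shell
# Hypothetical purge helper: map a cache key to the file NGINX
# (proxy_cache_path ... levels=1:2) would store it under. A purge
# worker could call this for each key it receives, then rm the file.
cache_path() {
  h=$(printf '%s' "$1" | md5sum | cut -d' ' -f1)   # 32 hex chars
  l1=$(printf '%s' "$h" | cut -c 32)               # last char -> first dir
  l2=$(printf '%s' "$h" | cut -c 30-31)            # previous two -> second dir
  printf '/data/nginx-cache/%s/%s/%s\n' "$l1" "$l2" "$h"
}

cache_path 'https://media.giphy.com/media/7twIWElrcmnzW/source.gif'
```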

People who build snappy apps spend a lot of time optimizing images. CDNs can do that! This NGINX cluster could work with imgproxy or 'imaginary' to automatically cache and serve WebP images, add classy visual effects, and even do smart cropping.
