Introduction
We are going to boot three CoreOS VMs under Vagrant and run them through various configurations to show off different failure domains of Kubernetes and how to handle them in production.
git clone https://github.com/coreos/coreos-vagrant
cd coreos-vagrant
git clone https://github.com/philips/real-world-kubernetes
sed -e 's%num_instances=1%num_instances=3%g' < config.rb.sample > config.rb
NOTE: please use the latest version of the CoreOS alpha box:
vagrant box update --box coreos-alpha
vagrant box update --box coreos-alpha --provider vmware_fusion
Now let's start up the hosts:
vagrant up
vagrant status
And configure ssh to talk to the new vagrant hosts correctly:
vagrant ssh-config > ssh-config
alias ssh="ssh -F ssh-config"
alias scp="scp -F ssh-config"
vagrant status should show three healthy CoreOS hosts.
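Later on we will need the IP address that vagrant ssh-config recorded for a host (for example, to add an /etc/hosts entry). As a small sketch, a helper like the hypothetical host_ip below can pull it out of the generated file:

```shell
# Print the HostName (IP) that `vagrant ssh-config` recorded for a given host.
# Usage: host_ip core-01 [path-to-ssh-config]
host_ip() {
  awk -v host="$1" '
    $1 == "Host"     { current = $2 }
    $1 == "HostName" && current == host { print $2; exit }
  ' "${2:-ssh-config}"
}
```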
etcd clustering
For etcd we are going to scale the cluster from a single machine up to three machines. Then we will fail a machine and show that everything still works.
Single Machine
Set up an etcd cluster with a single machine on core-01. This is as easy as starting the etcd2 service on CoreOS.
vagrant up
vagrant ssh-config > ssh-config
vagrant ssh core-01
sudo systemctl start etcd2
systemctl status etcd2
Confirm that you can write into etcd now:
etcdctl set kubernetes rocks
etcdctl get kubernetes
Now, we can confirm the cluster configuration with the etcdctl member list
subcommand.
etcdctl member list
By default etcd listens on localhost and does not advertise a public address. We need to fix this before adding additional members. First, tell the cluster the member's new address, using the member ID from the member list output (yours may differ from the one shown). The IP is the default IP for core-01 in coreos-vagrant.
etcdctl member update ce2a822cea30bfca http://172.17.8.101:2380
Updated member with ID ce2a822cea30bfca in cluster
Let's reconfigure etcd to listen on public ports to get it ready to cluster.
sudo su
sudo mkdir /etc/systemd/system/etcd2.service.d/
cat <<EOM > /etc/systemd/system/etcd2.service.d/10-listen.conf
[Service]
Environment=ETCD_NAME=core-01
Environment=ETCD_ADVERTISE_CLIENT_URLS=http://172.17.8.101:2379
Environment=ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
Environment=ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
EOM
All that's left is to restart etcd, and the reconfiguration should be complete.
sudo systemctl daemon-reload
sudo systemctl restart etcd2
etcdctl get kubernetes
Add core-02 to the Cluster
Now that core-01 is ready for clustering, let's add our first additional cluster member, core-02.
vagrant ssh core-02
etcdctl --peers http://172.17.8.101:2379 set /foobar baz
Still on core-02, add it to the cluster:
etcdctl --peers http://172.17.8.101:2379 member add core-02 http://172.17.8.102:2380
The above command will dump out a bunch of initial configuration information. Next, we will put that configuration information into the systemd unit file for this member:
sudo su
mkdir /etc/systemd/system/etcd2.service.d/
cat <<EOM > /etc/systemd/system/etcd2.service.d/10-listen.conf
[Service]
Environment=ETCD_NAME=core-02
Environment=ETCD_INITIAL_CLUSTER=core-01=http://172.17.8.101:2380,core-02=http://172.17.8.102:2380
Environment=ETCD_INITIAL_CLUSTER_STATE=existing
Environment=ETCD_ADVERTISE_CLIENT_URLS=http://172.17.8.102:2379
Environment=ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
Environment=ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
EOM
sudo systemctl daemon-reload
sudo systemctl restart etcd2
etcdctl member list
At this point the cluster is in an unsafe configuration: if either machine fails, etcd will stop accepting writes.
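The reason two members are unsafe comes down to quorum arithmetic: etcd needs a majority of members to commit a write, and two members tolerate zero failures. A quick sketch (quorum_info is an illustrative helper, not an etcd tool):

```shell
# For an n-member etcd cluster: majority needed = n/2 + 1 (integer division),
# failures tolerated = n - majority.
quorum_info() {
  local n=$1 majority
  majority=$(( n / 2 + 1 ))
  echo "members=$n majority=$majority tolerates=$(( n - majority ))"
}
```

With two members, stopping either one leaves the survivor unable to form a majority, which is what the next few commands demonstrate.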
sudo systemctl stop etcd2
exit
vagrant ssh core-01
sudo etcdctl set kubernetes bad
sudo etcdctl get kubernetes
exit
vagrant ssh core-02
sudo systemctl start etcd2
sudo etcdctl set kubernetes awesome
Add core-03 to the Cluster
To get out of this unsafe configuration, let's add a third member. With three members the cluster will be able to survive single machine failures.
vagrant ssh core-03
etcdctl --peers http://172.17.8.101:2379 member add core-03 http://172.17.8.103:2380
sudo su
mkdir /etc/systemd/system/etcd2.service.d/
cat <<EOM > /etc/systemd/system/etcd2.service.d/10-listen.conf
[Service]
Environment=ETCD_NAME=core-03
Environment=ETCD_INITIAL_CLUSTER=core-01=http://172.17.8.101:2380,core-02=http://172.17.8.102:2380,core-03=http://172.17.8.103:2380
Environment=ETCD_INITIAL_CLUSTER_STATE=existing
Environment=ETCD_ADVERTISE_CLIENT_URLS=http://172.17.8.103:2379
Environment=ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379
Environment=ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380
EOM
sudo systemctl daemon-reload
sudo systemctl restart etcd2
etcdctl member list
Surviving Machine Failure
Now a single machine, such as core-01, can fail and the cluster will continue to set and retrieve values.
vagrant destroy core-01
vagrant ssh core-02
etcdctl set foobar asdf
Automatic Bootstrapping
This exercise was designed to get you comfortable with etcd bring-up and reconfiguration. In environments where you have deterministic IP addresses you can use static cluster bring-up. In environments with dynamic IPs you can use etcd discovery.
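As a sketch of the discovery approach: you generate a one-time URL with curl https://discovery.etcd.io/new?size=3, and every member points at it instead of carrying a static member list. A drop-in would look roughly like this (the <token> is a placeholder for the URL you generate):

```ini
# /etc/systemd/system/etcd2.service.d/20-discovery.conf (sketch)
[Service]
Environment=ETCD_DISCOVERY=https://discovery.etcd.io/<token>
```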
Cleanup
For the next exercise we are only going to use a single-member etcd cluster. Let's destroy the machines and bring up clean hosts:
vagrant destroy -f core-01
vagrant destroy -f core-02
vagrant destroy -f core-03
Disaster Recovery of etcd
In almost all environments etcd will be replicated. But etcd is generally holding critical data, so you should plan for backups and disaster recovery. This example covers restoring etcd from a backup.
Start a Cluster and Destroy
Bring up the cluster of three machines
vagrant up
vagrant ssh-config > ssh-config
Start up a single-machine etcd cluster on core-02 and launch a process that writes a key named now every 5 seconds with the current date.
ssh core-02
sudo systemctl start etcd2
sudo systemd-run /bin/sh -c 'while true; do etcdctl set now "$(date)"; sleep 5; done'
exit
Back up the etcd cluster state to a tar file and save it on the local filesystem. In a production cluster this could be done with a tool like rclone running in a container, saving to an object store or another server.
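If you take these backups on a schedule, timestamped archive names keep older snapshots around instead of overwriting one file; backup_path here is an illustrative helper, not an etcd tool:

```shell
# Build a timestamped backup filename, e.g. etcd-backup-20150101T000000.tar.gz
backup_path() {
  echo "etcd-backup-$(date -u +%Y%m%dT%H%M%S).tar.gz"
}

# It can be combined with the same tar-over-ssh trick used in this section:
# ssh core-02 sudo tar cfz - /var/lib/etcd2 > "$(backup_path)"
```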
ssh core-02 sudo tar cfz - /var/lib/etcd2 > backup.tar.gz
ssh core-02 etcdctl get now
vagrant destroy -f core-02
vagrant up core-02
vagrant ssh-config > ssh-config
Restore from Backup
First, let's restore the old etcd member's data onto the new host:
scp backup.tar.gz core-01:
ssh core-01
tar xzvf backup.tar.gz
sudo su
mv var/lib/etcd2/member /var/lib/etcd2/
chown -R etcd /var/lib/etcd2
Next we need to tell etcd to start, but to use only the data, not the old cluster configuration. We do this by setting the ETCD_FORCE_NEW_CLUSTER flag. This is something like "single-user mode" on a Linux host.
mkdir -p /run/systemd/system/etcd2.service.d
cat <<EOM > /run/systemd/system/etcd2.service.d/10-new-cluster.conf
[Service]
Environment=ETCD_FORCE_NEW_CLUSTER=1
EOM
systemctl daemon-reload
systemctl restart etcd2
To ensure we don't accidentally reset the cluster configuration in the future, remove the force-new-cluster option and flush it from systemd.
rm /run/systemd/system/etcd2.service.d/10-new-cluster.conf
systemctl daemon-reload
Now our database should be fully recovered, with the application data intact. From here we can rebuild the cluster using the methods from the first section.
etcdctl member list
etcdctl get now
Cleanup
vagrant destroy -f core-01
vagrant destroy -f core-02
vagrant destroy -f core-03
Securing etcd
Now that we have good practice with etcd cluster operations (network partitions, adding/removing members, and backups), let's add transport security to the machine that will act as our etcd server: core-01.
vagrant up
vagrant ssh-config > ssh-config
Generate Certificate Authority
First, let's generate a certificate authority and some certificates signed by that authority. You can take a look at the Makefile; it essentially uses the cfssl tool to generate a CA and an etcd certificate signed by that CA.
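For reference, cfssl is driven by small JSON descriptions of the certificates it should cut. A minimal CA CSR config is shaped roughly like this (the values here are illustrative, not necessarily what this repo's Makefile uses):

```json
{
  "CN": "etcd-ca",
  "key": { "algo": "rsa", "size": 2048 },
  "names": [{ "O": "real-world-kubernetes" }]
}
```

Running cfssl gencert -initca ca-csr.json | cfssljson -bare ca emits ca.pem and ca-key.pem, which is the kind of step the Makefile automates for you.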
pushd real-world-kubernetes/tls-setup
make install-cfssl
make
popd
Now drop the certs onto the host:
scp -r real-world-kubernetes/tls-setup/certs core-01:
Use CA with etcd
Install the newly generated certificates onto the host. In a real-world environment this would be done with a cloud-config or installed on first boot.
ssh core-01
sudo su
mkdir /etc/etcd
mv certs/etcd* /etc/etcd
chown -R etcd: /etc/etcd
cp certs/ca.pem /etc/ssl/certs/
/usr/bin/c_rehash
exit
exit
Finally, configure etcd to use the certificate and key file we just dropped onto the host.
ssh core-01
sudo su
mkdir /etc/systemd/system/etcd2.service.d/
cat <<EOM > /etc/systemd/system/etcd2.service.d/10-listen.conf
[Service]
Environment=ETCD_NAME=core-01
Environment=ETCD_ADVERTISE_CLIENT_URLS=https://core-01:2379
Environment=ETCD_LISTEN_CLIENT_URLS=https://0.0.0.0:2379
Environment=ETCD_CERT_FILE=/etc/etcd/etcd.pem
Environment=ETCD_KEY_FILE=/etc/etcd/etcd-key.pem
EOM
systemctl daemon-reload
systemctl restart etcd2
exit
exit
Test with etcdctl
With everything in place we should be able to set a key over a secure connection:
ssh core-01
etcdctl --peers https://core-01:2379 --ca-file certs/ca.pem set kubernetes is-ready
Running Kubernetes API Server
Now that we fully understand etcd and how to operate it securely and in clusters, let's bring up a Kubernetes API server.
CONTROLLER=core-01
Get the basic configuration files in place on the server.
scp -r real-world-kubernetes/k8s-setup ${CONTROLLER}:
scp -r real-world-kubernetes/k8s-srv-setup ${CONTROLLER}:
Then copy them over to the right locations and restart the kubelet to have it bootstrap the API server.
ssh ${CONTROLLER}
sudo su
mkdir -p /etc/kubernetes/ssl/
cp certs/ca.pem /etc/kubernetes/ssl/
cp certs/apiserver* /etc/kubernetes/ssl/
cp k8s-setup/kubelet.service /etc/systemd/system/kubelet.service
mkdir -p /etc/kubernetes/manifests/
cp k8s-setup/kube-*.yaml /etc/kubernetes/manifests/
mkdir -p /srv/kubernetes/manifests
cp k8s-srv-setup/*.yaml /srv/kubernetes/manifests
systemctl daemon-reload
systemctl restart kubelet.service
systemctl enable kubelet
exit
exit
Test API
At this point the API should be up and available. But we need a hostname to point at, so let's set that up first on our workstations with an /etc/hosts entry. NOTE: this IP will change based on host configuration.
export CORE_01_IP=$(cat ssh-config | grep HostName | awk '{print $2}' | head -n1)
sudo -E /bin/sh -c 'echo "${CORE_01_IP} core-01" >> /etc/hosts'
With the hostname configured, we can try kubectl with our pre-made configuration file:
export KUBECONFIG=real-world-kubernetes/kubeconfig
kubectl get pods
If all goes well we should get an empty list of pods! Now, let's add some worker nodes.
Kubernetes API Server Under etcd Failure
Temporary Partition
Let's start a really boring job on the cluster that just sleeps forever. This goes through just fine:
kubectl run pause --image=gcr.io/google_containers/pause
Next, we will stop etcd to simulate a partition:
ssh core-01 sudo systemctl stop etcd2
Any API call to the server will now fail, blocked on etcd. This behavior is identical to what you would see with a web service whose SQL database had stopped.
kubectl describe rc pause
Let's start etcd back up and get things going:
ssh core-01 sudo systemctl start etcd2
After a few seconds the API server should start responding and we should be able to get the status of our replication controller:
kubectl describe rc pause
Data-loss and Restore
Let's run a really boring application in this cluster with no nodes:
kubectl run pause --image=gcr.io/google_containers/pause
And take a quick backup of etcd:
ssh core-01 sudo tar cfz - /var/lib/etcd2 > backup.tar.gz
kubectl scale rc pause --replicas=5
kubectl describe rc pause
ssh core-01 sudo systemctl stop etcd2
ssh core-01
sudo su
mkdir tmp
mv /etc/kubernetes/manifests/kube-* tmp/
rm -Rf /var/lib/etcd2/*
exit
docker ps
exit
scp backup.tar.gz core-01:
ssh core-01
tar xzvf backup.tar.gz
sudo su
mv var/lib/etcd2/member /var/lib/etcd2/
chown -R etcd /var/lib/etcd2
systemctl start etcd2.service
mv tmp/* /etc/kubernetes/manifests
exit
etcdctl --peers https://core-01:2379 --ca-file certs/ca.pem set kubernetes is-ready
exit
kubectl describe rc pause
kubectl scale rc pause --replicas=1
Kubernetes Workers
Setup Workers
Let's set up core-02 and core-03 as the worker machines.
WORKER=core-02
scp -r real-world-kubernetes/worker-setup ${WORKER}:
scp -r real-world-kubernetes/tls-setup/certs ${WORKER}:
ssh ${WORKER}
sudo su
mkdir -p /etc/kubernetes/ssl/ /etc/kubernetes/manifests
cp certs/ca.pem /etc/kubernetes/ssl/
cp certs/worker* /etc/kubernetes/ssl/
cp worker-setup/kubelet.service /etc/systemd/system/kubelet.service
cp worker-setup/kube-*.yaml /etc/kubernetes/manifests/
cp worker-setup/worker-kubeconfig.yaml /etc/kubernetes
systemctl daemon-reload
systemctl restart kubelet.service
systemctl enable kubelet
exit
exit
Now re-run the steps above, after changing the worker variable, to set up core-03:
WORKER=core-03
At this point we should see two nodes listed:
kubectl get nodes
Individual Worker Failure
kubectl describe rc pause
kubectl scale rc pause --replicas=10
vagrant halt core-03
After about one minute we should notice that everything has been moved off of core-03. Why? The node controller notices that core-03 has stopped reporting status and evicts its pods so they can be rescheduled onto healthy nodes.
kubectl describe node core-03 core-02
High Availability of API Server
This is easy: the API server is stateless, so it is trivially horizontally scalable.
Set the controller to core-02 and re-run the controller provisioning steps from before against this host.
CONTROLLER=core-02
Now test that the Kubernetes services are running.
kubectl -s https://core-02 get pods
High-Availability of Scheduler/Controller-Manager
There is a service called the podmaster that moves manifests in and out of the kubelet's manifest directory based on compare-and-swap operations in etcd.
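A sketch of that compare-and-swap idea using etcdctl v2 primitives; the key name, TTL, and the acquire_leadership helper are illustrative, not the real podmaster's implementation:

```shell
# First-writer-wins election: `etcdctl mk` only succeeds if the key does not
# exist yet, so exactly one caller can claim leadership per TTL window.
acquire_leadership() {
  local key=$1 me=$2 ttl=${3:-30}
  if etcdctl mk "$key" "$me" --ttl "$ttl" >/dev/null 2>&1; then
    echo leader
  else
    echo follower
  fi
}
```

The real podmaster keeps re-asserting the key while it is healthy; when its host dies, the TTL expires and another host's scheduler and controller manager win the next round.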
Let's force a re-election of the scheduler by removing all of the control pieces, including the podmaster, from the host.
ssh core-01
sudo su
mkdir tmp
mv /etc/kubernetes/manifests/* tmp
After a few seconds we should see that the scheduler and controller manager get elected over to core-02:
ssh core-02
sudo su
ls -la /etc/kubernetes/manifests
With this mechanism in place and an HA etcd cluster you can rest easy knowing the control plane won't go down in the face of single machine failure.
Cleanup
Upgrading Control Cluster
Upgrading the control plane between minor versions is something you might realistically do. However, these sorts of scenarios aren't well tested.