Getting Started with State Persistence

How to Persist State?

Fault tolerance and high availability are of little use if we lose application state whenever a Pod is rescheduled.

Having state is unavoidable, and we need to preserve it no matter what happens to our applications, servers, or even a whole datacenter.

The way to preserve the state of our applications depends on their architecture. Some store data in memory and rely on periodic backups. Others can synchronize data between multiple replicas, so that the loss of one instance does not result in the loss of data. Most, however, rely on disk to store their state. We’ll focus on that group of stateful applications.
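To make the disk-based case concrete, here is a minimal sketch in Python of an application that keeps its state in a file. The `DiskBackedCounter` class, the `demo` function, and the JSON file layout are hypothetical and exist only for illustration; they are not part of this chapter's commands.

```python
import json
import os
import tempfile


class DiskBackedCounter:
    """Hypothetical example: a counter whose value is stored in a file on disk."""

    def __init__(self, path):
        self.path = path
        self.value = self._load()

    def _load(self):
        # A missing file means this is the first start, so begin from zero.
        try:
            with open(self.path) as f:
                return json.load(f)["value"]
        except FileNotFoundError:
            return 0

    def _save(self):
        # Write to a temporary file in the same directory, then rename it
        # over the old file, so a crash mid-write never leaves a
        # half-written state file behind.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"value": self.value}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)

    def increment(self):
        self.value += 1
        self._save()
        return self.value


def demo(path):
    """Increment twice, then simulate a restart by creating a new instance."""
    first = DiskBackedCounter(path)
    first.increment()
    first.increment()
    restarted = DiskBackedCounter(path)
    return restarted.value
```

The state survives a restart only as long as the file itself survives. If the process comes back on a different server with an empty disk, the counter starts from zero again, and that is the problem we need to solve for rescheduled Pods.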

If we are to build fault-tolerant systems, we need to make sure that the failure of any part of the system is recoverable. Since speed is of the essence, we cannot rely on manual operations to recover from failures. Even if we could, no one wants to be the person sitting in front of a screen, waiting for something to fail, only to bring it back to its previous state.

Kubernetes Failure Handling

We already saw that Kubernetes will, in most cases, recover from the failure of an application, a server, or even a whole datacenter by rescheduling Pods to healthy nodes. We also saw how AWS and kops achieve more or less the same effect at the infrastructure level. Auto-scaling groups recreate failed nodes and, since those nodes are provisioned with kops startup processes, new instances have everything they need to join the cluster.

The only thing that prevents us from saying that our system is (mostly) highly available and fault tolerant is the fact that we did not solve the problem of persisting state across failures. That’s the subject we’ll explore next.

We’ll try to preserve our data no matter what happens to our stateful applications or the servers where they run.

ℹ️ All the commands from this chapter are available in the 15-pv.sh Gist.

Creating A Kubernetes Cluster

We’ll start by creating a cluster similar to the one we used in the previous chapter.
