What Do We Expect from Deployments?
In this lesson, we will learn about the features of a good deployment strategy.
Before we dive into specific deployment strategies, we should set some expectations that will guide our choices. But first, let’s define what a deployment is.
What is deployment?
Traditionally, a deployment is a process through which we install new applications on our servers or update those that are already running with new releases.
That is, more or less, what we have been doing since the beginning of our industry, and it is, in essence, what we’re still doing today. But as we’ve evolved, our requirements have changed as well. Today, saying that all we expect is for our releases to run would be an understatement. We want much more, and we have the technology to help us fulfill those desires. So, what does “much more” mean today?
Requirements and features to expect from deployment
Depending on who you speak with, you will get a different list of desires. What follows is what I believe to be essential, and what I saw emphasized at the companies I worked for. Without further ado, the requirements are as follows. They leave out the obvious one: that applications should be running inside the cluster.
Fault-tolerance
Applications should be fault-tolerant. If an instance of the application dies, it should be brought back up. If the node where an application is running dies, the application should be moved to a healthy node. Even if a whole data center goes down, the system should be able to move the applications that were running there to a healthy one. An alternative would be to recreate the failed nodes or data centers with precisely the same apps that were running there before the outage. However, that is too slow and, frankly, we moved away from that concept the moment we adopted schedulers. That doesn’t mean that failed nodes and data centers should not recover, but rather that we should not wait for the infrastructure to get back to normal. Instead, we should run failed applications, no matter the cause of the failure, on healthy nodes as long as there is enough available capacity.
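To make the idea of moving failed applications to healthy nodes more concrete, here is a minimal sketch in Python. It is a toy model and not how any real scheduler is implemented. The `Node` class, the `reschedule` function, the slot-based notion of capacity, and the node and instance names are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical cluster node; capacity is counted in instance slots."""
    name: str
    capacity: int
    healthy: bool = True
    instances: list = field(default_factory=list)

    def free_slots(self) -> int:
        # An unhealthy node offers no capacity at all.
        if not self.healthy:
            return 0
        return self.capacity - len(self.instances)


def reschedule(nodes):
    """Move instances from unhealthy nodes to healthy nodes with free slots.

    Returns the instances that could not be placed for lack of capacity.
    """
    # Collect everything that was running on failed nodes.
    pending = []
    for node in nodes:
        if not node.healthy:
            pending.extend(node.instances)
            node.instances = []

    unplaced = []
    for instance in pending:
        # Assumed placement policy: pick the node with the most free slots.
        target = max(nodes, key=lambda n: n.free_slots(), default=None)
        if target is not None and target.free_slots() > 0:
            target.instances.append(instance)
        else:
            unplaced.append(instance)
    return unplaced


nodes = [
    Node("node-1", capacity=3, instances=["api-1", "api-2"]),
    Node("node-2", capacity=3, instances=["api-3"]),
    Node("node-3", capacity=2),
]

# Simulate the failure of node-1, then recover without waiting for it.
nodes[0].healthy = False
unplaced = reschedule(nodes)

for node in nodes:
    print(node.name, node.healthy, node.instances)
print("unplaced:", unplaced)
```

Note that the sketch never waits for `node-1` to come back. It places the displaced instances wherever there is room and reports anything it could not place, which mirrors the condition that this only works as long as there is enough available capacity.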
Fault ...