Creating a Cluster: Discussing the Specifications

Understand the specifications for our cluster.

We are finally ready to create a cluster. But before we do that, we’ll spend some time discussing our requirements. After all, not all clusters are created equal, and the choices we are about to make might severely impact our ability to accomplish our goals.

Primary nodes

The first question we might ask ourselves is whether we want high availability. It would be strange if anyone answered no. Who doesn’t want to have a cluster that is (almost) always available? Instead, we’ll ask ourselves what might bring our cluster down.

When a node is destroyed, Kubernetes will reschedule all the applications that were running inside it to healthy nodes. All we have to do is to make sure that, later on, a new server is created and joined to the cluster so that its capacity is back to the desired values. We’ll discuss later how new nodes are created as a reaction to failures of a server. For now, we’ll assume that will happen somehow.

Still, there is a catch. New nodes need a cluster to join, and if the only primary server fails, there is no cluster left to join. All is lost. The primary servers are the most important part of the cluster. They host the critical components without which Kubernetes cannot operate.

So, we need more than one primary node. How about two? If one fails, we still have the other one. Still, that would not work.

Every piece of information that enters one of the primary nodes is propagated to the others, and only after the majority agrees is that information committed. If we lose the majority (50%+1), primary nodes cannot establish a quorum and will cease to operate. If one out of two primary nodes is down, we can get only half of the votes, and we would lose the ability to establish the quorum. Therefore, we need three primary nodes or more. Odd numbers greater than one are “magic” numbers. Given that we won’t create a huge cluster, three should do.
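The arithmetic behind that conclusion can be sketched in a few lines of Python. This is only an illustration of the majority rule described above; the function names are ours and are not part of Kubernetes.

```python
def quorum(primaries: int) -> int:
    """Smallest number of primary nodes that forms a majority."""
    return primaries // 2 + 1


def tolerated_failures(primaries: int) -> int:
    """How many primary nodes can fail while a majority still remains."""
    return primaries - quorum(primaries)


for n in range(1, 6):
    print(f"primaries={n} quorum={quorum(n)} tolerated failures={tolerated_failures(n)}")
```

Two primary nodes tolerate no failures, just like one. Three tolerate one failure, and so do four, which is why adding a node to reach an even number buys nothing.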

With three primary nodes, we are safe from the failure of any single one of them. Considering that failed servers will be replaced with new ones, as long as only one primary node fails at a time, we should be fault-tolerant and have high availability.

Note: Always set an odd number greater than one for primary nodes.

Data centers

The idea of having multiple primary nodes does not mean much if an entire data center goes down. Attempts to prevent a data center from failing are commendable. Still, no matter how well a data center is designed, there is always a scenario that might cause it to crash. So, we need more than one data center. Following the logic behind primary nodes, we need at least three. But, as with almost anything else, we cannot pick just any three (or more) data centers. If they are too far apart, the latency between them might be too high. Since every piece of information is propagated to all the primary nodes in a cluster, slow communication between data centers would severely impact the cluster as a whole.

All in all, we need three data centers that are close enough to provide low latency, yet physically separated so that the failure of one does not impact the others. Since we are about to create the cluster in AWS, we’ll use availability zones (AZs), which are physically separated data centers with low latency between them.

Note: Always spread your cluster between at least three data centers that are close enough to guarantee low latency.
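To see why the number of data centers matters as much as the number of primary nodes, here is a minimal sketch that places primary nodes across zones and checks whether a majority survives the loss of any single zone. The round-robin placement and the zone names are assumptions made for illustration; they do not describe how kOps assigns nodes.

```python
from collections import Counter
from typing import Sequence


def spread(primaries: int, zones: Sequence[str]) -> Counter:
    """Assign primary nodes to zones round-robin and count nodes per zone."""
    return Counter(zones[i % len(zones)] for i in range(primaries))


def survives_zone_loss(primaries: int, zones: Sequence[str]) -> bool:
    """True if a majority of primaries remains after the worst single-zone failure."""
    placement = spread(primaries, zones)
    remaining = primaries - max(placement.values())
    return remaining >= primaries // 2 + 1


# Hypothetical zone names, used only for illustration.
for zones in (["zone-a"], ["zone-a", "zone-b"], ["zone-a", "zone-b", "zone-c"]):
    print(len(zones), "zone(s):", survives_zone_loss(3, zones))
```

With three primary nodes, one or two zones are not enough, because losing the busiest zone takes the majority with it. Only with three zones does the quorum survive.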

There’s more to high availability than running multiple primary nodes and spreading a cluster across multiple availability zones. We’ll get back to this subject later. For now, we’ll continue exploring the other decisions we have to make.

Networking

Which networking shall we use? We can choose any of the following options:

  • kubenet
  • CNI
  • classic
  • external

The classic Kubernetes native networking has been deprecated in favor of kubenet, so we can discard it right away.

The external networking is used in some custom implementations and for particular use cases, so we’ll also discard that one.

That leaves us with kubenet and CNI.

Container Network Interface (CNI) allows us to plug in a third-party networking driver. kOps supports Calico, flannel, Canal (flannel + Calico), kopeio-vxlan, kube-router, romana, weave, and amazon-vpc-routed-eni. Each of those networks has pros and cons and differs in its implementation and primary objectives. Choosing between them would require a detailed analysis of each. We’ll leave that comparison for some other time and place. Instead, we’ll focus on kubenet.

Kubenet is kOps’ default networking solution. It is Kubernetes-native networking and is considered battle-tested and very reliable. However, it comes with a limitation. On AWS, routes for each node are configured in AWS VPC routing tables. Since those tables are limited to fifty entries by default, kubenet can be used in clusters with up to fifty nodes. If you’re planning to have a cluster bigger than that, you’ll have to switch to one of the previously mentioned CNIs.

Note: Use kubenet networking if your cluster has no more than fifty nodes.
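That rule of thumb is simple enough to write down as a check. The sketch below is illustrative only: the function and constant names are ours, and the fallback of calico is an arbitrary pick from the list of supported CNIs, not a recommendation.

```python
# Node limit for kubenet on AWS, as quoted in the text above.
KUBENET_MAX_NODES = 50


def pick_networking(node_count: int, fallback_cni: str = "calico") -> str:
    """Return kubenet for small clusters, otherwise the chosen CNI."""
    if node_count <= KUBENET_MAX_NODES:
        return "kubenet"
    return fallback_cni


print(pick_networking(5))
print(pick_networking(120))
```

The first call returns kubenet, the second falls back to the CNI, since a cluster of 120 nodes would not fit within the route table limit.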

The good news is that using any of the networking solutions is easy. All we have to do is specify the --networking argument followed by the name of the network.
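As a rough sketch of what that looks like, the snippet below assembles the relevant part of a kops create cluster command. The cluster name is a placeholder, and a real invocation needs more arguments (zones, node sizes, and so on) that are left out here on purpose.

```python
import shlex


def create_cluster_command(name: str, networking: str) -> str:
    """Build a partial `kops create cluster` command line as a string."""
    args = ["kops", "create", "cluster", "--name", name, "--networking", networking]
    return shlex.join(args)


# "devops.example.com" is a hypothetical cluster name.
print(create_cluster_command("devops.example.com", "kubenet"))
```

Switching to a different networking solution would only mean changing the value passed after --networking.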

Given that we won’t have the time and space to evaluate all the CNIs, we’ll use kubenet as the networking solution for the cluster we’re about to create.

Node size

Finally, we are left with only one more choice we need to make. What will be the size of our nodes? Since we won’t run many applications, t2.small should be more than enough and will keep AWS costs to a minimum. t2.micro is too small, so we will select the next size up.

Note: You might have noticed that we did not mention persistent volumes. We’ll explore them in the next chapter.


In the next lesson, we’ll run and verify our cluster.
