Understanding Clustering

Learn the basics of clustering using sklearn.

Creating a KMeans cluster

So, what exactly is clustering, and when might it be helpful? Let’s start with a very simple example. Imagine we have a group of people for whom we want to make T-shirts. Ideally, we would make each person a T-shirt in exactly the size they need, but we’re restricted to producing a single size for everyone. The people’s sizes are as follows: [1, 2, 3, 4, 5, 7, 9, 11]. Think about which single size would suit the group best. We’ll use the KMeans algorithm to answer that, so let’s start right away:

import numpy as np
from sklearn.cluster import KMeans
sizes = np.array([1, 2, 3, 4, 5, 7, 9, 11]).reshape(-1,1)
kmeans1 = KMeans(n_clusters=1)
kmeans1.fit(sizes)
Output: KMeans(n_clusters=1)
  • Lines 1–2: We import the required packages and models. We import NumPy as a whole package, but from sklearn we import only KMeans, the one model we need for now.

  • Line 3: We create the dataset of sizes in the format sklearn expects: a two-dimensional array with one row per observation (a person’s size) and one column per feature. Since we have only one feature, we call the reshape(-1, 1) method of the NumPy array to turn the flat array of 8 values into 8 rows of 1 column each.

  • Line 4: We create an instance of the KMeans model with the required number of clusters. An important feature of this model is that we must specify the number of clusters ourselves. In this case, we were given a constraint, ...
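
To make the reshape step and the single-cluster setup more concrete, here is a minimal sketch that checks the array shapes and inspects the fitted model. The names raw_sizes, model, and center are illustrative and not part of the lesson code. The n_init and random_state arguments are also additions, assumed here only to keep the run reproducible across sklearn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# The flat array has a single dimension with 8 entries.
raw_sizes = np.array([1, 2, 3, 4, 5, 7, 9, 11])
print(raw_sizes.shape)  # (8,)

# reshape(-1, 1) keeps one column and lets NumPy infer the row count,
# giving the (n_samples, n_features) layout that sklearn estimators expect.
sizes = raw_sizes.reshape(-1, 1)
print(sizes.shape)  # (8, 1)

# n_init and random_state are assumptions added for reproducibility.
model = KMeans(n_clusters=1, n_init=10, random_state=0)
model.fit(sizes)

# With a single cluster, every observation gets label 0, and the
# cluster center should coincide with the plain average of the sizes.
center = model.cluster_centers_[0, 0]
print(model.labels_)
print(center, raw_sizes.mean())
```

Under these assumptions, the one-size answer that KMeans arrives at is simply the mean of the sizes, which is a useful sanity check before moving on to more than one cluster.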