Understanding Clustering
Learn the basics of clustering using sklearn.
Creating a KMeans cluster
So, what exactly is clustering, and when might it be helpful? Let’s start with a very simple example. Imagine we have a group of people for whom we want to make T-shirts. We can make the T-shirts in whatever size we choose, but the main restriction is that we can make only one size for everyone. The people’s sizes are as follows: [1, 2, 3, 4, 5, 7, 9, 11]. Think about how we might tackle this problem. We’ll use the KMeans algorithm for that, so let’s start right away, as follows:
import numpy as np
from sklearn.cluster import KMeans
sizes = np.array([1, 2, 3, 4, 5, 7, 9, 11]).reshape(-1, 1)
kmeans1 = KMeans(n_clusters=1)
kmeans1.fit(sizes)

Output:
KMeans(n_clusters=1)
- Lines 1–2: We import the required packages and models. NumPy will be imported as a package, but from sklearn, we’ll import only the model that we will be using for now.
- Line 3: We create a dataset of sizes in the required format. Note that each observation (person’s size) should be represented as a list, so we use the reshape method of NumPy arrays to get the data in the required format.
- Line 4: We create an instance of the KMeans model with the required number of clusters. An important feature of this model is that we provide the desired number of clusters for it. In this case, we were given a constraint, ...
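To make the reshape step and the single-cluster fit more concrete, here is a minimal sketch that prints the array shapes before and after reshaping and then inspects the fitted center. The name flat_sizes and the n_init and random_state arguments are assumptions added for illustration and reproducibility; they are not part of the original example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Same sizes as in the example above (flat_sizes is a name added for this sketch).
flat_sizes = np.array([1, 2, 3, 4, 5, 7, 9, 11])

# reshape(-1, 1) turns the flat array into one row per person, one column per feature.
sizes = flat_sizes.reshape(-1, 1)
print(flat_sizes.shape)  # (8,)
print(sizes.shape)       # (8, 1)

# n_init and random_state are assumptions, set only to make the run reproducible.
kmeans1 = KMeans(n_clusters=1, n_init=10, random_state=0)
kmeans1.fit(sizes)

# cluster_centers_ holds one row per cluster; here there is a single center.
center = kmeans1.cluster_centers_[0, 0]
print(center, flat_sizes.mean())
```

With a single cluster, the fitted center should coincide with the mean of the sizes, which is 5.25 for this data, so the two printed values are expected to match up to floating-point rounding.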