Unsupervised Learning Through Clustering

Learn how to identify groups in data using an unsupervised ML technique called clustering.

We'll cover the following...

Pattern identification—revisited
Data without labels
Finding patterns in unlabeled data
- Similarity score
Learning without supervision
- The K-means algorithm
K-means using sklearn

Pattern identification—revisited

Let’s look at an interesting possibility. Suppose you have been hired as an ML engineer at Meta and tasked with identifying the social communities among users of the Facebook app. The goal is to find groups of users who share similar interests and are connected as friends on the social network. You have decided to use one of the pattern identification methods and the ML process learned previously.

Let’s look at the data plot of Facebook users across the city. Do you see any patterns in the data? Can you identify groups of users that may form the same community based on their mutual friends and the content they like?

Data without labels

This data has no labels, i.e., no desired output value for each data point. A supervised learning technique such as classification using an MLP requires a column of output labels in the dataset.

In this class of ML problems, we cannot supervise the learning process. Therefore, we need an unsupervised learning approach to find patterns in the data.
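To make the distinction concrete, here is a minimal Python sketch of what an unlabeled dataset looks like. The two feature names (mutual friends and shared page likes) and all values are assumptions invented for illustration; they are not the dataset shown in the plot.

```python
# Hypothetical features for five users: (mutual_friends, shared_page_likes).
# All values are invented for illustration only.
users = [
    (12, 30),
    (14, 28),
    (90, 5),
    (88, 7),
    (50, 50),
]

# A supervised model would also need one known label per user, for example
# a community ID. In this dataset that column does not exist.
labels = None

n_users = len(users)
n_features = len(users[0])
has_labels = labels is not None
print(f"{n_users} users, {n_features} features, labels available: {has_labels}")
```

With only the feature rows available, any grouping has to come from the structure of the features themselves.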

Finding patterns in unlabeled data

Since our dataset does not have labels, we can’t use the MLP model for classification; we need some alternative method to find communities in the Facebook user data.

Facebook communities are groups of users who share similar interests and have similar friends and connections.

So, for two users to belong to the same community, they should have some mutual friends and share similar interests and hobbies. Looking at the data plot again, we notice that some users lie closer together than others. Users that lie close together have a similar number of mutual friends and have liked the same Facebook pages, which suggests that they have activities in common.

If we can find a similarity score between each Facebook user and every other user in the dataset, we may be able to identify groups of similar users in the data.
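As a rough sketch of how pairwise scores could hint at groups, the Python snippet below looks up, for each user, the other user with the highest score. The score table and the helper `most_similar` are hypothetical and exist only for this illustration; this is a naive lookup, not a clustering algorithm.

```python
# Hypothetical, symmetric similarity scores between four users (0 to 1).
# similarity[i][j] is the score between user i and user j.
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]

def most_similar(scores, user):
    """Return the index of the user most similar to `user`, excluding itself."""
    others = [j for j in range(len(scores)) if j != user]
    return max(others, key=lambda j: scores[user][j])

for i in range(len(similarity)):
    print(f"user {i} is most similar to user {most_similar(similarity, i)}")
```

In this made-up table, users 0 and 1 pick each other, and so do users 2 and 3, which suggests two candidate groups.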

Similarity score

The simplest way to tell whether two users are similar is to calculate a score that indicates how alike they are, i.e., whether they belong to the same friend groups and share hobbies. This score can be calculated from the feature values given in the data: if we know how close or far apart two users are in the 2-D space of the plot, we know how similar they are.
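One way to turn closeness in 2-D space into a score is sketched below in Python. It assumes Euclidean (straight-line) distance as the measure of closeness and uses 1 / (1 + distance) to map distance to a similarity value; both choices, the function names, and the sample points are assumptions made for this illustration, since the lesson does not fix a particular formula here.

```python
import math

def distance(a, b):
    """Euclidean distance between two users given as (x, y) feature pairs."""
    return math.dist(a, b)

def similarity(a, b):
    """Map distance to a score in (0, 1]; identical points score 1.0."""
    return 1.0 / (1.0 + distance(a, b))

# Hypothetical users, each described by two feature values.
user_a = (1.0, 2.0)
user_b = (4.0, 6.0)   # 3 units away from user_a in x, 4 units in y
user_c = (1.5, 2.5)   # much closer to user_a

print(distance(user_a, user_b))
print(similarity(user_a, user_b))
print(similarity(user_a, user_c))
```

The closer two users are, the smaller the distance and the higher the score, so `user_c` comes out as more similar to `user_a` than `user_b` does.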
