Unsupervised Learning Through Clustering
Learn how to identify groups in data using an unsupervised ML technique called clustering.
Pattern identification—revisited
Let’s look at an interesting possibility. Suppose you have been hired as an ML engineer at Meta and tasked with identifying the social communities among the users of the Facebook app. The goal is to find groups of users who share similar interests and are connected as friends on the social network. You decide to use one of the pattern identification methods and the ML process you learned previously.
Let’s look at the data plot of Facebook users across the city. Do you see any patterns in the data? Can you identify groups of users that may form the same community based on their mutual friends and the content they like?
We can indeed identify some tightly packed groups of points in the data. This suggests that we can apply our ML process to the problem of finding communities in the data.
Data without labels
Let’s look at the data table for the above plot. It has two features, just like the movie dataset:
Number of mutual friends
Number of similar pages liked
Well, that makes it an easy problem to solve! We have already used an MLP to classify the movie dataset into two categories. This problem would probably need more than two classes, but classification using an MLP looks like the most intuitive way to solve it.
However, there’s one crucial piece of information missing in the data.
Classifying Facebook users
What is the missing piece of information in the data table above that is essential for using classification with an MLP to find Facebook user communities?
It does not have enough features.
It does not have class labels with each user’s data.
This data is missing the labels, that is, the desired output value for every data point. For a supervised learning technique such as classification using an MLP, the dataset must include an output label column.
In this class of ML problems, we cannot supervise the learning process. Therefore, we need an unsupervised learning approach to find patterns in the data.
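To make the idea of learning without labels concrete, here is a minimal sketch of one common clustering method, k-means, written in plain Python. It is only an illustration, not necessarily the method this lesson goes on to use. The `users` data is made up for the example, the choice of three groups is an assumption, and the starting centroids are picked at fixed positions in the list so the result is repeatable (k-means normally starts from random points).

```python
import math

# Hypothetical, made-up data: (mutual friends, similar pages liked) per user.
# Note that there is no label column: we never say which community a user is in.
users = [
    (2, 3), (3, 2), (4, 4),
    (20, 22), (22, 19), (21, 24),
    (40, 5), (42, 7), (39, 4),
]

def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def k_means(points, k, iterations=10):
    """Alternate between assigning points to centroids and moving the centroids."""
    # Assumption for repeatability: seed the centroids with k evenly spaced points.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the group of its nearest centroid.
        assignments = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its group's points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return [nearest(p, centroids) for p in points], centroids

groups, centers = k_means(users, k=3)
print(groups)  # prints [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The group numbers in the output are arbitrary names produced by the algorithm, not labels we supplied. The only input is the two feature columns, which is what makes the approach unsupervised.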
Finding patterns in unlabeled data
Since our dataset does not have labels, we can’t use the MLP model for classification; we need to find communities in the Facebook users’ data using some alternative method.
Facebook communities are groups of users who share similar interests and have similar friends and connections.
So, for two users to belong to the same community, they should have some mutual friends and share similar interests and hobbies. If we look at the data plot again, we can notice ...