Classification and clustering are techniques used in data mining to analyze collected data. Classification assigns data instances to predefined labels (classes), while clustering groups similar data instances together without any predefined labels.
Let’s explore the major differences between classification and clustering:
Classification:
The classes, and therefore their number, are known in advance.
Training data (a collection of labeled instances) is required.
A model is built from the training data and then used to classify future instances into the already defined classes.
Popular algorithms for classification include the Naive Bayes classifier, decision trees, and random forests.
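A minimal sketch of the classification workflow, assuming numeric features and a Gaussian Naive Bayes model (one of the algorithms listed above) written from scratch in plain Python rather than taken from a library. The function names `fit_gaussian_nb` and `predict_gaussian_nb`, the small variance epsilon, and the toy data are hypothetical. The point is the shape of the process: fit on labeled training instances, then assign a new instance to one of the already-known classes.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Learn per-class priors, feature means and variances from labeled data."""
    groups = defaultdict(list)
    for features, label in zip(X, y):
        groups[label].append(features)
    n = len(X)
    model = {}
    for label, rows in groups.items():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # Assumed small epsilon keeps variances positive for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (len(rows) / n, means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the known class with the highest log-posterior for feature vector x."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            # Log of the Gaussian density for this feature under this class.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy usage: labeled training data with two predefined classes, "A" and "B".
X_train = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y_train = ["A", "A", "A", "B", "B", "B"]
model = fit_gaussian_nb(X_train, y_train)
print(predict_gaussian_nb(model, [1.1, 2.1]))  # → A
print(predict_gaussian_nb(model, [5.1, 5.9]))  # → B
```

Whatever the input, the prediction is always one of the labels seen during training; the model cannot invent a new class.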
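Clustering:
The groups are not known in advance, so their number is usually unknown as well (although some algorithms, such as K-Means, require the user to choose it).
No labeled training data is required; the algorithm works directly on unlabeled instances.
Clustering is used to make sense of existing data by discovering groups of similar instances within it.
Popular algorithms for clustering include K-Means, Mean-Shift clustering, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN).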
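For contrast, a minimal K-Means sketch in plain Python, assuming numeric features and Euclidean distance. It receives only the points, with no labels, but K-Means still needs the number of clusters `k` supplied up front. The function names, the toy data, and the farthest-first initialization (chosen here so the result is deterministic, instead of the randomized k-means++ seeding common in libraries) are illustrative assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Cluster unlabeled points into k groups; returns (centroids, labels)."""
    # Farthest-first initialization (an assumption for this sketch): start from the
    # first point, then keep adding the point farthest from all chosen centroids.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move either
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Toy usage: two well-separated groups of 2-D points, no labels supplied.
points = [[1.0, 1.0], [1.5, 2.0], [1.2, 0.8], [8.0, 8.0], [8.5, 9.0], [7.8, 8.2]]
centroids, labels = kmeans(points, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

The labels `0` and `1` are arbitrary cluster IDs, not predefined classes; deciding what each group means is left to the analyst.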