Regression and classification are two kinds of tasks that machine learning algorithms are designed to solve. Both regression algorithms and classification algorithms fall under supervised machine learning.
In supervised machine learning, a model is trained on existing data in which every example is correctly labeled.
The key difference between classification and regression is that classification predicts a discrete label, while regression predicts a continuous quantity or value.
Let’s consider regression and classification individually:
Regression is the process of finding a model that predicts a continuous value based on its input variables. In regression problems, the goal is to mathematically estimate a mapping function from the input variables to the output variables.
Consider a dataset that contains information about all the students in a university. An example of a regression task would be to predict the height of any student based on their gender, weight, major, and diet. This is a regression task because height is a continuous quantity; i.e., there are infinitely many possible values for a person’s height.
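As a rough illustration of the height example, the sketch below fits a straight line that predicts height from a single input variable (weight) using ordinary least squares. It is only one of many possible regression models. The function names and the four data points are hypothetical and made up for illustration; they are not real student data.

```python
# Minimal sketch: simple linear regression with one input variable.
# The weights and heights below are made-up illustrative values, not real data.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_height(weight_kg, slope, intercept):
    """Map an input weight to a continuous predicted height."""
    return slope * weight_kg + intercept

weights_kg = [50.0, 60.0, 70.0, 80.0]      # hypothetical inputs
heights_cm = [155.0, 165.0, 175.0, 185.0]  # hypothetical labels

slope, intercept = fit_line(weights_kg, heights_cm)
predicted = predict_height(65.0, slope, intercept)
print(predicted)
```

The output is a number on a continuous scale rather than one of a fixed set of labels, which is what makes this a regression model.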
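A regression algorithm is commonly evaluated by calculating the root mean squared error (RMSE) of its predictions against the known true values; a lower RMSE means the predictions are closer to the truth.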
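The root mean squared error can be computed in a few lines. The sketch below assumes two equal-length lists of true and predicted values; the function name and the sample heights are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual_cm = [160.0, 170.0, 180.0]     # hypothetical true heights
predicted_cm = [163.0, 170.0, 176.0]  # hypothetical model outputs

error_cm = rmse(actual_cm, predicted_cm)
print(error_cm)
```

Because the errors are squared before averaging, a few large mistakes raise the RMSE more than many small ones. The result is in the same units as the predicted quantity (centimeters here).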
Classification, on the other hand, is the process of finding a model that assigns input data to one of several discrete classes or labels. In other words, a classification model decides which pre-identified group an input belongs to.
Consider the same dataset of all the students at a university. A classification task would be to use input variables, such as a student’s weight, major, and diet, to determine whether they fall into the “Above Average” or “Below Average” category (for example, for height). Note that there are only two discrete labels into which the data is classified.
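To make the two-label example concrete, here is a minimal sketch of a nearest-centroid classifier, chosen only because it is short; the passage does not prescribe any particular algorithm. The two numeric features (weight in kilograms and a daily calorie count standing in for diet), the function names, and all data values are hypothetical.

```python
# Minimal sketch: nearest-centroid classification into two discrete labels.
# Features are (weight_kg, daily_calories); all values are made up.

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def fit_centroids(features, labels):
    """Group training points by label and compute one centroid per label."""
    grouped = {}
    for x, label in zip(features, labels):
        grouped.setdefault(label, []).append(x)
    return {label: centroid(pts) for label, pts in grouped.items()}

def classify(x, centroids):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(x, centroids[label]))

features = [(50.0, 1800.0), (55.0, 2000.0), (80.0, 2600.0), (85.0, 2800.0)]
labels = ["Below Average", "Below Average", "Above Average", "Above Average"]

centroids = fit_centroids(features, labels)
label = classify((78.0, 2500.0), centroids)
print(label)
```

Unlike the regression case, the output can only ever be one of the labels seen during training. In practice the features would usually be scaled first so that one with a large numeric range does not dominate the distance.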
A classification algorithm is commonly evaluated by computing its accuracy, i.e., the fraction of inputs that it classifies correctly.
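Accuracy is simply the number of correct predictions divided by the total number of predictions. The sketch below assumes two equal-length lists of true and predicted labels; the function name and the sample labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

true_labels = ["Above Average", "Below Average", "Below Average", "Above Average"]
predicted_labels = ["Above Average", "Below Average", "Above Average", "Above Average"]

score = accuracy(true_labels, predicted_labels)
print(score)
```

Here three of the four predictions match, so the accuracy is 0.75. Note that accuracy treats every mistake as equally bad, whereas RMSE in regression penalizes predictions by how far off they are.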