Supervised learning

Supervised learning refers to inferring a function from a labeled data set. We are given data as features, and we have to predict the output. This output can come from a predefined set of values (like predicting the type of animal or vehicle) or from a continuous range (like predicting a stock price or the temperature during the day). Both types of problems are solved with supervised learning: problems whose output comes from a predefined set are called classification problems, and problems with a continuous output are called regression problems.

Regression

In regression, we are given data in the form of features and need to predict a continuous output y from it. Regression is the process of learning the relationship between input and output. The input is referred to as X, and the output is referred to as y.

Examples of regression tasks:

Predicting the salary of an employee based on city, years of experience, and education level.
Predicting how many new students will enroll in a course after 2 years.
Predicting the number of YouTube views for new music videos based on singer popularity, country, number of advertisements, audio/video quality, etc.
Predicting the severity of a disease from test measurements.
Predicting the spread of an outbreak under given environmental conditions.

To solve any machine learning task, we need data. For example, for the first problem of predicting salary, we need salary data to build the model. Our data may be available in this format:

City	Years of Experience	Knowledge Level	Salary
New York	6	Intermediate	$130K
Berlin	8	High	$118K
Mumbai	2	Beginner	$90K
Tokyo	12	High	$145K

Above is the training data. We use it to build the regression model. Once the model is trained, it can take new data points (city, years of experience, and knowledge level) and predict their salaries.
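Before a model can use this table, the text columns have to be turned into numbers. The sketch below shows one possible encoding, assuming NumPy is available: the city is one-hot encoded, and the knowledge level is mapped to the ordinal values 0, 1, 2. Both encodings are illustrative choices, not something the table prescribes, and the names `encode`, `LEVELS`, and `CITIES` are hypothetical.

```python
import numpy as np

# Training data from the salary table (salary in dollars).
rows = [
    ("New York", 6, "Intermediate", 130_000),
    ("Berlin", 8, "High", 118_000),
    ("Mumbai", 2, "Beginner", 90_000),
    ("Tokyo", 12, "High", 145_000),
]

# Assumed ordering of knowledge levels (an illustrative choice).
LEVELS = {"Beginner": 0, "Intermediate": 1, "High": 2}
# Fixed, sorted list of cities so every row uses the same one-hot layout.
CITIES = sorted({city for city, _, _, _ in rows})


def encode(city, years, level):
    """Turn one (city, years, level) record into a numeric feature vector."""
    one_hot = [1.0 if city == c else 0.0 for c in CITIES]
    return one_hot + [float(years), float(LEVELS[level])]


# X: one row of features per employee; y: the matching salaries.
X = np.array([encode(c, yrs, lvl) for c, yrs, lvl, _ in rows])
y = np.array([salary for *_, salary in rows], dtype=float)

print(X.shape, y.shape)  # (4, 6) (4,)
```

The pair `X`, `y` is the input/output form described earlier; a regression model is then trained on it.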

Simple regression

Simple regression is defined as predicting an output y with the help of a single input x. It is useful for finding the relationship between two continuous variables. Consider the salary problem above: we take one feature, years of experience, and predict a person's annual salary based on that feature.

Input x: Years of experience	Output y: Salary ($)
6	130000
8	118000
2	90000
12	145000

Each row can be viewed as a data point (x, y), where x is the years of experience and y is the salary.

We have to find a function that relates x and y:

$y = f(X)$

This relationship does not hold exactly. The true output may differ from the predicted output, and this difference is the error in the model.

So, the regression model is defined as:

$y = f(X) + \text{Error}$
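To make the error term concrete, the sketch below computes Error = y - f(X) for each row of the years-of-experience table. The function `f` is hypothetical, picked only for illustration; it is not fitted to the data.

```python
# Years of experience (x) and salary in dollars (y) from the table above.
x = [6, 8, 2, 12]
y = [130_000, 118_000, 90_000, 145_000]


def f(x_value):
    """Hypothetical relationship function, chosen only for illustration."""
    return 85_000 + 5_000 * x_value


# Error = true output minus the output predicted by f.
errors = [y_i - f(x_i) for x_i, y_i in zip(x, y)]
print(errors)  # [15000, -7000, -5000, 0]
```

Some errors are positive (the true salary lies above the function line), some are negative (below it), and one is zero.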

Quiz: Regression

What is the expected value of error in the regression model?

-1

Mean of the data

Solution:

The expected value of the error is zero. The error for an individual data point is sometimes positive and sometimes negative, and on average these values cancel out. This means the model does not systematically overestimate or underestimate the output: the true value can lie above or below the predicted function line.
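A quick simulation illustrates the idea. It assumes NumPy and errors drawn from a normal distribution with mean 0 and an illustrative standard deviation of 10,000 dollars. Across many draws, the average error comes out close to zero. About half of the errors also fall on each side of the line here, but that comes from the normal distribution being symmetric; a zero mean on its own does not guarantee it.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Assumption: errors follow a normal distribution with mean 0 and a
# standard deviation of 10,000 dollars (an illustrative value).
errors = rng.normal(loc=0.0, scale=10_000.0, size=100_000)

print(f"mean error: {errors.mean():.1f}")  # close to 0
print(f"share above the line: {(errors > 0).mean():.3f}")  # roughly 0.5
```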

The simple regression model is defined as:

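$y_i = w_0 + w_1 x_i + \text{Error}_i$

Here, $w_0$ is the intercept, $w_1$ is the slope, and $\text{Error}_i$ is the error for the $i$-th data point.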
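One standard way to estimate $w_0$ and $w_1$ in $y_i = w_0 + w_1 x_i + \text{Error}_i$, assumed here, is ordinary least squares. For a single input it has a closed-form solution: $w_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $w_0 = \bar{y} - w_1 \bar{x}$. The sketch below applies it to the four-row table with NumPy; `fit_simple_regression` is a hypothetical helper name.

```python
import numpy as np

# Years of experience (x) and salary in dollars (y) from the simple regression table.
x = np.array([6, 8, 2, 12], dtype=float)
y = np.array([130_000, 118_000, 90_000, 145_000], dtype=float)


def fit_simple_regression(x, y):
    """Estimate w0 and w1 by ordinary least squares (closed form)."""
    x_mean, y_mean = x.mean(), y.mean()
    w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    w0 = y_mean - w1 * x_mean
    return w0, w1


w0, w1 = fit_simple_regression(x, y)
residuals = y - (w0 + w1 * x)  # the Error_i terms on the training data

print(f"w0 = {w0:.2f}, w1 = {w1:.2f}")  # w0 = 85346.15, w1 = 5057.69
print(f"predicted salary for 10 years: {w0 + w1 * 10:.0f}")
print(f"sum of residuals: {residuals.sum():.6f}")  # close to 0
```

With an intercept in the model, the least-squares residuals on the training data sum to zero up to floating-point rounding, which matches the quiz answer above.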
