Simple Linear Regression
This lesson will focus on what linear regression is and why we need it.
During the last stage of the data science lifecycle, we are faced with the question of which model to choose to make predictions. We can decide what kind of model to use by looking at the relationship between the variables in our data.
Let’s take the example of predicting tips paid to waiters.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('tips.csv')
print(df.describe())

# scatter plot of total_bill vs. tip
df.plot(kind='scatter', x='total_bill', y='tip')
plt.show()
From the plot, we can see that there is a direct linear relationship between total_bill and tip: increasing or decreasing total_bill results in a corresponding increase or decrease in tip. Given this linear relationship, we need a linear model for this problem. If we represent the total_bill values by $x$, then our prediction ($\hat{y}$) becomes:

$$\hat{y} = \theta x$$

This is the form of a linear equation. The term $\theta x$ implies that increasing/decreasing the total bill ($x$) will increase/decrease the tip ($\hat{y}$). Now that we have our model, we need to fit it to the data using gradient descent optimization. We will use the mean squared error loss function.
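As a minimal sketch (assuming the single-parameter form $\hat{y} = \theta x$ above), the model's prediction step can be written as a small function; the value of $\theta$ used here is purely hypothetical:

import numpy as np

def predict(x, theta):
    # linear model: predicted tip = theta * total_bill
    return theta * x

# hypothetical theta of 0.15, i.e. a 15% tip
bills = np.array([10.0, 20.0, 30.0])
print(predict(bills, 0.15))   # [1.5 3.  4.5]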
Optimizing the Simple Linear Model
If we denote our predictions by $\hat{y}$ and the actual values by $y$, then the loss function is calculated as:

$$L(\theta) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
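A minimal NumPy sketch of this loss, assuming the $\hat{y} = \theta x$ model defined above:

import numpy as np

def mse_loss(theta, x, y):
    # mean squared error between predictions theta * x and actual tips y
    y_hat = theta * x
    return np.mean((y - y_hat) ** 2)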
We will be minimizing this loss function with gradient descent. Recall that in gradient descent we:
- Start with a random initial value of $\theta$.
- Compute the gradient of the loss with respect to $\theta$, $\frac{\partial L}{\partial \theta}$, and update $\theta$ by stepping in the opposite direction (see the sketch below).
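Below is a minimal sketch of this loop in NumPy, assuming the single-parameter model $\hat{y} = \theta x$ and the MSE loss above; the learning rate and iteration count are illustrative choices, not values from the lesson:

import numpy as np
import pandas as pd

df = pd.read_csv('tips.csv')
x = df['total_bill'].to_numpy()
y = df['tip'].to_numpy()

theta = np.random.randn()     # random initial value of theta
lr = 0.0001                   # illustrative learning rate
for _ in range(1000):         # illustrative number of iterations
    y_hat = theta * x                       # predictions
    grad = -2 * np.mean((y - y_hat) * x)    # dL/dtheta for the MSE loss
    theta -= lr * grad                      # step opposite to the gradient

print(theta)  # fitted slope: predicted tip = theta * total_bill

The fitted $\theta$ is the slope that maps a total_bill value to a predicted tip.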