...


Simple Linear Regression

This lesson will focus on what linear regression is and why we need it.

During the last stage of the data science lifecycle, we are faced with the question of which model to use to make predictions. We can decide what kind of model to use by looking at the relationships between the variables in our data.

Let’s take the example of predicting tips paid to waiters.

import pandas as pd
import matplotlib.pyplot as plt

# Load the tips dataset and print summary statistics
df = pd.read_csv('tips.csv')
print(df.describe())

# Scatter plot of total_bill vs. tip
df.plot(kind='scatter', x='total_bill', y='tip')
plt.show()

From the plot, we can see that there is a direct linear relationship between total_bill and tip: increasing or decreasing total_bill results in an increase or decrease in the tip as well. Given this linear relationship, we need a linear model for this problem. If we represent the total_bill values by $x$, then our prediction $\hat{y}$ becomes

$$\hat{y} = \theta_0 x + \theta_1$$

This is the form of a linear equation. The term $\theta_0 x$ implies that increasing or decreasing the total bill ($x$) will increase or decrease the tip ($\hat{y}$). Now that we have our model, we need to fit it to the data using gradient descent optimization. We will be using the mean squared error loss function.
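To make this concrete, here is a minimal Python sketch of the model; predict is a hypothetical helper name, and the parameter values are made up for illustration rather than fitted to the tips data.

import numpy as np

# Predictions of the simple linear model: y_hat = theta_0 * x + theta_1
def predict(x, theta_0, theta_1):
    return theta_0 * x + theta_1

# Example: predicted tips for a few total_bill values with made-up parameters
x = np.array([10.0, 20.0, 30.0])
print(predict(x, theta_0=0.1, theta_1=1.0))  # [2. 3. 4.]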

Optimizing the Simple Linear Model

If we denote our predictions by $\hat{y}$ and the actual values by $y$, then the loss function is calculated as:

$$L(\theta, X, Y) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y_i})^2$$

$$L(\theta, X, Y) = \frac{1}{n}\sum_{i=1}^{n}\big(y_i - (\theta_0 x_i + \theta_1)\big)^2$$
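This loss is straightforward to express in code. Below is a minimal sketch, assuming x and y are NumPy arrays holding the total_bill and tip columns; mse_loss is a hypothetical helper name and the example numbers are made up.

import numpy as np

# Mean squared error between actual tips y and predictions y_hat
def mse_loss(theta_0, theta_1, x, y):
    y_hat = theta_0 * x + theta_1
    return np.mean((y - y_hat) ** 2)

# Example with small arrays standing in for total_bill (x) and tip (y)
x = np.array([10.0, 20.0, 30.0])
y = np.array([2.0, 3.5, 4.0])
print(mse_loss(0.1, 1.0, x, y))  # average of the squared residuals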

We will be minimizing this loss function with gradient descent. Recall that in gradient descent we:

  • Start with a random initial value of $\theta$.
  • Compute $\theta_t - \alpha \frac{\partial}{\partial \theta} L(\theta, X, Y)$ to update the parameters (a sketch of the full loop appears after this list).
...
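Putting the pieces together, the following is a minimal sketch of the gradient descent loop for this model, assuming the same tips.csv file as above; the learning rate and iteration count are illustrative choices rather than values prescribed by the lesson.

import numpy as np
import pandas as pd

# Load the data and pull out the input (total_bill) and target (tip) columns
df = pd.read_csv('tips.csv')
x = df['total_bill'].to_numpy()
y = df['tip'].to_numpy()

theta_0, theta_1 = 0.0, 0.0   # initial parameter values
alpha = 0.001                 # learning rate (illustrative)
n = len(x)

for _ in range(100000):
    y_hat = theta_0 * x + theta_1
    # Partial derivatives of the MSE loss with respect to each parameter
    d_theta_0 = (-2 / n) * np.sum(x * (y - y_hat))
    d_theta_1 = (-2 / n) * np.sum(y - y_hat)
    # Gradient descent update: theta <- theta - alpha * gradient
    theta_0 -= alpha * d_theta_0
    theta_1 -= alpha * d_theta_1

print(theta_0, theta_1)  # fitted slope and intercept

Writing the loop by hand makes the update rule explicit; in practice, a library routine would typically be used to fit these parameters.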