The following steps demonstrate the process of training and visualizing linear and polynomial regression models using the provided dataset.
In the first step, we import the necessary libraries.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
After importing libraries, we load the dataset from a CSV file.
dataset = pd.read_csv('Data.csv')
x = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
Here, we use the pandas iloc indexer to assign the feature values to x and the target values to y from the dataset. Note that the column slice 1:2 keeps x two-dimensional, which is the shape scikit-learn expects, while the single index 2 gives y as a one-dimensional array.
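To illustrate what this slicing returns, consider a hypothetical miniature DataFrame with the same column layout (the column names and values below are made up for the example, not taken from Data.csv):

```python
import pandas as pd

# Hypothetical stand-in for Data.csv with columns Position, Level, Salary
df = pd.DataFrame({
    'Position': ['Analyst', 'Manager', 'Partner'],
    'Level': [1, 2, 3],
    'Salary': [45000, 80000, 200000],
})

x = df.iloc[:, 1:2].values  # slice 1:2 keeps x two-dimensional, shape (3, 1)
y = df.iloc[:, 2].values    # single column index gives a 1-D array, shape (3,)

print(x.shape, y.shape)  # → (3, 1) (3,)
```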
In this step, we train the linear regression model on the entire dataset.
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(x, y)
Here we train the polynomial regression model on the entire dataset.
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=4)
x_poly = poly.fit_transform(x)
reg2 = LinearRegression()
reg2.fit(x_poly, y)
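To see what PolynomialFeatures actually produces, the standalone snippet below (an illustration, not part of the original walkthrough) transforms a single feature value; with degree=4, a value x expands to the row [1, x, x², x³, x⁴]:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=4)
# A single input value 2 expands to its powers 0 through 4
expanded = poly.fit_transform(np.array([[2.0]]))
print(expanded)  # → [[ 1.  2.  4.  8. 16.]]
```

The trailing linear regression then fits one coefficient per power, which is why the combination of the two steps behaves as a polynomial model.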
After training the model, we visualize the linear regression results by creating a scatter plot of the actual data points and then plotting the regression line, using reg.predict(x) to predict y from x.
plt.scatter(x, y, color='cadetblue')
plt.plot(x, reg.predict(x), color='gray')
plt.title('Linear regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
Here, we create a more detailed visualization of the polynomial regression results by generating a range of values between the minimum and maximum of x, giving the plot higher resolution and a smoother curve.
visual = np.arange(min(x), max(x), 0.1)
visual = visual.reshape((len(visual), 1))
plt.scatter(x, y, color='cadetblue')
plt.plot(visual, reg2.predict(poly.fit_transform(visual)), color='gray')
plt.title('Polynomial regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
In this step, we predict with the trained linear regression model using the reg.predict() method. It predicts the salary for a new position level of 6.5.
reg.predict([[6.5]])
Similarly, we predict with the trained polynomial regression model. It predicts the salary for a new position level of 6.5 using the polynomial features.
reg2.predict(poly.fit_transform([[6.5]]))
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Data.csv')
x = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

# Fitting linear regression to the dataset
from sklearn.linear_model import LinearRegression
reg = LinearRegression()
reg.fit(x, y)

# Fitting polynomial regression to the dataset
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=4)
x_poly = poly.fit_transform(x)
reg2 = LinearRegression()
reg2.fit(x_poly, y)

# Visualising the linear regression results
plt.scatter(x, y, color='cadetblue')
plt.plot(x, reg.predict(x), color='gray')
plt.title('Linear regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.savefig('output/linear.png')
plt.show()
plt.clf()

# Visualising the polynomial regression results
visual = np.arange(min(x), max(x), 0.1)
visual = visual.reshape((len(visual), 1))
plt.scatter(x, y, color='cadetblue')
plt.plot(visual, reg2.predict(poly.fit_transform(visual)), color='gray')
plt.title('Polynomial regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.savefig('output/polynomial.png')
plt.show()

# Predicting a new result with linear regression
reg.predict([[6.5]])

# Predicting a new result with polynomial regression
reg2.predict(poly.fit_transform([[6.5]]))
We create four different polynomial regression models with increasing complexity by setting the degree parameter of the PolynomialFeatures constructor to 2, 3, 4, and 5. By comparing the results of these models, we can evaluate how different polynomial degrees capture the patterns in the data.
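The comparison can be sketched as a loop over the four degrees. Since Data.csv is not reproduced here, the snippet below uses synthetic stand-in data (a noisy cubic trend, an assumption for illustration only) and reports R² on the training data via the score() method:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for Data.csv: levels 1-10 with a noisy cubic trend
rng = np.random.default_rng(0)
x = np.arange(1, 11, dtype=float).reshape(-1, 1)
y = x.ravel() ** 3 + rng.normal(scale=25.0, size=10)

for degree in (2, 3, 4, 5):
    poly = PolynomialFeatures(degree=degree)
    x_poly = poly.fit_transform(x)
    model = LinearRegression().fit(x_poly, y)
    # score() returns R^2 on the data it is given -- here, the training set
    print(f"degree {degree}: training R^2 = {model.score(x_poly, y):.4f}")
```

Because each higher-degree feature set contains the lower-degree one, the training R² can only stay the same or increase with degree; a higher training score alone does not mean the model generalizes better.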
As the degree of the polynomial increases in polynomial regression models, the models become more flexible and capable of fitting complex patterns in the data. Higher degree polynomials can capture complex relationships between the independent and dependent variables. However, increasing the degree can also lead to overfitting, where the model becomes too sensitive to the training data and performs poorly on new, unseen data.
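To make the overfitting risk concrete, the following sketch (again on synthetic data, not the article's dataset) holds out half the points and compares a moderate-degree and a high-degree fit. The high-degree model fits the training data at least as well, but its score on the held-out data may be worse:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic example (assumption): a noisy sine wave sampled at 40 points
rng = np.random.default_rng(42)
x = np.linspace(0.0, 5.0, 40).reshape(-1, 1)
y = np.sin(x).ravel() + rng.normal(scale=0.2, size=40)

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5, random_state=0)

for degree in (3, 10):
    poly = PolynomialFeatures(degree=degree)
    model = LinearRegression().fit(poly.fit_transform(x_tr), y_tr)
    train_r2 = model.score(poly.fit_transform(x_tr), y_tr)
    test_r2 = model.score(poly.fit_transform(x_te), y_te)
    print(f"degree {degree}: train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

A widening gap between the train and test scores as the degree grows is the practical symptom of overfitting described above.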