Learn linear regression with hands-on projects
Marvel Comics introduced Destiny, a fictional character with the ability to foresee future events, in the 1980s. The exciting news is that predicting the future is no longer just a fantasy! With the progress made in machine learning, a machine can help forecast future events by utilizing data from the past.
Exciting, right? Let’s start this journey with a simple prediction model. A regression is a mathematical function that defines the relationship between a dependent variable and one or more independent variables. Regression in machine learning analyzes how independent variables or features correlate with a dependent variable or outcome. It serves as a predictive modeling approach in machine learning, where an algorithm predicts continuous outcomes. Rather than delving into theory, the focus will be on creating different models for regression.
Before starting to build a Python regression model, one should examine the data. For instance, if an individual owns a fish farm and needs to predict a fish’s weight based on its dimensions, they can start by displaying the top few rows of the DataFrame.
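The snippet below is a minimal reconstruction of that first step (assuming the dataset is stored in a tab-separated Fish.txt file, as in the complete example later in this post):

# Importing the pandas library
import pandas as pd

# Defining the columns and reading the DataFrame
columns = ['Species', 'Weight', 'V-Length', 'D-Length', 'X-Length', 'Height', 'Width']
Fish = pd.read_csv('Fish.txt', sep='\t', usecols=columns)

# Printing the top five rows of the DataFrame
print(Fish.head())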
  Species  Weight  V-Length  D-Length  X-Length   Height   Width
0   Bream     290      24.0      26.3      31.2  12.4800  4.3056
1   Bream     340      23.9      26.5      31.1  12.3778  4.6961
2   Bream     363      26.3      29.0      33.5  12.7300  4.4555
3   Bream     430      26.5      29.0      34.0  12.4440  5.1340
4   Bream     450      26.8      29.7      34.7  13.6024  4.9274

(Only the first five rows of the dataset are shown.)
Line 2: The pandas library is imported to read the DataFrame.
Line 6: The data is read from the Fish.txt file, with the columns defined on line 5.
Line 9: The top five rows of the DataFrame are printed. The three lengths define the vertical, diagonal, and cross lengths in cm.
Here, the fish’s length, height, and width are independent variables, with weight serving as the dependent variable. In machine learning, independent variables are often referred to as features and dependent variables as labels, and these terms will be used interchangeably throughout this blog.
Linear regression models, a fundamental concept you’ll encounter as you learn machine learning, are widely used in statistics and machine learning. These models use a straight line to describe the relationship between an independent variable and a dependent variable. For example, when analyzing the weight of fish, a linear regression model is used to describe the relationship between the weight of the fish and one of the independent variables as follows:

$$y = mx + b$$

Here, $m$ is the slope of the line, which defines its steepness, and $b$ is the y-intercept, the point where the line crosses the y-axis.
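For instance, with hypothetical values $m = 30$ and $b = -600$, a fish measuring $x = 40$ cm would be predicted to weigh $y = 30 \cdot 40 - 600 = 600$ grams.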
The dataset contains five independent variables. A simple linear regression model with only one feature can be initiated by selecting the feature most strongly related to the fish’s Weight. One approach to accomplish this is to calculate the correlation between Weight and the features.
# Finding the correlation matrix (numeric columns only, since Species is a string)
print(Fish.corr(numeric_only=True))
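Optionally, the Weight column of the correlation matrix can be sorted to rank the features at a glance (an extra line, not part of the original steps):

# Ranking features by their correlation with Weight
print(Fish.corr(numeric_only=True)['Weight'].sort_values(ascending=False))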
After examining the first column, the following is observed:

- The strongest correlation exists between Weight and the feature X-Length.
- Weight has the weakest correlation with Height.

Given this information, it is clear that if the individual is limited to using only one independent variable to predict the dependent variable, they should choose X-Length and not Height.
# Step 3: Separating the data into features and labels
X = Fish[['X-Length']]
y = Fish['Weight']
With the features and labels in place, the DataFrame can now be divided into training and test sets. The training dataset trains the model, while the test dataset evaluates its performance.
The train_test_split function is imported from the sklearn library to split the data.
from sklearn.model_selection import train_test_split

# Step 4: Dividing the dataset into test and train data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10, shuffle=True)
The arguments of the train_test_split function can be examined as follows:

- test_size=0.3 selects 70% of the data for training and the remaining 30% for testing purposes.
- random_state=10 fixes the random seed so that the split is reproducible across runs.
- shuffle=True shuffles the rows before splitting to ensure that the model does not overfit to a specific set of data.

As a result, the training data is obtained in the variables X_train and y_train, and the test data in X_test and y_test.
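As a quick sanity check (an optional line, not part of the original steps), the sizes of the two splits can be printed:

# Verifying that the rows are split roughly 70/30
print(X_train.shape, X_test.shape)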
At this point, the linear regression model can be created.
from sklearn.linear_model import LinearRegression

# Step 5: Selecting the linear regression method from the scikit-learn library
model = LinearRegression().fit(X_train, y_train)
- The LinearRegression class is imported from the sklearn library.
- The model is fit on the training data X_train and y_train.

Remember, 30% of the data was set aside for testing. The Mean Absolute Error (MAE) can be calculated using this data as an indicator of the average absolute difference between the predicted and actual values, with a lower MAE value indicating more accurate predictions. Other measures for model validation exist, but they won’t be explored in this context.
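Concretely, for $n$ samples with actual values $y_i$ and predicted values $\hat{y}_i$, the MAE is defined as:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$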
Here’s a complete running example, including all of the steps mentioned above, to perform a linear regression.
# Step 1: Importing libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

# Step 2: Defining the columns and reading the DataFrame
columns = ['Species', 'Weight', 'V-Length', 'D-Length', 'X-Length', 'Height', 'Width']
Fish = pd.read_csv('Fish.txt', sep='\t', usecols=columns)

# Step 3: Separating the data into features and labels
X = Fish[['X-Length']]
y = Fish['Weight']

# Step 4: Dividing the dataset into test and train data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10, shuffle=True)

# Step 5: Selecting the linear regression method from the scikit-learn library
model = LinearRegression().fit(X_train, y_train)

# Step 6: Validation
# Evaluating the trained model on training data
y_prediction = model.predict(X_train)
print("MAE on train data = ", metrics.mean_absolute_error(y_train, y_prediction))
# Evaluating the trained model on test data
y_prediction = model.predict(X_test)
print("MAE on test data = ", metrics.mean_absolute_error(y_test, y_prediction))
In this instance, the model.predict() function is applied to the training data on line 23, and on line 26, it is used on the test data. But what does it show?
Essentially, this approach demonstrates the model’s performance on a known dataset when compared to an unfamiliar test dataset. The two MAE values suggest that the predictions on both train and test data are similar.
Note: It is essential to recall that X-Length was chosen as the feature because of its high correlation with the label. To verify the choice of feature, one can replace it with Height on line 12, rerun the linear regression, and compare the two MAE values.
So far, only one feature, X-Length, has been used to train the model. However, more features are available that can be utilized to improve the predictions. These features, the vertical length, diagonal length, height, and width of the fish, can be used to re-evaluate the linear regression model.
# Step 3: Separating the data into features and labels
X = Fish[['V-Length', 'D-Length', 'X-Length', 'Height', 'Width']]
y = Fish['Weight']
Mathematically, the multiple linear regression model can be written as follows:
$$y = b + m_1 x_1 + m_2 x_2 + \dots + m_n x_n$$

where $m_i$ represents the weightage for feature $x_i$ in predicting $y$, and $n$ denotes the number of features.
Following the same steps as earlier, the performance of the model can be calculated by utilizing all the features.
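A minimal sketch of the rerun (assuming the imports from the complete example and the updated X and y from Step 3 above are in scope):

# Steps 4-6, unchanged from the single-feature example
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=10, shuffle=True)
model = LinearRegression().fit(X_train, y_train)
print("MAE on train data = ", metrics.mean_absolute_error(y_train, model.predict(X_train)))
print("MAE on test data = ", metrics.mean_absolute_error(y_test, model.predict(X_test)))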
The MAE values will be similar to the results obtained when using a single feature.
The next model is polynomial regression, which is used when the assumption of a linear relationship between the features and the label is not accurate. By allowing for a more flexible fit to the data, polynomial regression can capture more complex relationships and lead to more accurate predictions.
For example, if the relationship between the dependent variable and the independent variables is not a straight line, a polynomial regression model can describe it more accurately, leading to a better fit to the data and more accurate predictions.
Mathematically, the relationship between dependent and independent variables is described using the following equation:
$$y = b + m_1 z_1 + m_2 z_2 + \dots + m_k z_k$$

The above equation looks very similar to the one used earlier to describe multiple linear regression. However, it includes the transformed features, the $z_i$'s, which are polynomial versions of the $x_i$'s used in multiple linear regression.

This can be further explained using an example of two features, $x_1$ and $x_2$, which can be used to create the new features $x_1$, $x_2$, $x_1^2$, $x_2^2$, $x_1 x_2$, $x_1^2 x_2$, $x_1 x_2^2$, and so on.
The new polynomial features can be created based on trial and error or techniques like cross-validation. The degree of the polynomial can also be chosen based on the complexity of the relationship between the variables.
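As a quick illustration (a toy example, not part of the original walkthrough), here is what scikit-learn’s PolynomialFeatures generates for two features with degree=2:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features: x1 = 2, x2 = 3
X_toy = np.array([[2, 3]])
poly = PolynomialFeatures(degree=2, include_bias=False)

# Transformed features: x1, x2, x1^2, x1*x2, x2^2
print(poly.fit_transform(X_toy))                 # [[2. 3. 4. 6. 9.]]
print(poly.get_feature_names_out(['x1', 'x2']))  # ['x1' 'x2' 'x1^2' 'x1 x2' 'x2^2']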
The following example presents a polynomial regression and validates the models’ performance.
# Step 1: Importing libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

from sklearn.preprocessing import PolynomialFeatures

# Step 2: Defining the columns and reading the DataFrame
columns = ['Species', 'Weight', 'V-Length', 'D-Length', 'X-Length', 'Height', 'Width']
Fish = pd.read_csv('Fish.txt', sep='\t', usecols=columns)

# Step 3: Separating the data into features and labels
X = Fish[['V-Length', 'D-Length', 'X-Length', 'Height', 'Width']]
y = Fish['Weight']

# Step 4: Generating polynomial features
Z = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)

# Dividing the dataset into test and train data
X_train, X_test, y_train, y_test = train_test_split(Z, y, test_size=0.3, random_state=10)

# Step 5: Selecting the linear regression method from the scikit-learn library
model = LinearRegression().fit(X_train, y_train)

# Step 6: Validation
# Evaluating the trained model on training data
y_prediction = model.predict(X_train)
print("MAE on train data = ", metrics.mean_absolute_error(y_train, y_prediction))
# Evaluating the trained model on test data
y_prediction = model.predict(X_test)
print("MAE on test data = ", metrics.mean_absolute_error(y_test, y_prediction))
The features were transformed on line 18 using the PolynomialFeatures function, which was imported from the sklearn library on line 7.
Notice that the MAE values in this case are lower, and therefore better, than those of the linear regression models, implying that the linear assumption was not entirely accurate.
This blog has provided a quick introduction to machine learning regression models with Python. Don’t stop here! Explore and practice different techniques and libraries to build more accurate and robust models. You can also check out the following courses on Educative:
A Practical Guide to Machine Learning with Python
This course teaches you how to code basic machine learning models. The content is designed for beginners with general knowledge of machine learning, including common algorithms such as linear regression, logistic regression, SVM, KNN, decision trees, and more. If you need a refresher, we have summarized key concepts from machine learning, and there are overviews of specific algorithms dispersed throughout the course.
Mastering Machine Learning Theory and Practice
The machine learning field is rapidly advancing today due to the availability of large datasets and the ability to process big data efficiently. Moreover, several new techniques have produced groundbreaking results for standard machine learning problems. This course provides a detailed description of different machine learning algorithms and techniques, including regression, deep learning, reinforcement learning, Bayes nets, support vector machines (SVMs), and decision trees. The course also offers sufficient mathematical details for a deeper understanding of how different techniques work. An overview of the Python programming language and the fundamental theoretical aspects of ML, including probability theory and optimization, is also included. The course contains several practical coding exercises as well. By the end of the course, you will have a deep understanding of different machine-learning methods and the ability to choose the right method for different applications.
Hands-on Machine Learning with Scikit-Learn
Scikit-Learn is a powerful library that provides a handful of supervised and unsupervised learning algorithms. If you’re serious about having a career in machine learning, then scikit-learn is a must-know. In this course, you will start by learning the various built-in datasets that scikit-learn offers, such as iris and mnist. You will then learn about feature engineering, and more specifically, feature selection, feature extraction, and dimension reduction. In the latter half of the course, you will dive into linear and logistic regression, where you’ll work through a few challenges to test your understanding. Lastly, you will focus on unsupervised learning and deep learning, where you’ll get into k-means clustering and neural networks. By the end of this course, you will have a great new skill to add to your resume, and you’ll be ready to start working on your own projects that will utilize scikit-learn.