Machine Learning
Train a machine learning model with scaled training data, predict with test data, and visualize predictions.
Let's create a machine learning model using the linear regression module from scikit-learn to predict the house price based on the selected features.
Get started
Let’s say we have cleaned our data, treated the missing values and categorical variables, removed outliers, and created required new features (if needed). Now, our data is ready to feed into the machine learning model. The very first thing to do now is to separate our data into the following:
X: Will contain the selected features, also called the independent variables.
y: Will contain the target values; in this case, the house price, also called the dependent variable.
```python
X = df[['CRIM', 'RM', 'DIS', 'NOX']]
y = df['price']  # target

# Might be a good idea to recheck what is in X and y
print(X.head(2))
print('=======')
print(y.head(2))
```
Note: Uppercase X and lowercase y are just conventions, and it is recommended to use these names for the features and the target, respectively.
Standardization: Feature scaling
Let's see what X (the original unscaled features) looks like.
```python
# This is what X (original unscaled features) looks like
print("Original unscaled features:")
print("CRIM mean:", round(X.CRIM.mean(), 3), "CRIM var:", round(np.var(X.CRIM), 3))
print(X.head(2))
```
Remember, the machine learning algorithms that employ gradient descent as an optimization strategy, such as linear regression, logistic regression, and neural networks, require data to be scaled. Let’s scale our features and check the difference.
```python
from sklearn.preprocessing import StandardScaler
import pickle  # needed to save/load the fitted scaler

scaler = StandardScaler()              # creating instance 'scaler'
scaler.fit(X)                          # fitting the features

pickle.dump(scaler, open('transformation.pkl', 'wb'))   # saving the transformation
scaler = pickle.load(open('transformation.pkl', 'rb'))  # loading the saved transformation
X_scaled = scaler.transform(X)         # transforming the features

# Check the difference!
X_scaled = pd.DataFrame(X_scaled, columns=X.columns)  # DataFrame for the scaled features
print("Scaled features (0 mean, 1 variance):")
print("CRIM mean:", round(X_scaled.CRIM.mean(), 3), "CRIM var:", round(np.var(X_scaled.CRIM), 3))
print(X_scaled.head(2))
```
We have standardized all the features in the code above before splitting them into train and test datasets. It's important to know that a model trained on standardized features needs unseen data to be standardized in exactly the same way before it can make predictions. So it's recommended, and considered good practice, to serialize/save the transformation fitted on the training dataset. We can then load it and transform the unseen data before making predictions.
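The save-then-load workflow can be sketched end to end on a small synthetic dataset; the column names, file name, and sample values below are illustrative assumptions, not the course's actual data:

```python
import pickle
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Illustrative training data, standing in for the selected features.
X = pd.DataFrame({'CRIM': [0.1, 0.2, 0.3, 0.4], 'RM': [5.0, 6.0, 7.0, 8.0]})

scaler = StandardScaler().fit(X)             # learn mean/variance from the training data
with open('transformation.pkl', 'wb') as f:  # serialize the fitted transformation
    pickle.dump(scaler, f)

# Later, before predicting on unseen data, reload and reuse the SAME transformation.
X_new = pd.DataFrame({'CRIM': [0.25], 'RM': [6.5]})  # hypothetical unseen sample
with open('transformation.pkl', 'rb') as f:
    scaler = pickle.load(f)
X_new_scaled = scaler.transform(X_new)  # uses the training mean/variance, not X_new's
print(X_new_scaled)
```

The key point is that the unseen sample is centered and scaled using the statistics learned from the training data, which is why the fitted scaler, not a freshly fitted one, must be applied.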
Linear regression model training
Let's train our very first machine learning model.
Train test split
Now, we have the features in X and the target (price) in y. The next step is to split the data into:
A training set (X_train and y_train)
A testing set (X_test and y_test)
...
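This split is commonly done with scikit-learn's train_test_split; here is a minimal sketch on synthetic data, where the 80/20 ratio and the random_state value are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative data standing in for the scaled features and the price target.
X = pd.DataFrame({'CRIM': np.arange(10, dtype=float), 'RM': np.arange(10, dtype=float)})
y = pd.Series(np.arange(10, dtype=float), name='price')

# Hold out 20% of the rows for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # 8 training rows, 2 testing rows
```

Because X and y are split together, the feature rows and their target values stay aligned across the train and test sets.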