
Machine Learning

Train a machine learning model with scaled training data, predict with test data, and visualize predictions.

Let's create a machine learning model using linear regression from scikit-learn to predict the house price based on the selected features.

Get started

Let’s say we have cleaned our data, treated the missing values and categorical variables, removed outliers, and created new features where needed. Now, our data is ready to feed into the machine learning model. The very first thing to do is to separate our data into the following:

  • X: Will contain the selected features, also called independent variables.

  • y: Will contain the target values (the house price in this case), also called the dependent variable.

X = df[['CRIM','RM','DIS','NOX']] # selected features (df is the cleaned housing DataFrame from the previous lessons)
y = df['price'] # target
# Might be a good idea to recheck what is in X and y
print(X.head(2))
print('=======')
print(y.head(2))

Note: Uppercase X and lowercase y are just a convention; it is recommended to use these variable names for the features and the target, respectively.

Standardization: Feature scaling

Let's see what X (original unscaled features) looks like.

# This is what X (original unscaled features) looks like!
import numpy as np # needed for np.var below
print("Original unscaled features:")
print("CRIM mean:", round(X.CRIM.mean(), 3), "CRIM var:", round(np.var(X.CRIM), 3))
print(X.head(2))

Remember, machine learning algorithms that employ gradient descent as an optimization strategy, such as linear regression, logistic regression, and neural networks, require the data to be scaled. Let’s scale our features and check the difference.

from sklearn.preprocessing import StandardScaler
import pickle # needed to save and load the fitted transformation
import numpy as np
import pandas as pd
scaler = StandardScaler() # Creating instance 'scaler'
scaler.fit(X) # fitting the scaler on the features
with open('transformation.pkl', 'wb') as f:
    pickle.dump(scaler, f) # Saving the fitted transformation
with open('transformation.pkl', 'rb') as f:
    scaler = pickle.load(f) # Loading the saved transformation
X_scaled = scaler.transform(X) # transforming the features
# check the difference!
X_scaled = pd.DataFrame(X_scaled, columns=X.columns) # creating a DataFrame for the scaled features
print("Scaled features (0 mean, 1 variance):")
print("CRIM mean:", round(X_scaled.CRIM.mean(), 3), "CRIM var:", round(np.var(X_scaled.CRIM), 3))
print(X_scaled.head(2))
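
Under the hood, StandardScaler simply subtracts each column's mean and divides by its standard deviation. The following minimal sketch (assuming the X and X_scaled defined above) recomputes the scaled CRIM column by hand to show that the two results agree:

import numpy as np
# Manual standardization of one column: z = (x - mean) / std
crim = X['CRIM'].to_numpy()
crim_manual = (crim - crim.mean()) / crim.std() # population std, which StandardScaler also uses
# Compare with the scikit-learn result computed above
print(np.allclose(crim_manual, X_scaled['CRIM'].to_numpy())) # expected: True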

We have standardized all the features in the code above before splitting them into train and test datasets. It’s important to know that a model trained on standardized features also needs the unseen features to be standardized in the same way before it can make predictions. So, it’s recommended and considered good practice to serialize/save the transformation fitted on the training dataset. We can then load it and transform the unseen data before making predictions.
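
As a minimal sketch of that workflow (the observation below is a made-up example with the same four feature columns), the saved scaler is loaded and applied to the unseen data before it is passed to the model:

import pickle
import pandas as pd
# A hypothetical unseen observation with the same feature columns as X
new_house = pd.DataFrame([[0.05, 6.2, 4.1, 0.45]], columns=['CRIM', 'RM', 'DIS', 'NOX'])
# Load the transformation that was fitted on the training features
with open('transformation.pkl', 'rb') as f:
    scaler = pickle.load(f)
new_house_scaled = scaler.transform(new_house) # scaled exactly as the training data was
# new_house_scaled can now be passed to the trained model's predict() method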

Linear regression model training

Let's train our very first machine learning model.

Train-test split

Now, we have features in X and target (price) in y. The next step is to split the data into:

  • A training set (X_train and y_train)

  • A testing set (X_test and y_test) ...
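
A minimal sketch of this step using scikit-learn's train_test_split on the standardized features from above (the 80/20 split and random_state are illustrative choices, not prescribed by the lesson):

from sklearn.model_selection import train_test_split
# Split the scaled features and the target into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape) # e.g., 80% of the rows for training, 20% for testing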
