In machine learning, feature scaling is a preprocessing step that transforms the values of different input features so that they are on a similar scale. The purpose is to bring the features to a common range so that no single feature dominates the others during the learning process.
Let's explore a real-world example of why feature scaling is an important concept in the machine-learning world. Imagine that we are working on a machine learning project to predict house prices. The features involved are dimensions in square feet, number of bedrooms, number of bathrooms, and neighborhood. To keep the problem simple, let's focus on just two features: square footage and number of bedrooms. The data table is shown below:
| House | Dimensions (square feet) | Bedrooms |
|-------|--------------------------|----------|
| A     | 15000                    | 4        |
| B     | 2000                     | 3        |
| C     | 3421                     | 4        |
Since this is a regression problem, let's assume, as a rough sketch, the following modeling equation:
price = w1 * Dimensions + w2 * Bedrooms + b
By looking at the values, we can see that `Dimensions` will dominate the model's learning process because its values are much larger than those of `Bedrooms`. This can cause problems during training. Thus, we need to bring the features to a common range so that no feature dominates another simply because of the magnitude of its values.
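To see the imbalance numerically, here is a small sketch. The weights `w1 = w2 = 1` are hypothetical, chosen only for illustration; with equal weights, the `Dimensions` term dwarfs the `Bedrooms` term:

```python
# Hypothetical equal weights, just to illustrate the imbalance
w1, w2, b = 1.0, 1.0, 0.0

# Values for house B from the table above
dimensions, bedrooms = 2000, 3

contribution_dims = w1 * dimensions  # 2000.0
contribution_beds = w2 * bedrooms    # 3.0
price = contribution_dims + contribution_beds + b

# Dimensions accounts for over 99% of the predicted value
print(contribution_dims, contribution_beds)
```

Unless the learned weight `w2` becomes enormous, the bedroom count barely influences the prediction, which is exactly the problem scaling addresses.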
Feature scaling can be performed using a number of techniques, some of which are listed below:
- Min-max normalization
- Standardization
- Log scaling
- Absolute maximum scaling
This feature scaling technique transforms a feature into a range from 0 to 1. The formula for this technique is:

x' = (x − min(x)) / (max(x) − min(x))

In the formula, `x` is the feature, and `min(x)` and `max(x)` return the minimum and maximum values of the feature, respectively.
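As a quick illustration, here is a minimal NumPy sketch of min-max normalization applied to the `Dimensions` column from the table above (the helper name `min_max_scale` is ours, not part of any library):

```python
import numpy as np

def min_max_scale(x):
    """Scale a 1-D feature into the [0, 1] range."""
    return (x - x.min()) / (x.max() - x.min())

dimensions = np.array([15000.0, 2000.0, 3421.0])
scaled = min_max_scale(dimensions)
print(scaled)  # the minimum maps to 0, the maximum maps to 1
```

After scaling, the smallest value (2000) becomes 0 and the largest (15000) becomes 1, with every other value falling proportionally in between.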
This technique transforms the features so that their distribution has a mean of 0 and a standard deviation of 1. The formula for the technique is:

x' = (x − x̄) / σ

In the formula, x̄ is the mean of the feature `x`, and σ is its standard deviation.
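Here is a minimal sketch of standardization applied to the `Bedrooms` column (the helper name `standardize` is ours, for illustration only):

```python
import numpy as np

def standardize(x):
    """Transform a 1-D feature to mean 0 and standard deviation 1."""
    return (x - x.mean()) / x.std()

bedrooms = np.array([4.0, 3.0, 4.0])
z = standardize(bedrooms)
print(z.mean(), z.std())  # approximately 0 and 1
```

The transformed values are centered on zero, so features measured on very different scales end up statistically comparable.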
The log scaling technique compresses a wide range of values into a small range by taking the logarithm of the feature values. The formula is given below:

x' = log(x)

In the formula, `x` is the feature value.
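A minimal sketch using NumPy's natural logarithm is shown below. Note that log is undefined for zero or negative values; variants such as `log1p` (which computes log(1 + x)) are commonly used when zeros may occur:

```python
import numpy as np

dimensions = np.array([15000.0, 2000.0, 3421.0])
log_scaled = np.log(dimensions)  # natural log compresses the wide range
print(log_scaled)
```

The raw values span 2000 to 15000, but their logs span only about 7.6 to 9.6, a much narrower range.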
The absolute maximum scaling technique scales the data by dividing every value of a feature by the feature's maximum absolute value. The formula for applying this technique is:

x' = x / max(|x|)

In the formula, `x` is the feature value.
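Here is a minimal sketch of absolute maximum scaling on the `Dimensions` column (the helper name `max_abs_scale` is ours, for illustration):

```python
import numpy as np

def max_abs_scale(x):
    """Divide every value by the maximum absolute value of the feature."""
    return x / np.max(np.abs(x))

dimensions = np.array([15000.0, 2000.0, 3421.0])
abs_scaled = max_abs_scale(dimensions)
print(abs_scaled)  # all values fall in the range [-1, 1]
```

Unlike min-max normalization, this technique preserves the sign of the data and does not shift it, so zeros stay at zero.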
Sklearn provides functions to perform feature scaling. In the coding example below, we take a simple 2-dimensional array and apply min-max scaling, standard scaling, log scaling, and absolute maximum scaling.
```python
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import MaxAbsScaler
import numpy as np

# Sample data
data = np.array([[1500, 3], [2500, 4], [1800, 2], [2200, 3]])

# Each row is a sample: [square footage, bedrooms];
# each column is scaled independently.

# Initialize the scaling techniques
minmax_scaler = MinMaxScaler()
standard_scaler = StandardScaler()
log_transform = FunctionTransformer(np.log1p, validate=True)
maxabs_scaler = MaxAbsScaler()

# Fit and transform the data
scaled_data1 = minmax_scaler.fit_transform(data)
scaled_data2 = standard_scaler.fit_transform(data)
scaled_data3 = log_transform.fit_transform(data)
scaled_data4 = maxabs_scaler.fit_transform(data)

# Display the original data and the result of
# each scaling technique
print("Original data:\n", data)
print("\nScaled data (Min-Max Scaling):\n", scaled_data1)
print("\nScaled data (Z-Score Scaling):\n", scaled_data2)
print("\nScaled data (Log Scaling):\n", scaled_data3)
print("\nScaled data (Max Abs Scaling):\n", scaled_data4)
```
Lines 1–4: We import `StandardScaler`, `MinMaxScaler`, `FunctionTransformer`, and `MaxAbsScaler` from `sklearn`'s preprocessing module.
Line 8: We create a 2-dimensional NumPy array and fill it with dummy data on which we will apply scaling.
Lines 14–17: We create an instance of each scaler and store it in its own variable.
Lines 20–23: Using the `fit_transform()` method provided by each instance, we transform `data`.
Lines 27–31: We display the result of the transformation applied by each scaling technique.
Feature scaling is a data pre-processing step that acts as a bridge between the raw data and a successful machine learning model. By utilizing the appropriate scaling technique and integrating it into our preprocessing pipeline, we can empower our models to better understand and learn from our data, ultimately leading to accurate and valuable predictions.
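As a closing sketch, one common way to integrate scaling into a preprocessing pipeline is sklearn's `Pipeline`, which fits the scaler on the training data and applies it automatically before the model sees the features. The `LinearRegression` model and the price values below are assumptions for illustration:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Dummy house data: [square footage, bedrooms] and hypothetical prices
X = np.array([[1500, 3], [2500, 4], [1800, 2], [2200, 3]])
y = np.array([300_000, 450_000, 320_000, 400_000])

# Scaling happens inside the pipeline, so it is always
# applied consistently at both fit and predict time
model = Pipeline([
    ("scale", StandardScaler()),
    ("regress", LinearRegression()),
])
model.fit(X, y)
print(model.predict(X))
```

Bundling the scaler with the model this way also prevents a common mistake: fitting the scaler on the test data, which would leak information into evaluation.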