
Scaling Data, Pipelines, and Interaction Features in Scikit-Learn


Learn how to scale data, create pipelines, and build interaction features in scikit-learn.


Scaling data

Compared to the synthetic data we were just working with, the case study data is relatively large. If we want to use L1 regularization, the scikit-learn documentation recommends the saga solver. However, this solver is not robust to unscaled data: its fast convergence is only guaranteed when the features are on approximately the same scale. We therefore need to scale the data. Scaling is also a good idea whenever we use regularization, because it puts all the features on the same scale so that their coefficients are penalized equally by the regularization process.
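Here is a minimal sketch of the setup described above. It assumes a synthetic dataset from `make_classification` as a hypothetical stand-in for the case study data. The per-feature scale factors and the `C` value are arbitrary choices for illustration. The features are rescaled first, and then an L1-penalized `LogisticRegression` is fit with the `saga` solver:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the case study data: 1,000 samples, 5 features
X, y = make_classification(n_samples=1000, n_features=5, random_state=1)
# Stretch the features onto very different scales to mimic unscaled data
X = X * np.array([1, 10, 100, 1000, 10000])

# Rescale every feature to [0, 1] so saga converges reliably and
# the L1 penalty treats all coefficients evenly
X_scaled = MinMaxScaler().fit_transform(X)

# L1 (lasso) regularization with the saga solver; C=0.1 is an arbitrary choice
lr = LogisticRegression(penalty='l1', solver='saga', C=0.1, max_iter=1000)
lr.fit(X_scaled, y)
print(lr.coef_)
```

With a strong enough penalty (small `C`), some entries of `lr.coef_` may be driven exactly to zero. This is the feature-selection behavior that motivates L1 regularization.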


A simple way to put all the features on the same scale is to transform each one by subtracting its minimum and dividing by its range (maximum minus minimum). After this transformation, each feature has a minimum of 0 and a maximum of 1. To instantiate MinMaxScaler, the scikit-learn scaler that performs this transformation, we can use the following code:

from sklearn.preprocessing import MinMaxScaler
# By default, MinMaxScaler rescales each feature to the range [0, 1]
min_max_sc = MinMaxScaler()
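To confirm that `MinMaxScaler` applies the transformation described above, the following sketch fits it to a small, made-up 3×2 array and compares the result with the formula (x - min) / (max - min) computed by hand with NumPy:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up example data: two features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

min_max_sc = MinMaxScaler()
# fit learns each column's min and max; transform applies the rescaling
X_sk = min_max_sc.fit_transform(X)

# The same transformation by hand: subtract each column's minimum,
# then divide by that column's range (max - min)
col_min = X.min(axis=0)
col_range = X.max(axis=0) - col_min
X_manual = (X - col_min) / col_range

print(X_sk)
print(np.allclose(X_sk, X_manual))  # True
```

In practice, the scaler is fit on the training data only. The same fitted `min_max_sc` is then used to `transform` the test data, so no information from the test set leaks into the scaling.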

Pipelines

Previously, we used a logistic regression model ...