Discretizing
Learn how to discretize variables.
Discretizing features refers to the process of converting continuous numerical features into categorical features by dividing the range of the feature into intervals, called bins. It can be useful for transforming continuous features into a form that can be visualized and interpreted more easily.
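To make the idea of bins concrete, here is a minimal sketch that uses only NumPy. The sample values and the choice of three equal-width bins are assumptions made for illustration; they are not part of the lesson's dataset.

```python
import numpy as np

# Hypothetical continuous feature (values chosen only for illustration)
ages = np.array([3.0, 17.5, 25.0, 42.0, 61.0, 88.0])

# Three equal-width bins over the observed range need four edges
edges = np.linspace(ages.min(), ages.max(), num=4)

# Pass only the interior edges so every value gets a label in {0, 1, 2}
bin_ids = np.digitize(ages, edges[1:-1])

print("Bin edges:", edges)
print("Bin labels:", bin_ids)
```

Each value is replaced by the index of the interval it falls into, so the feature now takes one of three ordered categories instead of arbitrary real values.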
In addition to potentially helping with interpretation, this technique can be used to reduce the memory and computational requirements of models, especially in resource-constrained environments, such as mobile devices or embedded systems.
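As a rough illustration of the memory argument, the sketch below compares the storage needed for a continuous feature with the storage needed for its bin labels. The synthetic data, the quartile-based edges, and the cast to an 8-bit integer type are all assumptions made for this example; the saving only appears if the bin labels are actually stored in a smaller dtype.

```python
import numpy as np

# Synthetic continuous feature: 10,000 float64 values (8 bytes each)
rng = np.random.default_rng(0)
values = rng.normal(size=10_000)

# Four bins separated at the quartiles of the data
edges = np.quantile(values, [0.25, 0.5, 0.75])

# Labels 0..3 fit comfortably in one unsigned byte
codes = np.digitize(values, edges).astype(np.uint8)

print("Bytes as float64:", values.nbytes)
print("Bytes as uint8 bin labels:", codes.nbytes)
```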
The scikit-learn methods for discretizing features include KBinsDiscretizer and QuantileTransformer.
The KBinsDiscretizer method
The KBinsDiscretizer method discretizes continuous features into a specified number of bins. The following code demonstrates how to use the KBinsDiscretizer method in scikit-learn:
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Define the numerical variables
X = np.array([[1.9, 2.8, 6],
              [4.7, 5.6, 8],
              [0.1, 2.8, 12],
              [0.4, 8.2, 99]])

# Create the KBinsDiscretizer object
discretizer = KBinsDiscretizer(n_bins=3, encode='ordinal')

# Transform the numerical variables
X_discretized = discretizer.fit_transform(X)

# Print the original variables and the resulting discretized variables
print("Original: \n", X)
print("Discretized: \n", X_discretized.round(2))
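It can help to inspect where the bin boundaries actually fall. The sketch below reuses the same array and, as an assumption for illustration, sets strategy='uniform' (equal-width bins) instead of the default quantile-based strategy used above. It then reads the fitted bin_edges_ attribute, which holds one array of edges per feature.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Same data as in the example above
X = np.array([[1.9, 2.8, 6],
              [4.7, 5.6, 8],
              [0.1, 2.8, 12],
              [0.4, 8.2, 99]])

# Assumption for this sketch: equal-width bins rather than the default strategy
uniform_discretizer = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
X_uniform = uniform_discretizer.fit_transform(X)

# One array of bin edges per column
for column_index, edges in enumerate(uniform_discretizer.bin_edges_):
    print(f"Column {column_index} edges:", edges)

print("Discretized with equal-width bins: \n", X_uniform)
```

With equal-width bins, the value 99 in the last column stretches the range so far that 6, 8, and 12 all land in the first bin and the middle bin stays empty. This is one reason a different strategy may be preferable when a feature contains outliers.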
...