Data Preprocessing

In this lesson, we present some useful methods for data preprocessing.

We'll cover the following...

- Scale numerical feature
  - MinMax
  - Standard
- Non-linear feature mapping for numerical feature
  - Binarizer
- Working with categorical features
  - Label encoder

Real-world data is rarely perfect. You need to spend a lot of time on data preprocessing: cleaning, scaling, normalizing, and so on. Data preprocessing may be the most important step in the entire machine learning process. You may have heard the phrase "Garbage in, garbage out": if the data quality is low, even the most sophisticated model will not produce a good result. A commonly quoted rule of thumb is that engineers spend around 70 percent of their time processing data.

Scikit-learn's sklearn.preprocessing package provides several common utility functions and transformer classes that change raw feature vectors into a representation more suitable for downstream estimators.
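As a minimal sketch of how these transformer classes are typically used, the example below applies two of the scalers covered in this lesson. It assumes scikit-learn and NumPy are installed; the toy matrix X is made up purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy feature matrix (illustrative only): 3 samples, 2 features
# that live on very different scales.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# fit_transform() first learns per-feature statistics from X (fit),
# then applies the scaling to X (transform).
minmax = MinMaxScaler()
X_minmax = minmax.fit_transform(X)      # each column rescaled to [0, 1]

standard = StandardScaler()
X_standard = standard.fit_transform(X)  # each column: zero mean, unit variance

print(X_minmax)
print(X_standard)
```

In practice you would usually call fit() on the training data only and then reuse the fitted transformer's transform() on validation and test data, so that no statistics leak from held-out data.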

Note: There are many kinds of preprocessing. In this lesson, we cover some of the most commonly used methods. If you want to learn more, launch the Jupyter notebook at the end of this lesson.

Scale numerical feature

MinMax