Kaggle Challenge - Data Transformation
3. Transformation Pipelines
As you can see, from imputing missing values to feature scaling to handling categorical attributes, we have many data transformation steps that need to be executed in the right order. Fortunately, Scikit-Learn makes our life easier: it provides the Pipeline class to help with such sequences of transformations.
Note: Creating transformation pipelines is optional. Pipelines are handy when dealing with a large number of attributes, so they are a good-to-know feature of Scikit-Learn. In fact, at this point we could move directly on to creating our machine learning model. However, to learn how things are done, we are going to look at working with pipelines.
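As a rough illustration, here is a minimal sketch of a numeric transformation pipeline. It assumes a small hypothetical NumPy array instead of the Kaggle dataset used in this course, and the step names "imputer" and "scaler" and the median strategy are illustrative choices, not the course's own code. A Pipeline takes a list of (name, estimator) pairs and runs them in that order; calling fit_transform() on the pipeline fits and applies each step in turn, passing the output of one step to the next.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical numeric data with one missing value (np.nan)
X = np.array([[1.0, 10.0],
              [2.0, np.nan],
              [3.0, 30.0]])

# Steps run in the listed order: impute missing values first, then scale
num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # fill NaNs with column medians
    ("scaler", StandardScaler()),                   # zero mean, unit variance per column
])

# Fits each step on the data and returns the fully transformed array
X_prepared = num_pipeline.fit_transform(X)
print(X_prepared)

# Individual steps remain accessible by name, e.g. the medians the imputer learned
print(num_pipeline.named_steps["imputer"].statistics_)
```

Because the steps are bundled into one object, the same sequence can later be applied to new data with num_pipeline.transform(), without repeating each step by hand.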
Some Scikit-Learn terminology:
- Estimators: Any object that can estimate some parameters based on a dataset (e.g., an imputer is an estimator). The estimation itself is performed by simply calling the fit() method.
...
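To make the estimator idea concrete, the following minimal sketch uses SimpleImputer as the estimator, assuming a tiny hypothetical single-column array. Calling fit() is the estimation step: the imputer computes the median of each column and stores it in its statistics_ attribute, where it can be reused later.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: one column with a missing entry
X = np.array([[1.0], [np.nan], [5.0], [3.0]])

imputer = SimpleImputer(strategy="median")
imputer.fit(X)  # estimation step: learns the median of the non-missing values

# The learned parameter is stored on the estimator after fitting
print(imputer.statistics_)  # [3.]
```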