The Standard ML Pipeline

Learn about the ML pipeline and operations like data preparation, validation, model training, and more.

We all know a bit about ML and data science by now, but how exactly do industry professionals turn a dataset into a production-ready application?

We call this process the ML pipeline. While there’s no single agreed-upon set of steps, the procedure is usually broken down into six (detailed in the graphic below).

Figure: The simplified ML pipeline

We’ll dive into each of these steps in this lesson and cover the operations that are typically performed during each one. In the next lesson, we’ll discuss how these operations can sometimes go wrong badly enough to cause irreversible damage to the pipeline and, in turn, to the team and the company.

Data preparation

Once a dataset is acquired, the raw data has to be converted into something that a model can understand. This typically involves feature engineering (i.e., deciding how to break apart or combine columns into more meaningful variables), data cleaning, and dimensionality reduction (e.g., principal component analysis, or PCA), among other operations.
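To make these operations concrete, here is a minimal sketch of data cleaning, feature engineering, and dimensionality reduction using only the Python standard library. The dataset, the column names (`height_cm`, `weight_kg`, `age`), the derived `bmi` feature, and all function names are hypothetical and chosen purely for illustration. The PCA step is approximated with power iteration on the covariance matrix. In practice, teams usually rely on libraries such as pandas and scikit-learn for this work.

```python
import math
import statistics

# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"height_cm": 170.0, "weight_kg": 65.0, "age": 30},
    {"height_cm": 182.0, "weight_kg": None, "age": 41},
    {"height_cm": 158.0, "weight_kg": 52.0, "age": 25},
    {"height_cm": 175.0, "weight_kg": 80.0, "age": 37},
]


def clean(rows):
    """Data cleaning: fill each missing value with its column's median."""
    columns = list(rows[0].keys())
    medians = {
        col: statistics.median([r[col] for r in rows if r[col] is not None])
        for col in columns
    }
    return [
        {col: (r[col] if r[col] is not None else medians[col]) for col in columns}
        for r in rows
    ]


def add_bmi(rows):
    """Feature engineering: combine two columns into one derived variable."""
    return [
        {**r, "bmi": r["weight_kg"] / (r["height_cm"] / 100) ** 2} for r in rows
    ]


def standardize(rows, columns):
    """Rescale each column to zero mean and unit variance (list of lists)."""
    means = {c: statistics.fmean([r[c] for r in rows]) for c in columns}
    stdevs = {c: statistics.pstdev([r[c] for r in rows]) for c in columns}
    return [[(r[c] - means[c]) / stdevs[c] for c in columns] for r in rows]


def first_principal_component(matrix, iterations=200):
    """Dimensionality reduction: approximate the top PCA direction
    by power iteration on the covariance matrix of centered data."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    cov = [
        [sum(row[i] * row[j] for row in matrix) / n_rows for j in range(n_cols)]
        for i in range(n_cols)
    ]
    vec = [1.0] * n_cols
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(n_cols)) for i in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


cleaned = clean(raw_rows)
engineered = add_bmi(cleaned)
feature_names = ["height_cm", "weight_kg", "age", "bmi"]
scaled = standardize(engineered, feature_names)
component = first_principal_component(scaled)

# Project every row onto the single component: four features become one.
projected = [sum(x * w for x, w in zip(row, component)) for row in scaled]
print(projected)
```

Each function corresponds to one of the operations named above, so the order of the calls at the bottom mirrors the order in which the raw data is transformed. Median imputation and a single principal component are only one reasonable choice each. The right choices depend on the dataset and the model.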

This is one ...