The Standard ML Pipeline
Learn about the ML pipeline and operations like data preparation, validation, model training, and more.
We all know a bit about ML and data science by now, but how exactly do industry professionals turn a dataset into a production-ready application?
We call this process the ML pipeline. While there’s no single standard set of steps, we usually break it down into six (detailed in the graphic below).
We’ll dive into each of these steps in this lesson and cover the operations typically performed in each one. In the next lesson, we’ll discuss how these operations can sometimes go disastrously wrong, causing irreversible damage to the pipeline and, in turn, to the team and the company.
Data preparation
Once a dataset is acquired, the raw data has to be converted into a form that a model can work with. This typically involves feature engineering (i.e., deciding how to break apart or combine columns into more meaningful variables), data cleaning, dimensionality reduction (e.g., principal component analysis, or PCA), and much more.
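To make these operations concrete, here is a minimal sketch of data preparation using only NumPy. Everything specific in it is an assumption made for illustration: the `prepare` function name, the toy table with columns `[height, weight, age]`, mean imputation as the cleaning rule, and BMI as the engineered feature. It is one possible arrangement of the steps, not a prescribed recipe.

```python
import numpy as np


def prepare(raw: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Clean, engineer, scale, and reduce a numeric table (rows = samples)."""
    X = raw.astype(float).copy()

    # Data cleaning: replace each missing value (NaN) with its column mean.
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]

    # Feature engineering: combine height (m) and weight (kg) into BMI.
    # The column order [height, weight, age] is an assumption of this sketch.
    bmi = X[:, 1] / X[:, 0] ** 2
    X = np.column_stack([X, bmi])

    # Standardize so that no column dominates PCA because of its units.
    X = (X - X.mean(axis=0)) / X.std(axis=0)

    # Dimensionality reduction: PCA via SVD of the centered data.
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:n_components].T


# Hypothetical raw data with two missing entries.
raw = np.array([
    [1.70, 65.0, 34.0],
    [1.82, np.nan, 45.0],   # missing weight
    [1.60, 54.0, 23.0],
    [np.nan, 90.0, 51.0],   # missing height
    [1.75, 72.0, 29.0],
])
features = prepare(raw)
print(features.shape)  # (5, 2)
```

The model would then be trained on `features` rather than on `raw`. Libraries such as scikit-learn offer imputation, scaling, and PCA as reusable components; the NumPy version here only makes each operation visible.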
This is one ...