Loading and Generating a Dataset
Learn how to load an existing dataset and how to process the data to generate a dataset in a TF-consumable format.
Data pipeline
A series of data processing elements forms a data pipeline. ML and DL algorithms commonly rely on data pipelines because they need a large amount of data to build a reasonable model. A pipeline can generate a dataset, load it into memory, and perform data cleaning and transformation. Furthermore, it divides a large dataset into batches that the TF framework can manage.
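Here is a minimal sketch of such a pipeline built with the `tf.data` API; the randomly generated arrays, buffer size, and batch size are assumptions chosen for illustration:

```python
import numpy as np
import tensorflow as tf

# Hypothetical in-memory data standing in for a real dataset.
features = np.random.rand(1000, 4).astype("float32")
labels = np.random.randint(0, 2, size=(1000,))

# Generate the dataset, transform it (shuffle), and split it into
# batches the framework can consume.
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=1000)
    .batch(32)
)

for batch_features, batch_labels in dataset.take(1):
    print(batch_features.shape, batch_labels.shape)  # (32, 4) (32,)
```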
A data pipeline can consume data from various data sources, such as the following (a short sketch appears after this list):
NumPy arrays
Comma-separated values (CSV) files
Text data
Images
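The sketch below shows how `tf.data` can consume each of these sources; the file names and glob pattern (`data.csv`, `corpus.txt`, `images/*.jpg`) are hypothetical:

```python
import numpy as np
import tensorflow as tf

# NumPy array: each array element becomes one dataset element.
array_ds = tf.data.Dataset.from_tensor_slices(np.arange(10))

# CSV file (hypothetical file name): yields batches of feature columns.
csv_ds = tf.data.experimental.make_csv_dataset(
    "data.csv", batch_size=8, num_epochs=1)

# Text data (hypothetical file name): one element per line.
text_ds = tf.data.TextLineDataset("corpus.txt")

# Images (hypothetical glob pattern): list files, then read and decode.
image_ds = tf.data.Dataset.list_files("images/*.jpg").map(
    lambda path: tf.io.decode_jpeg(tf.io.read_file(path)))
```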
Data ingestion, the process of importing data files from multiple sources into a single storage location, might be needed before analyzing and preparing the data for the TF framework. Once we have our data, we employ a three-phase process to prepare the input data in a format consumable by TF models:
Extract data from various sources (main memory, local disk, and cloud).
Transform (clean, shuffle, etc.) data.
Load the data into an output container.
This is the basic extract, transform, and load (ETL) process, sketched below.
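As a concrete illustration of the three phases, the following sketch extracts records from TFRecord files, transforms them by parsing and shuffling, and loads them by batching and prefetching; the file names and feature specification are assumptions made for the example:

```python
import tensorflow as tf

# Extract: read serialized examples from disk (hypothetical file names).
dataset = tf.data.TFRecordDataset(
    ["train-00.tfrecord", "train-01.tfrecord"])

# Transform: parse, shuffle, and batch the records. The feature
# specification below is assumed for illustration.
feature_spec = {
    "x": tf.io.FixedLenFeature([4], tf.float32),
    "y": tf.io.FixedLenFeature([], tf.int64),
}

def parse(record):
    example = tf.io.parse_single_example(record, feature_spec)
    return example["x"], example["y"]

dataset = dataset.map(parse).shuffle(buffer_size=1024).batch(32)

# Load: prefetching overlaps data preparation with model execution so
# the next batch is ready when the training loop requests it.
dataset = dataset.prefetch(tf.data.AUTOTUNE)
```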