Loading and Generating a Dataset

Learn to load an existing dataset and process the data to generate a dataset in a format that the TF framework can consume.

Data pipeline

A series of data processing elements forms a data pipeline. ML and DL algorithms commonly use data pipelines because they need a huge amount of data to build a reasonable model. A pipeline can generate a dataset, load the dataset into memory, and perform data cleaning and transformation. Furthermore, it divides a large dataset into batches that the TF framework can manage.
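
To make the batching idea concrete, here is a minimal sketch using the tf.data API; the sample count and batch size are arbitrary values chosen for illustration, not part of the lesson.

```python
import tensorflow as tf

# 1,000 synthetic samples stand in for a large dataset.
dataset = tf.data.Dataset.range(1000)

# Divide the dataset into batches of 32 elements each,
# a size the TF framework can process one step at a time.
batched = dataset.batch(32)

for batch in batched.take(2):
    print(batch.shape)  # (32,) for each of the first two batches
```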

A data pipeline can consume data from various data sources (see the sketch after this list), such as:

  • NumPy arrays

  • Comma-separated values (CSV) files

  • Text data

  • Images

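Each of these sources maps onto a tf.data entry point. The following is a hedged sketch; the file names ("data.csv", "notes.txt") and the "images/" directory are hypothetical placeholders, which is why those lines are left commented out.

```python
import numpy as np
import tensorflow as tf

# NumPy arrays: slice an in-memory array into individual examples.
features = np.random.rand(100, 4).astype("float32")
labels = np.random.randint(0, 2, size=100)
numpy_ds = tf.data.Dataset.from_tensor_slices((features, labels))

# CSV files: stream rows and parse them into columns (hypothetical file).
# csv_ds = tf.data.experimental.make_csv_dataset(
#     "data.csv", batch_size=32, label_name="target")

# Text data: one dataset element per line of the file (hypothetical file).
# text_ds = tf.data.TextLineDataset("notes.txt")

# Images: build a dataset from a directory of labeled image folders
# (hypothetical directory).
# image_ds = tf.keras.utils.image_dataset_from_directory("images/")
```
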
Data ingestion, the process of importing data files from multiple sources into a single storage location, might be needed before analyzing and preparing the data for the TF framework. Once we have our data, we employ a three-phase process to prepare the input data in a format consumable by TF models:

  • Extract data from various sources (main memory, local disk, and cloud).

  • Transform (clean, shuffle, etc.) data.

  • Load data into an output container.

This is the basic extract, transform, and load (ETL) process that constitutes a data pipeline. The following figure shows an example of a data pipeline between datasets and a model.
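
To see how the three phases map onto TF code, here is a minimal ETL sketch built on the tf.data API; the in-memory NumPy source, feature shapes, and batch size are illustrative assumptions rather than part of the lesson.

```python
import numpy as np
import tensorflow as tf

# Extract: pull raw data from a source (here, main memory).
features = np.random.rand(500, 8).astype("float32")
labels = np.random.randint(0, 2, size=500)
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Transform: rescale each example and shuffle the dataset.
dataset = dataset.map(lambda x, y: (x / tf.reduce_max(x), y))
dataset = dataset.shuffle(buffer_size=500)

# Load: batch and prefetch so a model can consume the data efficiently,
# e.g., by passing the dataset directly to model.fit(dataset).
dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)
```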
