

Defining a tf.data.Dataset

Learn to create the TensorFlow data pipeline.

Helper functions

Now, let’s look at how we can create a tf.data.Dataset using the data. We’ll first write a few helper functions. Namely, we’ll define:

  • parse_image() to load and process an image from a filepath.

  • generate_tokenizer() to generate a tokenizer trained on the data passed to the function.

The parse_image() function

First, let’s discuss the parse_image() function. It takes three arguments:

  • filepath: Location of the image

  • resize_height: Height to resize the image to

  • resize_width: Width to resize the image to

The function is defined as follows:

import tensorflow as tf

def parse_image(filepath, resize_height, resize_width):
    """ Reading an image from a given filepath """
    # Read the raw bytes of the image file
    image = tf.io.read_file(filepath)
    # Decode the JPEG and make sure there are three channels in the output
    image = tf.io.decode_jpeg(image, channels=3)
    # Convert to float32; this also scales uint8 pixel values to [0, 1]
    image = tf.image.convert_image_dtype(image, tf.float32)
    # Resize the image to resize_height x resize_width (e.g., 224x224)
    image = tf.image.resize(image, [resize_height, resize_width])
    # Bring pixel values from [0, 1] to [-1, 1]
    image = image*2.0 - 1.0
    return image
Read image from the path of the file
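
To check what the function returns, a minimal sketch like the one below can help. It assumes TensorFlow 2.x and repeats parse_image() so that the snippet runs on its own. The random test image, the sample.jpg filename, and the 224x224 target size are made up for illustration and are not part of the lesson's dataset.

```python
import os
import tempfile

import tensorflow as tf


def parse_image(filepath, resize_height, resize_width):
    """Same steps as the lesson's parse_image(), repeated so this runs alone."""
    image = tf.io.read_file(filepath)
    image = tf.io.decode_jpeg(image, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, [resize_height, resize_width])
    return image * 2.0 - 1.0


# Hypothetical test image: 64x48 random RGB pixels, encoded as a JPEG
pixels = tf.random.uniform([64, 48, 3], maxval=256, dtype=tf.int32)
jpeg_bytes = tf.io.encode_jpeg(tf.cast(pixels, tf.uint8))

# Write the JPEG to a temporary location (the filename is an assumption)
sample_path = os.path.join(tempfile.mkdtemp(), "sample.jpg")
tf.io.write_file(sample_path, jpeg_bytes)

# Load it back through parse_image() with an assumed 224x224 target size
image = parse_image(sample_path, 224, 224)

# Inspect the shape, dtype, and value range of the result
print(image.shape, image.dtype)
print(float(tf.reduce_min(image)), float(tf.reduce_max(image)))
```

The result should be a float32 tensor of shape (224, 224, 3) regardless of the original image size, with values that stay within [-1, 1].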

The parse_image() function relies on tf.io and tf.image functions to load and process the image. Specifically, it:

  • Reads the image from the  ...