Loading Image Dataset

Learn how to load and process image and tabular data.

Loading image dataset

Let’s now see how we can load image data. We’ll use the Cats and Dogs images. We start by extracting the dataset from the zip file.

import zipfile

with zipfile.ZipFile('../train.zip', 'r') as zip_ref:
    zip_ref.extractall('.')

In the code above:

  • Line 1: We import the zipfile library.

  • Lines 3–4: We use the ZipFile class of the zipfile module to open the zip file in read mode as zip_ref. The with statement closes the file automatically once the block finishes. We then call the extractall() method to extract the contents of the zip file into the current directory.
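The extraction step above can be tried without the real dataset. The following is a minimal sketch, assuming a tiny stand-in archive built in a temporary directory. The helper name extract_archive and the file names inside the archive are hypothetical and are not part of the lesson's dataset.

```python
import os
import tempfile
import zipfile

def extract_archive(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return the member names."""
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        names = zip_ref.namelist()
        zip_ref.extractall(target_dir)
    return names

# Build a small stand-in archive (hypothetical file names) so the sketch is self-contained.
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, 'train.zip')
with zipfile.ZipFile(zip_path, 'w') as zf:
    zf.writestr('train/cat.0.jpg', b'fake image bytes')
    zf.writestr('train/dog.0.jpg', b'fake image bytes')

# Extract the archive and list what landed in the train folder.
extracted = extract_archive(zip_path, work_dir)
print(sorted(os.listdir(os.path.join(work_dir, 'train'))))
```

Returning the member names from namelist() is one convenient way to confirm what an archive contains before working with the extracted files.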

Next, we create a pandas DataFrame containing the labels and paths to the images.

import os
import pandas as pd
base_dir = 'train'
filenames = os.listdir(base_dir)
categories = []
for filename in filenames:
    category = filename.split('.')[0]  # text before the first dot in the file name
    if category == 'dog':
        categories.append("dog")
    else:
        categories.append("cat")
df = pd.DataFrame({'filename': filenames, 'category': categories})
print(df)

In the code above:

  • Lines 1–2: We import the os module and the pandas library as pd.

  • Lines 3–4: We define the base directory base_dir that contains the images for training the model. We call the listdir() function of the os module to get the names of all files in base_dir.

  • Line 5: We define a list, categories, to store the category of each file. ...
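
The labeling logic above can also be sketched as a small function, which makes it easy to check on a few names before running it over the whole folder. This is a minimal sketch, assuming file names that begin with a cat or dog prefix followed by a dot. The helper label_from_filename and the sample file names are hypothetical stand-ins for the output of os.listdir().

```python
import pandas as pd

def label_from_filename(filename):
    """Return 'dog' if the text before the first dot is 'dog', otherwise 'cat'."""
    prefix = filename.split('.')[0]
    return 'dog' if prefix == 'dog' else 'cat'

# Hypothetical file names standing in for os.listdir('train').
filenames = ['cat.0.jpg', 'cat.1.jpg', 'dog.0.jpg']

# Build the same two-column DataFrame: one row per file, with its label.
df = pd.DataFrame({
    'filename': filenames,
    'category': [label_from_filename(name) for name in filenames],
})

# Count the images per class as a quick sanity check on the labels.
print(df['category'].value_counts().to_dict())
```

Note that, as in the loop above, any file whose prefix is not dog is labeled cat, so stray files in the folder would be counted as cats.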