Loading Image Dataset
Learn how to load and process image and tabular data.
Let’s now see how we can load image data. We’ll use the Cats and Dogs images. We start by extracting the dataset from the zip file.
import zipfile

with zipfile.ZipFile('../train.zip', 'r') as zip_ref:
    zip_ref.extractall('.')
In the code above:
Line 1: We import the zipfile library.
Lines 3–4: We use the ZipFile class of the zipfile module to open the zip file in read mode as zip_ref. The with statement closes the file automatically when the block finishes. We then call the extractall() method to extract the contents of the zip file into the current directory.
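To try the extraction step without the real dataset, the sketch below builds a tiny stand-in archive in a temporary directory and extracts it the same way. The helper name extract_archive, the file demo_train.zip, and the member names inside it are assumptions made for illustration; they are not part of the lesson's dataset.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the names of its members."""
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall(dest_dir)
        return zip_ref.namelist()


# Build a small stand-in archive (hypothetical names) so the sketch is runnable.
work_dir = Path(tempfile.mkdtemp())
demo_zip = work_dir / "demo_train.zip"
with zipfile.ZipFile(demo_zip, "w") as zf:
    zf.writestr("train/cat.0.jpg", b"placeholder bytes")
    zf.writestr("train/dog.0.jpg", b"placeholder bytes")

# Extract it and list what ended up in the train folder.
members = extract_archive(demo_zip, work_dir)
extracted = sorted(p.name for p in (work_dir / "train").iterdir())
print(members)
print(extracted)
```

Returning namelist() from the helper is one simple way to confirm which files the archive contained before moving on to labeling them.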
Next, we create a pandas DataFrame containing the labels and paths to the images.
import os
import pandas as pd
base_dir = 'train'
filenames = os.listdir(base_dir)
categories = []
for filename in filenames:
    category = filename.split('.')[0]
    if category == 'dog':
        categories.append("dog")
    else:
        categories.append("cat")
df = pd.DataFrame({
    'filename': filenames,
    'category': categories
})
print(df)
In the code above:
Lines 1–2: We import the os module and the pandas library as pd.
Lines 3–4: We define the base directory base_dir that contains the images for training the model. We call the listdir() function of the os module to get the names of all files in base_dir.
Line 5: We define a list categories to store the category of each file. ...
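The labeling rule can also be sketched as a small function and tried on a few file names without touching the disk. The function name build_label_frame and the sample file names are hypothetical; the sketch assumes names shaped like "label.index.jpg", which is what the split('.')[0] check in the code implies.

```python
import pandas as pd


def build_label_frame(filenames):
    """Return a DataFrame with one row per file and a category taken from its name prefix."""
    categories = []
    for filename in filenames:
        prefix = filename.split(".")[0]
        # Same rule as the lesson code: a "dog" prefix means dog, anything else means cat.
        categories.append("dog" if prefix == "dog" else "cat")
    return pd.DataFrame({"filename": filenames, "category": categories})


# Hypothetical file names used only to exercise the rule.
sample_files = ["cat.0.jpg", "dog.0.jpg", "dog.1.jpg"]
df = build_label_frame(sample_files)
print(df)
print(df["category"].value_counts().to_dict())
```

Because every name that does not start with "dog" falls into the else branch, any stray file in the folder would also be labeled "cat", so it can be worth checking the category counts before training.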