Generating Image, Video, and Audio Embeddings
Learn how to use CNNs to generate image, video, and audio embeddings.
Generating image embeddings
The dataset contains images of different pen types, sofas, cups, and glass.
Embedding model: Pretrained CNN (ResNet-18)
For image embeddings, the code utilizes a pretrained ResNet-18 model. ResNet (Residual Network) is a deep convolutional neural network architecture known for its effectiveness in image classification tasks. ResNet-18 consists of 18 layers and has shown strong performance on various image recognition benchmarks. We obtain a feature representation or embedding of the input image by removing the final fully connected layer. This embedding captures high-level features of the image, allowing us to perform tasks like similarity comparison and image retrieval.
We begin by importing necessary libraries. os
is imported for interacting with the file system, torch
for deep learning functionalities, torchvision.transforms
for image transformations, torchvision.models
for pretrained models, PIL
for image processing, and cosine_similarity
from sklearn.metrics.pairwise
for computing cosine similarity between vectors.
import osimport torchimport torchvision.transforms as transformsimport torchvision.models as modelsfrom PIL import Imagefrom sklearn.metrics.pairwise import cosine_similarity
We load a pretrained ResNet-18 model using models.resnet18(pretrained=True)
. This model is a convolutional neural network architecture known for its effectiveness in image classification tasks. The final fully connected layer of the model is removed, and the model is set to evaluation mode.
Note: The model we are using to generate image embeddings is pretrained on the image classification task, so we need to remove the final fully connected classification layer and extract the image features from the last hidden layer.
# Load pretrained ResNet modelresnet_model = models.resnet18(pretrained=True)# Remove the final fully connected layerresnet_model = torch.nn.Sequential(*(list(resnet_model.children())[:-1]))# Set the model to evaluation moderesnet_model.eval()
We define a preprocess_image
function, which takes the path to an image file as input and performs a series of transformations on the image using transforms.Compose
. These transformations include resizing the image to 256 x 256 ...