What is zero-shot learning (ZSL)?

Introduction

Zero-shot learning (ZSL) is a machine learning method that allows a model to recognize and categorize objects or concepts it has never seen before, without any labeled training examples of those specific objects or concepts. This type of learning is significant for autonomous systems (self-operating entities that perform tasks or make decisions without direct human intervention), which must be able to recognize and classify new objects on their own.

In traditional supervised learning, a model is trained on a dataset that contains labeled examples of every class it needs to classify. Zero-shot learning extends this capability by allowing a model to generalize to previously unseen classes using auxiliary information, such as class descriptions or attributes.

Example

Zero-shot learning mimics the human ability to perceive resemblances between different categories of data, which enables it to make predictions about classes it was never trained on. Humans can recognize that cows and deer both walk on four legs, have hooves, and share other visual traits. This ability lets them connect new or unfamiliar categories to previously encountered visual concepts.

Suppose a model has been trained to recognize many dog breeds but has never seen a specific breed, such as the Komondor. Given a description of the Komondor (for example, a large dog with a long, white, corded coat), ZSL allows the model to identify it by relating that description to what it has learned about other dog breeds. Like humans, ZSL relies on existing knowledge to accomplish its goals.

Working mechanism

Here's a basic flowchart that shows the working mechanism of zero-shot learning:

Flowchart illustrating the working mechanism of ZSL

1. Dataset preparation

In ZSL, the data is categorized into three kinds:

  • Seen classes: These are the classes whose labeled examples are used to train the model.

  • Unseen classes: These are the classes the model must be able to classify without having been trained on them. The training procedure does not use any data from these classes.

  • Auxiliary information: Because the unseen classes have no labeled instances, additional information is needed to solve the ZSL problem. This information must describe every class, including the unseen ones, for example through textual descriptions, semantic attributes, or word embeddings.

This step aims to collect or build a dataset that includes labeled examples from the seen classes, together with auxiliary information about all classes that zero-shot learning can draw on.
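To make the three kinds of data concrete, here is a minimal Python sketch of how a ZSL dataset could be organized and sanity-checked. The `ZSLDataset` class, the `ATTRIBUTES` list, and all values are hypothetical illustrations, not part of any library; the short feature vectors stand in for real inputs.

```python
from dataclasses import dataclass

# Hypothetical attribute vocabulary used as auxiliary information.
ATTRIBUTES = ["large", "white_coat", "corded_coat", "short_coat", "pointed_ears"]


@dataclass
class ZSLDataset:
    """Toy container for the three kinds of ZSL data (illustrative only)."""

    # Labeled (feature_vector, class_name) pairs, drawn from seen classes only.
    train_examples: list[tuple[list[float], str]]
    seen_classes: set[str]
    unseen_classes: set[str]
    # Auxiliary information: one attribute vector per class, seen and unseen.
    class_attributes: dict[str, list[float]]

    def validate(self) -> None:
        """Raise ValueError if the basic ZSL invariants are violated."""
        overlap = self.seen_classes & self.unseen_classes
        if overlap:
            raise ValueError(f"classes cannot be both seen and unseen: {overlap}")
        for _, label in self.train_examples:
            if label not in self.seen_classes:
                raise ValueError(f"training label {label!r} is not a seen class")
        missing = (self.seen_classes | self.unseen_classes) - set(self.class_attributes)
        if missing:
            raise ValueError(f"no auxiliary information for: {missing}")
        for name, vector in self.class_attributes.items():
            if len(vector) != len(ATTRIBUTES):
                raise ValueError(f"attribute vector for {name!r} has the wrong length")


# Illustrative values: 1 = the class has the attribute, 0 = it does not.
dogs = ZSLDataset(
    train_examples=[([1.0, 0.0, 0.2], "samoyed"), ([0.0, 1.0, 0.0], "beagle"),
                    ([0.2, 0.0, 1.0], "puli")],
    seen_classes={"samoyed", "beagle", "puli"},
    unseen_classes={"komondor"},
    class_attributes={
        "samoyed":  [1, 1, 0, 0, 1],
        "beagle":   [0, 0, 0, 1, 0],
        "puli":     [0, 0, 1, 0, 0],
        "komondor": [1, 1, 1, 0, 0],
    },
)
dogs.validate()  # no overlap, no unseen-class data in training, full auxiliary info
```

The Komondor appears only in the auxiliary information, never in `train_examples`. The Puli is included as a seen class because it shares the corded-coat attribute with the Komondor; if no seen class carried that attribute, there would be nothing for the model to transfer.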

2. Training phase

During this phase, the model learns from labeled examples of the seen classes. This phase generally consists of two steps:

  • Feature extraction: The input data, such as images or text, is transformed into meaningful features in this phase.

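  • Model training: In this step, the model learns to map the extracted features to the semantic representations of the corresponding class labels (for example, each class's attribute vector from the auxiliary information). Because the model learns this shared semantic space rather than the seen labels alone, the mapping can later be applied to unseen classes.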
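One simple way to realize this training step is to learn a linear map from feature vectors into the attribute space, fitted by batch gradient descent on the squared error. The plain-Python sketch below shows that idea; it is one common family of ZSL approaches, not the only one. The feature vectors are assumed to come from an already-trained feature extractor, and the function names, learning rate, and data are hypothetical.

```python
def project(W: list[list[float]], x: list[float]) -> list[float]:
    """Map a feature vector x into attribute space: returns W @ x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def train_linear_map(features, targets, lr=0.5, epochs=300):
    """Fit W (num_attributes x num_features) to minimize the mean of
    ||W x - s||^2 over (x, s) pairs, using batch gradient descent from zero."""
    n, d, a = len(features), len(features[0]), len(targets[0])
    W = [[0.0] * d for _ in range(a)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(a)]
        for x, s in zip(features, targets):
            pred = project(W, x)
            for i in range(a):
                err = pred[i] - s[i]
                for j in range(d):
                    # d/dW[i][j] of (1/n) * (pred_i - s_i)^2
                    grad[i][j] += 2.0 * err * x[j] / n
        for i in range(a):
            for j in range(d):
                W[i][j] -= lr * grad[i][j]
    return W


# Hypothetical precomputed features for one image per seen class.
features = [[1.0, 0.0, 0.2], [0.0, 1.0, 0.0], [0.2, 0.0, 1.0]]
# Targets: attribute vectors for (samoyed, beagle, puli) over
# [large, white_coat, corded_coat, short_coat, pointed_ears].
targets = [[1, 1, 0, 0, 1], [0, 0, 0, 1, 0], [0, 0, 1, 0, 0]]

W = train_linear_map(features, targets)
print([round(v, 2) for v in project(W, features[0])])
```

Because the model predicts attributes rather than class labels directly, the same `W` can be applied to features extracted from an unseen class such as the Komondor, and the result compared with that class's attribute vector at inference time.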

3. Inference phase

During this phase, the model applies the acquired knowledge and additional information to categorize a new set of classes. This phase also generally consists of two steps:

  • Semantic representation: Each unseen class is represented by its auxiliary information, such as an attribute vector or a word embedding, which captures the semantic properties of that class.

  • Class prediction: During inference, the model takes the features extracted from unseen examples and compares them with the semantic representations of the unseen classes. The model predicts the class label by finding the closest match between the features and the semantic representations.
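The class-prediction step can be sketched as a nearest-neighbor search in the semantic space. The sketch below assumes the trained model has already produced a vector of attribute scores for a test image, and it uses cosine similarity as the "closest match"; other distance or compatibility measures are equally possible. The function names, the unseen classes, and the numbers are illustrative only.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_unseen_classes(predicted, class_vectors):
    """Return (class_name, similarity) pairs, most similar first."""
    scores = {name: cosine(predicted, vec) for name, vec in class_vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def predict_unseen_class(predicted, class_vectors):
    """Pick the unseen class whose semantic vector is the closest match."""
    return rank_unseen_classes(predicted, class_vectors)[0][0]


# Auxiliary information for the unseen classes, over
# [large, white_coat, corded_coat, short_coat, pointed_ears] (illustrative values).
unseen = {
    "komondor":  [1, 1, 1, 0, 0],
    "dalmatian": [1, 1, 0, 1, 0],
}
# Hypothetical attribute scores the trained model produced for a test image.
predicted = [0.8, 0.9, 0.7, 0.1, 0.2]

print(predict_unseen_class(predicted, unseen))  # komondor
```

Here the high "corded_coat" score makes the Komondor the best match (cosine similarity of about 0.98 versus about 0.74 for the Dalmatian), even though the model never saw a labeled Komondor image.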

Note: Some variations and techniques exist within zero-shot learning, and the exact implementation may vary depending on the specific approach or algorithm being used. The flowchart above provides a general overview of the process.

Evaluation metrics

Here are some commonly used evaluation metrics in zero-shot learning:

ZSL evaluation metrics
  • Top-1 accuracy: This metric measures the percentage of test instances correctly classified into their respective unseen classes.

  • Top-k accuracy: A more lenient version of top-1 accuracy, this metric measures the percentage of test instances whose correct label appears among the k highest-scoring predicted labels.

  • Harmonic generalized precision (HGP): This metric measures how well the model assigns the correct label to unseen classes while considering the ordering of the predicted classes.

  • Harmonic generalized recall (HGR): This metric measures how well the model retrieves instances of unseen classes while considering the ordering of the predictions.

  • Mean average precision (mAP): Commonly used in information retrieval, this metric computes the average precision for each class, which summarizes precision and recall across different decision thresholds, and then averages the result across all classes.

  • Area under the curve (AUC): This metric measures how well the model ranks the correct unseen classes above incorrect ones. It plots the true positive rate against the false positive rate across decision thresholds (the ROC curve) and calculates the area under this curve.
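Top-1 and top-k accuracy as defined above can be computed with a short Python function. This is a sketch that assumes each test instance comes with a dictionary of scores over the unseen classes (for example, similarity scores between predicted and class semantic vectors); the class names and numbers are made up.

```python
def top_k_accuracy(scores, true_labels, k=1):
    """Fraction of test instances whose true label is among the k
    highest-scoring classes. Ties keep the dictionary's insertion order."""
    if not true_labels:
        raise ValueError("need at least one test instance")
    hits = 0
    for class_scores, label in zip(scores, true_labels):
        top_k = sorted(class_scores, key=class_scores.get, reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(true_labels)


# Hypothetical scores over three unseen classes for three test images.
scores = [
    {"komondor": 0.9, "dalmatian": 0.05, "saluki": 0.05},  # true: komondor
    {"komondor": 0.5, "dalmatian": 0.3, "saluki": 0.2},    # true: dalmatian
    {"komondor": 0.2, "dalmatian": 0.3, "saluki": 0.5},    # true: komondor
]
labels = ["komondor", "dalmatian", "komondor"]

print(top_k_accuracy(scores, labels, k=1))  # 0.3333333333333333
print(top_k_accuracy(scores, labels, k=2))  # 0.6666666666666666
print(top_k_accuracy(scores, labels, k=3))  # 1.0
```

Top-1 accuracy is simply the `k=1` case. With all three classes allowed (`k=3`), every instance counts as correct, which is why top-k is usually reported for a `k` much smaller than the number of unseen classes.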

Note: The choice of evaluation metric depends on the specific zero-shot learning task and the study's objectives.

ZSL vs. FSL vs. OSL

The key differences between zero-shot, few-shot, and one-shot learning are summarized below.

Zero-Shot Learning (ZSL)

  • It does not require any labeled examples of the unseen classes.

  • It generalizes to unseen classes using auxiliary information or semantic relationships between classes.

  • It relies on transferring knowledge from seen classes to unseen classes.

  • It is useful when instances must be classified into new classes that were not present during training.

  • It finds applications in areas where new classes or concepts constantly emerge, such as fine-grained categorization, object recognition, natural language processing tasks, and cross-domain adaptation.

Few-Shot Learning (FSL)

  • It uses a small number of labeled examples per class.

  • It aims to learn from these limited examples and still generalize well to new instances or tasks, leveraging the information shared across multiple examples.

  • It applies in scenarios with only a few labeled examples per class.

  • It finds applications in tasks such as image recognition, object detection, text classification, and speech recognition, where data scarcity or annotation costs limit the availability of labeled examples.

One-Shot Learning (OSL)

  • It uses a single labeled example per class.

  • It focuses on capturing similarities between instances to make predictions.

  • It is relevant when labeled data is extremely scarce, with only one labeled example available per class.

  • It finds applications in various fields, including face recognition, signature verification, character recognition, and anomaly detection, where acquiring large amounts of labeled data is impractical or expensive.

Advantages

Zero-shot learning offers several advantages:

  • Generalization to unseen classes

  • Reduced annotation efforts

  • Flexibility to handle evolving datasets

  • Enhanced scalability by avoiding retraining from scratch

  • Ability to leverage semantic relationships between classes

  • Improved efficiency in adapting to new tasks or domains

  • Facilitation of transfer learning and knowledge transfer between related tasks or domains

  • Reduction of data bias by avoiding overfitting to specific classes during training

  • Support for multilabel classification tasks by predicting multiple class labels simultaneously

Disadvantages

There are several drawbacks associated with zero-shot learning, outlined as follows:

  • Limited generalization

  • Lack of fine-grained discrimination

  • Dependence on accurate attribute information

  • Difficulty in scaling to large class spaces

  • Vulnerability to semantic noise

  • Bias and cultural influences

  • Limited ability to handle interclass relationships

Try it yourself

Match each learning method with the example scenario that fits it.

Learning methods:

  • Zero-shot learning (ZSL)

  • Few-shot learning (FSL)

  • One-shot learning (OSL)

Scenarios:

  • Given a single sample of a person's signature, the system must determine whether a new signature belongs to the same person.

  • A machine translation system must accurately translate between a language pair for which it has no parallel training data.

  • After being trained on a few handwritten examples, a system must accurately recognize and transcribe new samples of an individual's handwriting.



Copyright ©2025 Educative, Inc. All rights reserved