...


Encoder-Decoder Design Pattern

Learn about the fundamental concepts of computer vision, including one of its core building blocks: the encoder-decoder design pattern.

Before we dive in, let's recap what we aim to achieve in computer vision (CV).

What do we want from computer vision?

In general, CV tasks fall into two categories: image analysis and image synthesis. This course primarily focuses on the analysis aspect.

Image analysis

Later, we'll study the distinctions between various CV tasks, such as image classification, object detection (of a single object or of multiple objects), and the different types of segmentation: instance segmentation, semantic segmentation, and panoptic segmentation.

As we progress, we'll explore how transformers handle each of these tasks and discuss the current state of transformer models for each.

Face recognition

Our objective is to observe how all these tasks share a common model design pattern. This pattern includes a feature extractor or backbone followed by a decision layer that produces an output.

Universal encoder-decoder design pattern

For instance, in multiclass classification or face recognition, we use a stack of layers, which can be convolutional or fully connected. These layers encode the features, and their weights adapt to the model type. Following this, we have a softmax layer at the output, responsible for identifying the person in an image. The softmax input is typically a linear combination of the output features from the feature block, and the weights of this combination are trainable. This gives a score for each class, and the final output is obtained by applying the argmax operation to the softmax output.
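To make the pattern concrete, here is a minimal sketch in PyTorch (the framework choice, layer sizes, and class names are illustrative assumptions, not prescribed by the lesson): a small convolutional backbone encodes the image into a feature vector, a linear layer produces one score per class, softmax turns the scores into probabilities, and argmax picks the final class.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the encoder-decoder pattern for classification:
# a convolutional feature extractor (encoder) followed by a linear + softmax
# decision layer (decoder). All names and sizes here are made up.
class TinyClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Encoder / backbone: learns feature maps from the raw image.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # collapse spatial dims into a feature vector
        )
        # Decoder / decision layer: a trainable linear combination of the features.
        self.head = nn.Linear(32, num_classes)

    def forward(self, x):
        features = self.backbone(x).flatten(1)   # (batch, 32)
        scores = self.head(features)             # one score per class
        return torch.softmax(scores, dim=1)      # class probabilities

model = TinyClassifier(num_classes=5)
image = torch.randn(1, 3, 64, 64)                # dummy RGB image
probs = model(image)
predicted_class = probs.argmax(dim=1)            # final decision via argmax
print(predicted_class)
```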

This encoder-decoder design pattern is applicable beyond classification and can be employed in various CV tasks. As we know from the basics of machine learning (ML) and deep learning (DL), the features can either be handcrafted or learnable.

The universal encoder-decoder architecture

This design pattern is universal and serves as a master architecture in DL, not limited to classification but extending to other CV tasks. For instance, in object detection, we can have multiple decoders for different aspects of the prediction, such as the object class and the bounding box coordinates. For segmentation, we might use an inverted decoder, as seen in the UNet architecture, where the encoder uses convolutions, the decoder uses deconvolutions, and the output has the same size as the input.
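As a rough illustration of how several decoders can share one encoder, the sketch below (again hypothetical PyTorch code with made-up layer sizes) attaches a class head and a bounding-box head for detection-style outputs, plus a deconvolution head that upsamples back to the input resolution, loosely in the spirit of UNet-style segmentation decoders.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: one shared encoder with task-specific decoders,
# roughly mirroring the detection and segmentation examples in the text.
class MultiTaskNet(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        # Shared encoder: downsamples the image into feature maps.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Detection-style decoders: class scores and bounding-box coordinates.
        self.cls_head = nn.Linear(32, num_classes)
        self.box_head = nn.Linear(32, 4)             # (x, y, w, h)
        # Segmentation-style decoder: deconvolutions upsample the features
        # back to the input resolution, as in UNet-like architectures.
        self.mask_head = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 2, stride=2),
        )

    def forward(self, x):
        feats = self.encoder(x)                      # (batch, 32, H/4, W/4)
        pooled = feats.mean(dim=(2, 3))              # global average pooling
        return self.cls_head(pooled), self.box_head(pooled), self.mask_head(feats)

net = MultiTaskNet()
img = torch.randn(1, 3, 64, 64)
cls_scores, box, mask = net(img)
print(cls_scores.shape, box.shape, mask.shape)       # mask matches the 64x64 input
```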

Encoder-decoder architecture for different data types

This encoder-decoder design pattern is not exclusive to CV. We find it in neural machine translation as well, with an encoder that digests the input sequence and a decoder that generates the output based on the encoder's information. This same concept applies to transformer models.
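The same pattern can be sketched in sequence-to-sequence form, for example with PyTorch's nn.Transformer (this framework choice, and all dimensions, vocabulary sizes, and token IDs below, are placeholders for illustration): the encoder consumes the source tokens, and the decoder produces target-side representations that a linear layer maps to vocabulary scores.

```python
import torch
import torch.nn as nn

# Hedged sketch of the encoder-decoder pattern for sequences:
# the encoder digests the source sequence, the decoder generates output
# representations conditioned on the encoder's information.
d_model, src_vocab, tgt_vocab = 64, 1000, 1000
src_embed = nn.Embedding(src_vocab, d_model)
tgt_embed = nn.Embedding(tgt_vocab, d_model)
seq2seq = nn.Transformer(d_model=d_model, nhead=4,
                         num_encoder_layers=2, num_decoder_layers=2,
                         batch_first=True)
generator = nn.Linear(d_model, tgt_vocab)        # decision layer over the vocabulary

src = torch.randint(0, src_vocab, (1, 12))       # source token IDs
tgt = torch.randint(0, tgt_vocab, (1, 9))        # target token IDs produced so far
out = seq2seq(src_embed(src), tgt_embed(tgt))    # (1, 9, d_model)
next_token_scores = generator(out)               # scores over the target vocabulary
print(next_token_scores.argmax(dim=-1).shape)    # one predicted token per position
```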

Key idea: Encoder-decoder architecture

To generalize, the encoder-decoder architecture involves two main stages: the ...
