Image Segmentation Using Transformers

Discover transformer applications in semantic segmentation and explore the SETR and Segmenter architectures.

Let's explore the application of transformers to semantic segmentation. Traditional encoder-decoder architectures pose computational challenges, and we'll see how incorporating transformers offers a new approach to image segmentation.

Encoder-decoder architecture with self-attention

In an encoder-decoder setup, replacing the encoder block with a self-attention mechanism is a viable option. However, the computational cost is a concern: self-attention scales quadratically with sequence length, so treating every pixel as a token quickly becomes intractable. Two solutions were discussed: parallelizing computation across multiple attention heads, or operating on image patches treated like words, as in the Vision Transformer (ViT); a sketch of the patch approach follows below.
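To make the patch idea concrete, here is a minimal PyTorch sketch, not the course's code: the `PatchEmbedding` class and all dimensions are illustrative assumptions. With 16x16 patches, a 256x256 image becomes 256 tokens instead of 65,536 pixel tokens, which is what keeps self-attention affordable.

```python
import torch
import torch.nn as nn

# Illustrative sketch (hypothetical class, not from the course):
# turn an image into a sequence of patch embeddings, as in ViT.
class PatchEmbedding(nn.Module):
    def __init__(self, in_channels=3, patch_size=16, embed_dim=768, image_size=256):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # A strided convolution extracts each patch and linearly projects it
        # in a single step.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        # Learnable positional embeddings, one per patch position.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))

    def forward(self, x):                 # x: (B, 3, H, W)
        x = self.proj(x)                  # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)  # (B, N, D) patch tokens
        return x + self.pos_embed

tokens = PatchEmbedding()(torch.randn(1, 3, 256, 256))
print(tokens.shape)  # torch.Size([1, 256, 768])
```

The strided convolution is a common way to implement patch extraction and linear projection in one operation; the attention cost then depends on the number of patches rather than the number of pixels.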

Architectures combining approaches

Several architectures integrate both approaches. Let's examine two notable models: the SEgmentation TRansformer (SETR) and Segmenter.

SETR architecture

The SETR model divides the image into patches and feeds the resulting patch embeddings, augmented with positional embeddings, into a self-attention encoder.

SETR architecture

While SETR isn't a pure transformer model, its encoder builds the image's representation entirely through self-attention over the patch embeddings. The decoder, by contrast, is a conventional convolutional decoder, as in the "SETR-Naive" model, ...
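As a rough illustration of the SETR-Naive idea (a hedged sketch, not the paper's implementation: the `SETRNaiveSketch` class, layer counts, and dimensions below are assumptions), a transformer encoder processes the patch tokens, and a simple convolutional head reshapes them back into a 2D grid and bilinearly upsamples the class logits to full resolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch of a SETR-Naive-style model: self-attention encoder
# over patch tokens, followed by a simple convolutional decoder.
class SETRNaiveSketch(nn.Module):
    def __init__(self, embed_dim=768, num_layers=4, num_classes=21,
                 grid_size=16, patch_size=16):
        super().__init__()
        self.grid_size = grid_size
        self.patch_size = patch_size
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # "Naive" decoder: 1x1 convolutions to class logits.
        self.head = nn.Sequential(
            nn.Conv2d(embed_dim, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, kernel_size=1),
        )

    def forward(self, tokens):  # tokens: (B, N, D) patch embeddings
        x = self.encoder(tokens)                   # self-attention encoding
        b, n, d = x.shape
        # Reshape the token sequence back into its 2D spatial grid.
        x = x.transpose(1, 2).reshape(b, d, self.grid_size, self.grid_size)
        x = self.head(x)                           # (B, C, H/P, W/P)
        # Upsample the logits to the original image resolution.
        return F.interpolate(x, scale_factor=self.patch_size,
                             mode="bilinear", align_corners=False)

logits = SETRNaiveSketch()(torch.randn(1, 256, 768))
print(logits.shape)  # torch.Size([1, 21, 256, 256])
```

Reshaping the token sequence back into its spatial grid is what lets an otherwise 1D transformer encoder feed a conventional 2D convolutional decoder.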
