Image Segmentation Using Transformers
Discover transformer applications in semantic segmentation and explore SETR and segmenter architectures.
Let's explore the application of transformers in semantic segmentation. Traditional encoder-decoder architectures run into computational challenges, and we'll see how incorporating transformers into the pipeline addresses them for image segmentation.
Encoder-decoder architecture with self-attention
In an encoder-decoder setup, replacing the encoder block with a self-attention mechanism is a viable option. However, the computational cost of attending over every pixel is a concern. Two solutions address this: parallelizing the computation with multi-head attention, and treating image patches as words, as in vision transformers (ViT), which shortens the attention sequence, as the sketch below illustrates.
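Here is a minimal PyTorch sketch of the patch-based idea. The image size, patch size, and embedding dimension are illustrative assumptions, and `patch_embed` is a hypothetical name, not part of any specific library:

```python
# Self-attention over image patches (ViT-style), a minimal sketch.
import torch
import torch.nn as nn

image = torch.randn(1, 3, 224, 224)  # (batch, channels, H, W)

# Patch embedding: a strided convolution turns the image into
# 14 x 14 = 196 patch tokens, instead of 224 * 224 = 50,176
# per-pixel tokens.
patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)
tokens = patch_embed(image).flatten(2).transpose(1, 2)  # (1, 196, 768)

# Multi-head self-attention over the patch sequence: the quadratic
# cost is now O(196^2), far cheaper than attending over every pixel.
attn = nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
out, _ = attn(tokens, tokens, tokens)
print(out.shape)  # torch.Size([1, 196, 768])
```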
Architectures combining approaches
Several architectures combine both approaches. Let's examine two notable models: the SEgmentation TRansformer (SETR) and Segmenter.
SETR architecture
The SETR model divides the input image into fixed-size patches. Its encoder adds positional embeddings to the patch embeddings and processes the resulting sequence with self-attention; a decoder then upsamples the encoded features back to a per-pixel segmentation map. A simplified sketch follows.
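Below is a simplified PyTorch sketch of a SETR-style pipeline, closest in spirit to the paper's naive-decoder variant. The dimensions, layer count, and class count are illustrative assumptions, not the paper's exact configuration:

```python
# SETR-style pipeline, a simplified sketch: patch + positional
# embeddings, a transformer encoder, then a naive decoder that
# reshapes tokens into a 2D map and upsamples to full resolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

B, C, H, W, D, num_classes = 1, 3, 224, 224, 768, 21  # assumed sizes
patch = 16
n_tokens = (H // patch) * (W // patch)  # 14 * 14 = 196

patch_embed = nn.Conv2d(C, D, kernel_size=patch, stride=patch)
pos_embed = nn.Parameter(torch.zeros(1, n_tokens, D))  # learned positions

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=12, batch_first=True),
    num_layers=4,  # illustrative; SETR uses many more layers
)
head = nn.Conv2d(D, num_classes, kernel_size=1)

x = torch.randn(B, C, H, W)
tokens = patch_embed(x).flatten(2).transpose(1, 2) + pos_embed  # (B, 196, D)
tokens = encoder(tokens)

# Naive decoder: tokens -> 2D grid -> per-class logits -> bilinear upsample.
feat = tokens.transpose(1, 2).reshape(B, D, H // patch, W // patch)
logits = F.interpolate(head(feat), size=(H, W), mode="bilinear",
                       align_corners=False)
print(logits.shape)  # torch.Size([1, 21, 224, 224])
```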