The first step in the practical use of
After object classification comes object detection. In object detection, or object localization, we draw bounding boxes around all the objects in the image or mark their centroids. Object detection generates two outputs: the labels of the objects and their bounding boxes.
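As a rough illustration of these two outputs, the following minimal sketch stores each detection as a label plus a bounding box and derives the centroid from the box. The `Detection` class, the `(x_min, y_min, x_max, y_max)` pixel convention, and the coordinates are assumptions made for this example, not the output format of any particular detector.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object: a class label and its bounding box (hypothetical format)."""
    label: str
    box: tuple  # assumed convention: (x_min, y_min, x_max, y_max) in pixels

    def centroid(self):
        # The centroid is the midpoint of the box along each axis.
        x_min, y_min, x_max, y_max = self.box
        return ((x_min + x_max) / 2, (y_min + y_max) / 2)


# Made-up detections for an image containing two cats.
detections = [
    Detection("cat", (10, 20, 50, 80)),
    Detection("cat", (60, 25, 110, 85)),
]

centroids = [d.centroid() for d in detections]
for d, c in zip(detections, centroids):
    print(d.label, d.box, c)
```

Each object contributes one label and one box, so the number of detections matches the number of countable objects found in the image.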
After object detection, we perform semantic segmentation, in which we assign a class label to every pixel. In this way, semantic segmentation allows us to detect uncountable objects such as pavements and sky. The output of semantic segmentation is therefore a class label for every pixel of the image.
The following is an example in which, using semantic segmentation, every pixel is assigned a class. Notice how different cats are labeled in the same way.
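In code, per-pixel labeling can be sketched as follows. This is a toy example under stated assumptions: the class list and the per-pixel scores are invented, and in practice the scores would come from a trained segmentation model operating on arrays rather than nested lists. Each pixel simply receives the class with the highest score.

```python
# Hypothetical class list and per-pixel class scores for a tiny 2 x 3 image.
CLASSES = ["sky", "pavement", "cat"]

scores = [
    [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    [[0.1, 0.2, 0.7], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]],
]


def argmax(values):
    """Return the index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


# Assign every pixel the label of its highest-scoring class.
label_map = [[CLASSES[argmax(pixel)] for pixel in row] for row in scores]

for row in label_map:
    print(row)
```

The two pixels labeled `cat` in the bottom row could belong to two different cats, yet they receive the same label. Semantic segmentation has no notion of separate instances.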
The idea behind instance segmentation is that one image may contain multiple instances of an object. We would like to label every instance differently.
This task is more difficult than semantic segmentation as we are no longer simply assigning a label to a pixel. Instead, we also need to differentiate between the different instances of the object.
A simple way to perform instance segmentation is to use the bounding boxes generated by object detection and then apply semantic segmentation to just that portion of the image.
In this way, we generate a binary mask for every object in the image. Each binary mask has the same dimensions as the original image, with ones at the pixels that belong to the corresponding instance and zeros elsewhere. With this approach, we can generate a separate binary mask for each instance of a class.
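The box-then-segment approach described above can be sketched as follows. This is a simplified illustration: a precomputed label map stands in for running semantic segmentation on each cropped region, and the `instance_mask` helper, the box convention (`x_max` and `y_max` exclusive), and the data are all assumptions made for the example.

```python
def instance_mask(label_map, box, target_class):
    """Build a full-size binary mask for one instance.

    Only pixels inside the bounding box are examined, so pixels of the
    same class that lie in other boxes stay at zero.
    """
    x_min, y_min, x_max, y_max = box  # assumed: max coordinates are exclusive
    height, width = len(label_map), len(label_map[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(y_min, y_max):
        for x in range(x_min, x_max):
            if label_map[y][x] == target_class:
                mask[y][x] = 1
    return mask


# Made-up semantic labels for a 3 x 5 image with two cats.
label_map = [
    ["sky", "sky", "sky", "sky", "sky"],
    ["cat", "cat", "pavement", "cat", "cat"],
    ["cat", "cat", "pavement", "cat", "cat"],
]

# Made-up bounding boxes, one per detected cat.
boxes = [(0, 1, 2, 3), (3, 1, 5, 3)]

masks = [instance_mask(label_map, box, "cat") for box in boxes]
for mask in masks:
    for row in mask:
        print(row)
    print()
```

Each mask has the same height and width as the image and marks only one cat, so the two instances are kept apart even though they share a class label.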
A binary mask of the second cat from the left is as follows: