What is object detection (OD)?

Object detection (OD) is an important technique in computer vision that helps us answer the following questions:

  • Identification: Which object is present in an image?

  • Localization: Where is the object located in an image?

Humans can easily identify a familiar object in an image just by looking at it. A computer, however, does not know whether an image shows a person or a mobile phone. It sees the image as a grid of pixels. In a typical 8-bit image, each pixel value ranges from 0 to 255 and represents the intensity of the color at that pixel.
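To make this concrete, here is a minimal sketch of how a grayscale image looks to a program: a made-up 4x4 grid of intensity values stored as nested Python lists. Real code would usually load images with a library such as Pillow or NumPy. The pixel values here are purely illustrative.

```python
# A made-up 4x4 grayscale "image": each number is one pixel's intensity,
# from 0 (black) to 255 (white) in an 8-bit image.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [30, 30, 200, 200],
    [30, 30, 200, 200],
]

height = len(image)
width = len(image[0])

# Average brightness over all pixels
mean_intensity = sum(sum(row) for row in image) / (height * width)
# Brightest pixel anywhere in the image
max_intensity = max(max(row) for row in image)

print(f"{height}x{width} image, mean={mean_intensity}, max={max_intensity}")
# prints: 4x4 image, mean=121.25, max=255
```

To the computer, "a person" or "a mobile phone" is nothing more than patterns in numbers like these; the model has to learn which patterns correspond to which objects.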

Object detection helps a computer understand the contents of an image by analyzing features or patterns extracted by the model.

Figure: What humans see vs. what a computer sees

Object recognition vs. object detection

Figure: Questions answered by image classification and object detection

Image classification assigns a single label to the whole image, so it answers only the identification question. Object detection answers both questions: it predicts a class label and a bounding box for every object it finds.
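A minimal sketch of how the outputs of the two tasks differ, assuming a hypothetical `Detection` record that holds a label, a confidence score, and a box given as `(x_min, y_min, x_max, y_max)` pixel coordinates. Real detectors use several different box formats, and all labels, scores, and coordinates below are made up.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical record for one detected object."""
    label: str                      # what the object is (identification)
    score: float                    # model confidence in [0, 1]
    box: tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels (localization)


# Image classification: one label for the whole image.
classification_result = "dog"

# Object detection: one entry per object, each with a class and a box.
detection_result = [
    Detection(label="dog", score=0.92, box=(40, 60, 210, 300)),
    Detection(label="person", score=0.88, box=(250, 20, 400, 380)),
]

for det in detection_result:
    x_min, y_min, x_max, y_max = det.box
    print(f"{det.label} ({det.score:.0%}) at x={x_min}..{x_max}, y={y_min}..{y_max}")
```

The classifier's answer is a single string, while the detector returns a list whose length depends on how many objects are in the image.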

High-level overview of the object detection steps

  1. Input image: An input image is passed to the OD model.

  2. Identifying regions of interest: The model identifies regions that may contain an object, for example by looking for areas that stand out from their surroundings, such as those with distinctive colors or textures.

  3. Extracting features: The next step is to extract features, such as color or texture, from these areas.

  4. Object classification: Next, we need to know which type of object is present in each region. A classification algorithm, trained on labeled examples, assigns a class label to each region.

  5. Localization: Once we know which objects are present in a region, we need to locate them accurately, typically by refining a bounding box around each object.

  6. Loss function: How do we know whether the model performs well? During training, we define a loss function that penalizes the differences between the predicted boxes and the original (ground-truth) boxes, as well as incorrect class predictions. Minimizing this loss helps the model make better predictions.
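A minimal sketch of such a loss, assuming a simple form that many detectors use in some variant: a cross-entropy term for the class prediction plus an L1 term for the box coordinates. The function names, the `box_weight` parameter, and the normalized (0 to 1) coordinates are illustrative assumptions. Real models often use variants such as smooth L1 or IoU-based box losses and compute the loss over many regions at once.

```python
import math


def classification_loss(class_probs: dict[str, float], true_label: str) -> float:
    """Cross-entropy: close to 0 when the true class gets high probability."""
    return -math.log(class_probs[true_label])


def box_loss(pred_box, true_box) -> float:
    """L1 loss: sum of absolute differences between box coordinates."""
    return sum(abs(p - t) for p, t in zip(pred_box, true_box))


def detection_loss(class_probs, true_label, pred_box, true_box, box_weight=1.0) -> float:
    """Hypothetical total loss: classification term + weighted box term."""
    return classification_loss(class_probs, true_label) + box_weight * box_loss(pred_box, true_box)


# Boxes are (x_min, y_min, x_max, y_max), normalized to [0, 1] by image size.
true_box = (0.12, 0.20, 0.48, 0.65)
pred_box = (0.10, 0.20, 0.50, 0.60)

good = detection_loss({"dog": 0.8, "cat": 0.2}, "dog", pred_box, true_box)
worse = detection_loss({"dog": 0.3, "cat": 0.7}, "dog", pred_box, true_box)
print(f"confident and correct: {good:.3f}")   # -> 0.313
print(f"less confident:        {worse:.3f}")  # -> 1.294
```

Both predictions use the same box, so the difference comes entirely from the class term: lower confidence in the true class produces a higher loss. Normalized coordinates keep the box term on a scale similar to the classification term, and `box_weight` trades off the two.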

Challenges with object detection

  • Occlusion: In a crowded scene, such as a busy road, objects may be partially or completely hidden by other objects. This makes it difficult for a model to detect and localize them.

  • Lighting variation: Changes in lighting, such as direct sunlight, shadows, or the time of day, can make objects appear lighter or darker, which makes detection harder.

  • Limited data variety: Training an accurate model requires a large amount of diverse data. It can be difficult to obtain enough examples of rare events or objects, such as a UFO (unidentified flying object).

  • False positives: The OD model may detect objects that are not present in an image.

  • False negatives: Because the task is complex, the model may fail to detect an object that is present in an image.

  • Scale variation: Objects can appear at very different sizes within an image (for example, a nearby car and a distant one), which makes detection difficult.
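False positives and false negatives are typically counted by matching predicted boxes to ground-truth boxes with Intersection over Union (IoU): the area where two boxes overlap divided by the total area they cover together. The sketch below is a simplified, hypothetical version. It ignores class labels, assumes predictions are already sorted by confidence, and uses an IoU threshold of 0.5, a common choice. The box coordinates are made up.

```python
def iou(box_a, box_b) -> float:
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_errors(predictions, ground_truths, threshold=0.5):
    """Greedily match each prediction to its best still-unmatched ground-truth box.

    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched = list(ground_truths)
    tp = fp = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            tp += 1
            unmatched.remove(best)  # each real object can be matched only once
        else:
            fp += 1  # no real object overlaps this prediction enough
    fn = len(unmatched)  # real objects the model missed
    return tp, fp, fn


ground_truths = [(10, 10, 50, 50), (100, 100, 150, 150)]
predictions = [(12, 12, 52, 52), (200, 200, 240, 240)]
print(count_errors(predictions, ground_truths))  # -> (1, 1, 1)
```

Here the first prediction overlaps the first object well (IoU of about 0.82), the second prediction covers empty background (a false positive), and the second object is never found (a false negative).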