Object Detection
Learn about the YOLO architecture and how VGG16 works.
YOLO architecture
So far, we have only discussed object recognition. In many applications, we want to go further and also tell where the objects are in the picture. For example, in self-driving cars, we want to know where pedestrians are or where the road is. One way of doing this is to place bounding boxes around the objects, as shown in the figure below. A popular architecture for this is called YOLO (You Only Look Once). The idea is to train a network not only on class labels but also on the location $(x, y)$ of a bounding box's center, the box's size $(w, h)$, and its confidence.
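To make the five numbers per box concrete, here is a minimal Python sketch (not the original YOLO code) that stores a box as center, size, and confidence. The names `Box`, `iou`, and `target_confidence` are hypothetical, and measuring coordinates in pixels is an assumption. The confidence target follows the definition in the YOLO paper, Pr(object) × IoU, where IoU (intersection over union) measures how well the predicted box overlaps the ground-truth box.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """One bounding box as YOLO parameterizes it: five numbers per box.

    x, y: center of the box; w, h: width and height (pixels, assumed);
    confidence: how sure the network is that the box holds an object.
    """
    x: float
    y: float
    w: float
    h: float
    confidence: float = 1.0

    def corners(self):
        """Convert center/size format to (x_min, y_min, x_max, y_max)."""
        return (self.x - self.w / 2, self.y - self.h / 2,
                self.x + self.w / 2, self.y + self.h / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes: 0 = disjoint, 1 = identical."""
    ax0, ay0, ax1, ay1 = a.corners()
    bx0, by0, bx1, by1 = b.corners()
    # Overlap along each axis, clipped at 0 when the boxes do not intersect.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def target_confidence(pred: Box, truth: Optional[Box]) -> float:
    """Confidence target per the YOLO paper: Pr(object) * IoU(pred, truth).

    Pr(object) is 0 when no ground-truth object is present, 1 otherwise.
    """
    return 0.0 if truth is None else iou(pred, truth)


truth = Box(x=50, y=50, w=40, h=40)
pred = Box(x=60, y=50, w=40, h=40, confidence=0.8)
print(iou(pred, truth))               # 0.6
print(target_confidence(pred, None))  # 0.0
```

Storing boxes as center plus size matches the $(x, y, w, h)$ parameterization above; converting to corners is only needed to compute the overlap.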
The network does this by dividing an image into an $S \times S$ grid of cells, where $S$ is set to 7 in the original example. For each grid cell, the network makes $B$ predictions of the five numbers mentioned earlier, one per bounding box ($B = 2$ in the original example), and it also predicts scores for $C$ classes, so that we need $S \times S \times (5B + C)$ output nodes. Here, $C$ is the number of classes, which was 20 for the PASCAL VOC dataset used in the original paper, hence the output shape of $7 \times 7 \times 30$.
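The bookkeeping behind this output size can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original implementation: the helper names are hypothetical, and the per-cell ordering of the values (B blocks of $(x, y, w, h, \text{confidence})$ followed by C class scores) is an assumed layout, since real implementations may order them differently. `responsible_cell` reflects the YOLO rule that the grid cell containing an object's center is the one responsible for detecting it.

```python
S = 7   # grid cells per side in the original example
B = 2   # bounding boxes predicted per grid cell
C = 20  # number of classes (PASCAL VOC in the original paper)


def output_shape(s: int, b: int, c: int) -> tuple:
    """Output shape: s x s cells, each with b*5 box numbers plus c class scores."""
    return (s, s, b * 5 + c)


def split_cell(values, b: int, c: int):
    """Split one cell's output vector into b boxes and c class scores.

    ASSUMED layout: b blocks of (x, y, w, h, confidence), then c class scores.
    """
    if len(values) != b * 5 + c:
        raise ValueError(f"expected {b * 5 + c} values, got {len(values)}")
    boxes = [tuple(values[5 * i:5 * i + 5]) for i in range(b)]
    class_scores = list(values[5 * b:])
    return boxes, class_scores


def responsible_cell(x_center: float, y_center: float,
                     img_w: float, img_h: float, s: int):
    """(row, col) of the grid cell that contains an object's center."""
    col = min(int(x_center / img_w * s), s - 1)  # clamp centers on the right edge
    row = min(int(y_center / img_h * s), s - 1)  # clamp centers on the bottom edge
    return row, col


shape = output_shape(S, B, C)
print(shape)                            # (7, 7, 30)
print(shape[0] * shape[1] * shape[2])   # 1470
boxes, scores = split_cell(list(range(30)), B, C)
print(boxes[1], len(scores))            # (5, 6, 7, 8, 9) 20
# 448 x 448 is the input size used in the YOLO paper.
print(responsible_cell(224, 100, 448, 448, S))  # (1, 3)
```

With the original settings, the network therefore needs $7 \times 7 \times 30 = 1470$ output nodes in total.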