...
/Basics of the Convolutional Neural Network (CNN): Part I
Basics of the Convolutional Neural Network (CNN): Part I
Learn about the fundamentals of a CNN, a building block for YOLO.
A convolutional neural network (CNN) tries to detect multiple patterns in different regions of an image using a receptive field, which is the area that a neuron sees when processing data. Here are some key features of a CNN:
A CNN is a special type of neural network—generally used for image data—that can extract features from an image so that the computer can identify its content.
The intuition behind CNN is to reduce the input size while increasing the depth (equal to the number of channels) in the network.
A CNN uses convolution instead of general matrix multiplication.
Instead of feeding pixels to a neural network, we feed features to CNN.
Why are CNNs needed over ANNs for images?
CNNs are preferred over
Problems with neural networks
Let’s learn in detail why CNNs perform better than ANNs for image data.
Rotation/position invariance
Images are just pixel values for computers. A neural network (NN) flattens out the image representation—converts it into a 1D array. The issue with this approach is that the spatial information of images is lost in this conversion. So, if we train a model with the data of a person at a specific location, an NN will fail to identify the person at another location in the same image.
Not scalable
Neural networks follow a fully connected structure, meaning each input is connected to all the neurons in the next layer. This significantly increases the number of weights to be trained. For instance, let’s suppose we have an RGB image with dimensions of 8 × 8 pixels. In an NN, each pixel serves as an input to the network, resulting in a total of
But, if we want to take a larger image of 150 × 150 pixels for our model, the total number of inputs now will be
How do CNNs solve these problems?
CNNs use shared weights to control the number of parameters. The intuition behind this is that if a filter is useful for finding features at coordinates (