How to use convolutional neural networks (CNNs) for images

Jan 01, 2025

Key Takeaways:

CNNs are designed to process grid-like data such as images, making them ideal for tasks like image classification and object detection.
TensorFlow is a popular open-source Python library for machine learning that is widely used to design and train CNNs efficiently.
CNNs are built from three main types of layers: convolutional, pooling, and fully connected. These extract features, condense them while retaining essential information, and make predictions, respectively.
CNNs are resource-intensive and require large datasets to perform well and avoid overfitting.
CNNs shine in image and video tasks but are not always the best choice for tabular data or natural language processing (NLP), where other models excel.

Convolutional neural networks (CNNs) are at the heart of cutting-edge technologies, powering everything from facial recognition to self-driving cars. But how do they work? If you’ve ever wondered how your phone recognizes a face or how a self-driving car identifies road signs, this guide will break down CNNs in a way that’s easy to understand, even if you’re just starting with machine learning. You’ll also learn how to implement a simple CNN along the way and see these concepts come to life.

What are convolutional neural networks?#

Convolutional neural networks (CNNs) are a specialized type of neural network designed to process grid-like data, such as images. So, what sets CNNs apart from other neural networks? Unlike traditional fully connected networks, which treat every pixel as an independent input, CNNs exploit the spatial structure of an image to identify local patterns, like the edges of a cat’s whiskers or the textures on a pizza. They also automate feature extraction, learning useful filters from data instead of relying on hand-engineered features.

Building blocks of a convolutional neural network#

A CNN consists of several layers, each playing a distinct role. If you think of CNNs as a layered cake, each layer adds more depth to the taste. Here’s a quick breakdown of the key layers:

Convolutional layers: The powerhouse of the network, responsible for feature detection.
Pooling layers: These reduce the size of the data while preserving important features.
Fully connected layers: These layers are where the final prediction happens.

Now, let’s explore each of these layers in detail.

How convolutional layers work: The core of CNNs#

Imagine you have an image of a cat. The convolutional layer acts like a detective, scanning the image with filters that detect features such as edges or textures. Each filter is a small grid of learned weights. As it slides across the image, it produces a feature map that highlights where its pattern appears, such as the cat’s eyes or the sharp lines of its whiskers. Essentially, the convolutional layer identifies patterns by applying a mathematical operation called convolution: at each position, the filter’s weights are multiplied by the pixels beneath them and the results are summed.
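
To make the sliding-filter idea concrete, here is a minimal sketch in plain Python. The tiny image and the hand-written vertical-edge filter are made-up examples, and `convolve2d` is a hypothetical helper name. A real CNN learns its filter values during training. Like most deep learning libraries, the sketch slides the filter without flipping it, which is technically cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the patch under the filter by the filter weights and sum.
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x5 grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-written vertical-edge filter (an assumption for illustration only).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[3, 3, 0], [3, 3, 0]]
```

The large values mark positions where the filter overlaps the dark-to-bright boundary, and the zeros mark the uniform bright region where there is no edge.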

Understanding pooling layers in CNNs#

Once the convolutional layers have identified key features, the pooling layer steps in to simplify things. Pooling reduces the size of the feature maps, retaining only the most significant information. The most common pooling method is max pooling, which retains the maximum value from each region (or patch) within the feature map.

For example, when analyzing an image of a cat, the pooling layer would focus on prominent details like the cat’s outline, disregarding less important information.
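
Max pooling is simple enough to sketch in a few lines of plain Python. The feature map values below are invented for illustration, and `max_pool2d` is a hypothetical helper, not a library function.

```python
def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size patch."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            patch = [
                feature_map[i + di][j + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(patch))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map produced by a convolutional layer.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

The 4x4 map shrinks to 2x2, so later layers have a quarter as many values to process, while the strongest response in each region survives.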

Fully connected layers and classification in CNNs#

In the final stage, the fully connected layers take over. After the data has passed through the convolutional and pooling layers, it is flattened into a one-dimensional array. The fully connected layers use this data to classify the image, typically ending in a softmax function that turns raw scores into probabilities. For instance, the network might assign a probability of 90% to the image being a cat, 8% to a dog, and 2% to something else. This is where the CNN makes its final decision.
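
The flatten, score, and softmax steps can be sketched as follows. The weights, biases, and class names are invented for illustration (a trained network would have learned them), and `flatten`, `dense`, and `softmax` are hypothetical helper names.

```python
import math


def flatten(feature_map):
    """Turn a 2D feature map into a 1D list."""
    return [value for row in feature_map for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum (plus bias) per output unit."""
    return [
        sum(w * x for w, x in zip(unit_weights, inputs)) + b
        for unit_weights, b in zip(weights, biases)
    ]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["cat", "dog", "other"]   # hypothetical class names
pooled = [[6, 2], [2, 9]]           # hypothetical pooled feature map

# Made-up weights: one row of four weights per class.
weights = [
    [0.2, 0.1, 0.0, 0.3],
    [0.1, 0.0, 0.2, 0.1],
    [0.0, 0.1, 0.1, 0.0],
]
biases = [0.0, 0.0, 0.0]

flattened = flatten(pooled)
probabilities = softmax(dense(flattened, weights, biases))
prediction = classes[probabilities.index(max(probabilities))]

for name, p in zip(classes, probabilities):
    print(f"{name}: {p:.1%}")
print("prediction:", prediction)
```

The class with the highest probability becomes the prediction.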

How CNNs learn: The process of training and backpropagation#

So how do CNNs actually learn? The process is known as training, where the network is fed labeled data (like images of cats and dogs). Because its weights start out random, its initial predictions are off the mark. For each batch of images, a loss function measures how far the predictions are from the labels, and backpropagation computes how much each weight contributed to that error. An optimizer then nudges the weights in the direction that reduces the loss. Repeated over many training rounds (epochs), this iterative process gradually makes the network skilled at recognizing patterns, similar to how one gradually learns a new skill.
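
The predict, measure, adjust loop can be illustrated with a deliberately tiny model. The sketch below trains a single sigmoid neuron on made-up one-feature data, so it is not a CNN. In a real CNN, backpropagation applies the same chain rule through every layer. The data, learning rate, and number of epochs are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy data: one feature per example, label 1 when the feature is large.
inputs = [0.1, 0.4, 0.6, 0.9]
labels = [0, 0, 1, 1]

# Zero start for reproducibility; real networks start from small random weights.
weight, bias = 0.0, 0.0
learning_rate = 1.0


def mean_loss(w, b):
    """Binary cross-entropy averaged over the toy dataset."""
    total = 0.0
    for x, y in zip(inputs, labels):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(inputs)


initial_loss = mean_loss(weight, bias)

for epoch in range(200):
    grad_w, grad_b = 0.0, 0.0
    for x, y in zip(inputs, labels):
        p = sigmoid(weight * x + bias)  # forward pass: make a prediction
        error = p - y                   # gradient of the loss w.r.t. the pre-activation
        grad_w += error * x             # chain rule: d(pre-activation)/d(weight) = x
        grad_b += error                 # chain rule: d(pre-activation)/d(bias) = 1
    # Gradient descent step: move each parameter against its average gradient.
    weight -= learning_rate * grad_w / len(inputs)
    bias -= learning_rate * grad_b / len(inputs)

final_loss = mean_loss(weight, bias)
print(f"loss before training: {initial_loss:.3f}")
print(f"loss after training:  {final_loss:.3f}")
```

The loss falls as the weight and bias are adjusted, which is the same signal a CNN follows when it trains on images.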

How to build a simple CNN in Python with TensorFlow#

Let’s walk through the process of building a basic CNN in Python using TensorFlow and Keras. TensorFlow and Keras are commonly used for building CNNs due to their power and simplicity. TensorFlow, an open-source library, efficiently handles complex mathematical operations and large datasets, making it ideal for deep learning tasks. Keras (now integrated with TensorFlow) provides a high-level API that simplifies model creation and training, allowing for rapid prototyping and experimentation. Together, they offer an efficient and user-friendly environment for developing CNNs with minimal code complexity.

The following is a concise introduction to creating your own CNN. For this example, we’re using the CIFAR-10 dataset, which contains 60,000 color images, each 32x32 pixels with three color channels (RGB). The images belong to 10 different classes, making it a popular choice for image classification tasks.

# Import libraries for building a CNN with TensorFlow
import tensorflow as tf
from tensorflow.keras import layers, models

# Load the CIFAR-10 dataset (50,000 training and 10,000 test images)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

# Normalize pixel values from the 0-255 range to the 0-1 range
train_images, test_images = train_images / 255.0, test_images / 255.0

# Build the CNN model: three convolutional layers followed by a dense classifier
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile the model, then train it, validating on the test set after each epoch
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))

Code for a simple CNN

Here’s a breakdown of what we’re doing in this code:

Importing libraries: We load TensorFlow and specific Keras modules to build and train our CNN.
Loading and preprocessing data: We load the CIFAR-10 dataset and normalize pixel values by dividing by 255, scaling them to a range between 0 and 1, which helps the CNN process the data effectively.
Building the CNN model: We create a sequential CNN architecture by providing a list of layers.
- Each Conv2D layer extracts features from its input using 3x3 filters (specified by the (3, 3) argument). The input_shape argument on the first layer tells the model to expect 32x32 images with 3 color channels.
- The MaxPooling2D layers reduce the spatial dimensions by applying a 2x2 pooling window ((2, 2)).
- Note that the first and second convolutional layers here use 32 and 64 filters, respectively. More filters enable the model to learn more complex features as it goes deeper into the network.
- The Flatten layer reshapes the 3D stack of feature maps (height x width x channels) into a 1D vector.
- The Dense layer with 64 units applies a fully connected layer, using the ReLU activation function.
- Finally, the last Dense layer with 10 units outputs the classification results, using the softmax activation for multi-class classification.
Compiling: We compile the model using the Adam optimizer and sparse categorical cross-entropy loss, which is ideal for multi-class classification with integer labels.
Training: Finally, we train the model for 10 epochs on the CIFAR-10 training set. Passing the test set as validation_data makes Keras report loss and accuracy on unseen images after each epoch.
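
To see how the layers above shrink the image and how many weights the model learns, the arithmetic can be worked through by hand. The sketch below assumes the Keras defaults used in the listing (no padding and stride 1 for Conv2D, and a pooling stride equal to the window size). The helper names are hypothetical.

```python
def conv_output(size, kernel=3):
    """Spatial size after an unpadded convolution with stride 1."""
    return size - kernel + 1


def pool_output(size, window=2):
    """Spatial size after non-overlapping pooling."""
    return size // window


def conv_params(in_channels, filters, kernel=3):
    """Each filter has kernel x kernel x in_channels weights plus one bias."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(num_inputs, units):
    """Each unit has one weight per input plus one bias."""
    return (num_inputs + 1) * units


size = 32                     # CIFAR-10 images are 32x32
size = conv_output(size)      # first Conv2D:        30x30
size = pool_output(size)      # first MaxPooling2D:  15x15
size = conv_output(size)      # second Conv2D:       13x13
size = pool_output(size)      # second MaxPooling2D: 6x6
size = conv_output(size)      # third Conv2D:        4x4
flattened = size * size * 64  # Flatten: 4 * 4 * 64 values

total_params = (
    conv_params(3, 32)               # first Conv2D
    + conv_params(32, 64)            # second Conv2D
    + conv_params(64, 64)            # third Conv2D
    + dense_params(flattened, 64)    # Dense(64)
    + dense_params(64, 10)           # Dense(10)
)

print("flattened vector length:", flattened)  # 1024
print("trainable parameters:", total_params)  # 122570
```

Calling model.summary() on the model should report the same shapes and parameter counts, which makes it a quick check that the architecture is wired as intended.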

Real-world applications of CNNs#

Image classification#

One of the most well-known applications of CNNs is image classification. Have you ever wondered how search engines are able to categorize millions of images with ease? That’s where CNNs come in. They help these platforms automatically sort, tag, and organize images into categories based on their content. Whether you’re searching for cats, cars, or cupcakes, CNNs make sure you get relevant results in no time.

Facial recognition#

Facial recognition technology, a feature many of us use daily, also relies on CNNs. Whether you’re unlocking your phone or passing through airport security, CNNs are behind the facial recognition systems that scan and verify identities. You’ve probably also noticed that your phone’s gallery automatically groups photos by the people in them. That is also a CNN at work. These systems analyze unique facial features and compare them with stored data, providing a secure and efficient way to authenticate users.

Object detection#

CNNs also play a crucial role in object detection, as seen in Amazon Go stores, where they track items customers pick up or return, allowing for a seamless shopping experience without checkout lines. CNNs are especially vital for self-driving cars. These neural networks help vehicles detect and differentiate between objects on the road, such as pedestrians, stop signs, obstacles, and other vehicles. By processing the visual data from cameras, CNNs enable cars to make decisions in real time, contributing to a safer and smarter autonomous driving experience.

Famous CNN models powering AI magic: LeNet, AlexNet, and beyond#

Model	Year Introduced	Key Features	Famous For
LeNet	1989 (LeNet-5: 1998)	Shallow network, 5 layers	Handwritten digit recognition
AlexNet	2012	8 layers, ReLU activation, large image dataset	Winning the ImageNet competition in 2012
VGGNet	2014	16–19 layers, small 3x3 filters	Simple yet effective for image classification
ResNet	2015	Residual (skip) connections, over 100 layers	Mitigating the vanishing gradient problem in very deep networks
Inception	2014	Inception modules (multi-scale feature extraction)	Efficient image classification with less computation
MobileNet	2017	Lightweight architecture for mobile devices	Real-time mobile applications, facial recognition
DenseNet	2017	Dense connections between layers	Reduced parameters and improved efficiency
EfficientNet	2019	Scalable CNN architecture (EfficientNet-B0 to B7)	SOTA (state-of-the-art) accuracy with fewer resources
ConvNeXt	2022	Modernized CNN architecture with large kernel sizes	Competing with Vision Transformers (ViTs) in vision tasks

Advantages and limitations of CNNs#

CNNs have impressive capabilities when it comes to images, but they also come with their own set of challenges.

Advantages:#

Automated feature extraction: CNNs handle feature detection without manual input.
Scalability: They excel at processing large, complex images.
High performance: CNNs automatically learn complex spatial patterns, making them highly effective for image analysis.

Limitations:#

Resource intensive: Training CNNs can be computationally expensive, requiring powerful hardware.
Data hungry: CNNs perform best with large datasets.
Overfitting: Without careful tuning, CNNs may struggle to generalize beyond their training data.

When to use CNNs and when not to#

CNNs are powerful, but they’re not the solution for every task. Let’s explore when they shine and when other methods may be more appropriate.

When to use CNNs:#

Image data: CNNs are ideal for image classification, object detection, and segmentation.
Video recognition: CNNs can handle tasks such as tracking objects across video frames.
Complex computer vision tasks: For tasks requiring deep visual understanding, CNNs are a top choice.

When not to use CNNs:#

Tabular data: Traditional models like decision trees or logistic regression are more suitable for structured data.
Natural language processing (NLP): While CNNs can handle text, models like transformers (e.g., BERT or GPT) are more effective for language-related tasks.
Small datasets: CNNs need large datasets to perform well. For smaller datasets, simpler models often yield better results.

Why CNNs matter in modern machine learning#

CNNs are at the frontier of AI-driven innovation, enabling everything from medical diagnoses to autonomous driving. Now that you’ve explored the basics, dive deeper, experiment, and build—who knows, the next big breakthrough in CNNs could be yours!

Frequently Asked Questions

Is CNN deep learning?

Deep learning covers models with many layers that automatically learn patterns from data. CNNs fit that description: they learn image features like edges and shapes through multiple stacked layers. So, yes, CNNs are a type of deep learning model.

Do CNNs need a lot of data to perform well?

Yes, CNNs usually need large amounts of data to perform well and avoid overfitting—where the model memorizes the training data but doesn’t generalize well to new data. Larger datasets help CNNs learn better and improve accuracy.

How is CNN different from other neural networks?

Unlike traditional fully connected neural networks, CNNs are specially designed for processing image data and efficiently extract features through convolutional layers. Because each filter looks at a small neighborhood and is reused across the whole image, CNNs need far fewer weights and handle spatial information better than fully connected networks.

What is the difference between ANN and CNN?

ANNs (artificial neural networks) are general-purpose neural networks, while CNNs are specialized neural networks designed for grid-like data, such as images. For more details, read this article: ANNs vs CNNs.

What is the difference between RNN and CNN?

The main difference between recurrent neural networks (RNNs) and convolutional neural networks (CNNs) is that RNNs are designed for sequential data (e.g., text), while CNNs are optimized for grid-like data (e.g., images). Hence, CNNs are better suited for image processing, while RNNs are a good choice for natural language processing (NLP) tasks. For a more in-depth comparison, read this article: CNNs vs. RNN.

Written By:

Hamna Waseem