
How to use convolutional neural networks (CNNs) for images

9 min read
Jan 01, 2025
Contents
What are convolutional neural networks?
Building blocks of a convolutional neural network
How convolutional layers work: The core of CNNs
Understanding pooling layers in CNNs
Fully connected layers and classification in CNNs
How CNNs learn: The process of training and backpropagation
How to build a simple CNN in Python with TensorFlow
Real-world applications of CNNs
Image classification
Facial recognition
Object detection
Medical imaging
Famous CNN models powering AI magic: LeNet, AlexNet, and beyond
Advantages and limitations of CNNs
Advantages:
Limitations:
When to use CNNs and when not to
When to use CNNs:
When not to use CNNs:
Why CNNs matter in modern machine learning

Key Takeaways:

  • CNNs are designed to process grid-like data such as images, making them ideal for tasks like image classification and object detection.

  • TensorFlow is a popular open-source Python library for machine learning that is widely used to design and train CNNs efficiently.

  • CNNs have three major types of layers: convolutional, pooling, and fully connected. These extract features, retain essential information, and make the final predictions, respectively.

  • CNNs are resource-intensive and require large datasets to perform well and avoid overfitting.

  • CNNs shine in image and video tasks but are not always the best choice for tabular data or natural language processing (NLP), where other models excel.

Convolutional neural networks (CNNs) are at the heart of cutting-edge technologies, powering everything from facial recognition to self-driving cars. But how do they work? If you’ve ever wondered how your phone recognizes a face or how a self-driving car identifies road signs, this guide will break down CNNs in a way that’s easy to understand, even if you’re just starting with machine learning. You’ll also learn how to implement a simple CNN along the way and see these concepts come to life.

What are convolutional neural networks?

Convolutional neural networks (CNNs) are a specialized type of neural network designed to process grid-like data, such as images. So, what sets CNNs apart from other neural networks? Unlike traditional neural networks, CNNs excel at identifying patterns—like recognizing the edges of a cat’s whiskers or the textures on a pizza. They automate feature extraction, removing the need for manual intervention. In short, CNNs handle the complexities of pattern recognition with ease.

Building blocks of a convolutional neural network

A CNN consists of several layers, each playing a distinct role. If you think of a CNN as a layered cake, each layer adds more depth to the flavor. Here’s a quick breakdown of the key layers:

  • Convolutional layers: The powerhouse of the network, responsible for feature detection.

  • Pooling layers: These reduce the size of the data while preserving important features.

  • Fully connected layers: These layers are where the final prediction happens.

The layers in a convolutional neural network

Now, let’s explore each of these layers in detail.

How convolutional layers work: The core of CNNs

Imagine you have an image of a cat. The convolutional layer acts like a detective, scanning the image with filters that detect features such as edges or textures. Each filter helps generate a feature map, highlighting crucial parts of the image. As these filters move across the image, they detect important details—like the cat’s eyes or the sharpness of its whiskers. Essentially, the convolutional layer identifies patterns by applying mathematical operations called convolutional filters to the image.
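To make this concrete, here is a minimal, hypothetical sketch (not part of the original tutorial code, assuming TensorFlow 2.x) that slides a single hand-crafted 3x3 vertical-edge filter over a tiny random image. In a real CNN, the filter values are learned during training rather than chosen by hand.

# Illustrative sketch: one convolution with a hand-crafted filter
import tensorflow as tf

# A fake 5x5 single-channel image: shape (batch, height, width, channels)
image = tf.random.uniform((1, 5, 5, 1))

# A hand-crafted 3x3 vertical-edge filter: shape (height, width, in_channels, out_channels)
edge_filter = tf.constant([[-1., 0., 1.],
                           [-1., 0., 1.],
                           [-1., 0., 1.]])
edge_filter = tf.reshape(edge_filter, (3, 3, 1, 1))

# Slide the filter over the image to produce a feature map
feature_map = tf.nn.conv2d(image, edge_filter, strides=1, padding='VALID')
print(feature_map.shape)  # (1, 3, 3, 1): a smaller map highlighting vertical edges
Sliding a hand-crafted 3x3 filter over a small image (illustrative sketch)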

Understanding pooling layers in CNNs

Once the convolutional layers have identified key features, the pooling layer steps in to simplify things. Pooling reduces the size of the feature maps, retaining only the most significant information. The most common pooling method is max pooling, which retains the maximum value from each region (or patch) within the feature map.

For example, when analyzing an image of a cat, the pooling layer would focus on prominent details like the cat’s outline, disregarding less important information.
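As an illustration, the following small sketch (assuming TensorFlow 2.x) applies 2x2 max pooling to a made-up 4x4 feature map, keeping only the largest value in each patch and halving the map’s height and width.

# Illustrative sketch: 2x2 max pooling on a tiny feature map
import tensorflow as tf

# A made-up 4x4 feature map, reshaped to (batch, height, width, channels)
feature_map = tf.constant([[1., 3., 2., 0.],
                           [4., 8., 1., 5.],
                           [7., 2., 9., 6.],
                           [0., 1., 3., 4.]])
feature_map = tf.reshape(feature_map, (1, 4, 4, 1))

# 2x2 max pooling keeps the largest value in each 2x2 patch
pooled = tf.nn.max_pool2d(feature_map, ksize=2, strides=2, padding='VALID')
print(tf.reshape(pooled, (2, 2)))
# [[8. 5.]
#  [7. 9.]]
Max pooling on a tiny feature map (illustrative sketch)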

Fully connected layers and classification in CNNs

In the final stage, the fully connected layers take over. After the data has passed through the convolutional and pooling layers, it is flattened into a one-dimensional array. The fully connected layers use this data to classify the image. For instance, the network might predict with 90% certainty that the image is of a cat, 8% that it’s a dog, and 2% that it’s something else. This is where the CNN makes its final decision.
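The short sketch below illustrates this final stage with made-up numbers: a random stack of feature maps is flattened into a vector and passed through a Dense layer with softmax activation, producing a probability for each of three hypothetical classes (cat, dog, other). Since the weights here are untrained, the probabilities are arbitrary.

# Illustrative sketch: flatten feature maps and classify with a fully connected layer
import tensorflow as tf
from tensorflow.keras import layers

# A made-up 4x4x8 stack of feature maps, as it might arrive from earlier layers
features = tf.random.uniform((1, 4, 4, 8))

flat = layers.Flatten()(features)                      # shape (1, 128): one long vector
scores = layers.Dense(3, activation='softmax')(flat)   # 3 hypothetical classes: cat, dog, other
print(scores.numpy())                                  # e.g., [[0.58, 0.27, 0.15]] -- sums to 1
Flattening and classifying with a fully connected layer (illustrative sketch)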

How CNNs learn: The process of training and backpropagation

So how do CNNs actually learn? The process is known as training: the network is fed labeled data (like images of cats and dogs) and starts out making essentially random predictions. Through backpropagation, a process that adjusts the model’s weights based on how wrong each prediction was, the network gradually reduces its errors over many training rounds. This iterative process continues until the network becomes highly skilled at recognizing patterns, much like how a person gradually picks up a new skill.
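To see what a single training step looks like under the hood, here is a toy sketch (a one-layer model rather than a CNN, purely for illustration, assuming TensorFlow 2.x): a forward pass produces a prediction, a loss function measures how wrong it is, and backpropagation computes the gradients the optimizer uses to nudge the weights.

# Illustrative sketch: one training step with forward pass, loss, and backpropagation
import tensorflow as tf

# A toy model and a single labeled example, just to show the update loop
model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation='softmax')])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

x = tf.random.uniform((1, 32))   # one fake input vector
y = tf.constant([3])             # its (made-up) true label

with tf.GradientTape() as tape:
    prediction = model(x)             # forward pass: make a guess
    loss = loss_fn(y, prediction)     # measure how wrong the guess is

grads = tape.gradient(loss, model.trainable_variables)            # backpropagation
optimizer.apply_gradients(zip(grads, model.trainable_variables))  # nudge the weights
One training step: forward pass, loss, and weight update (illustrative sketch)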

How to build a simple CNN in Python with TensorFlow

Let’s walk through building a basic CNN in Python with TensorFlow and Keras, which are popular choices for this task thanks to their power and simplicity. TensorFlow, an open-source library, efficiently handles complex mathematical operations and large datasets, making it ideal for deep learning tasks. Keras (now integrated with TensorFlow) provides a high-level API that simplifies model creation and training, allowing for rapid prototyping and experimentation. Together, they offer an efficient and user-friendly environment for developing CNNs with minimal code complexity.

The following is a concise introduction to creating your own CNN. For this example, we’re using the CIFAR-10 dataset, which contains 60,000 color images, each 32x32 pixels with three color channels (RGB). The images belong to 10 different classes, making it a popular choice for image classification tasks.

# Import libraries for building CNN with TensorFlow
import tensorflow as tf
from tensorflow.keras import layers, models
# Load and preprocess the data (We'll use the CIFAR-10 dataset)
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()
# Normalize pixel values (convert those pixels to numbers that CNN can understand)
train_images, test_images = train_images / 255.0, test_images / 255.0
# Build the CNN model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])
# Compile and train the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))
Code for a simple CNN

Here’s a breakdown of what we’re doing in this code:

  • Importing libraries: We load TensorFlow and specific Keras modules to build and train our CNN.

  • Loading and preprocessing data: We load the CIFAR-10 dataset and normalize pixel values by dividing by 255, scaling them to a range between 0 and 1, which helps the CNN process the data effectively.

  • Building the CNN model: We create a sequential CNN architecture by providing a list of layers. 

    • The Conv2D layers extract features from the input image using 3x3 filters (specified by the (3, 3) argument). The input image is assumed to have dimensions 32x32 with 3 color channels.

    • The MaxPooling2D layers reduce the spatial dimensions by applying a 2x2 pooling window ((2, 2)). 

    • Note that the first and second convolutional layers here use 32 and 64 filters, respectively. More filters enable the model to learn more complex features as it goes deeper into the network. 

    • The Flatten layer reshapes the 2D output into a 1D vector. 

    • The Dense layer with 64 units applies a fully connected layer, using the ReLU activation function. 

    • Finally, the last Dense layer with 10 units outputs the classification results, using the softmax activation for multi-class classification.

  • Compiling: We compile the model using the Adam optimizer and sparse categorical cross-entropy loss, which is ideal for multi-class classification with integer labels.

  • Training: Finally, we train the model for 10 epochs on the CIFAR-10 dataset, using both training and test data to evaluate its performance. (A short follow-up that evaluates the trained model and classifies a single image is sketched right after this list.)
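Once training finishes, you will usually want to see how the model performs on unseen data. The optional follow-up below reuses the model, test_images, and test_labels from the code above to report test accuracy and classify a single image.

# Evaluate the trained model on the test set
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc:.2f}")

# Classify the first test image: the model outputs 10 probabilities, one per class
probabilities = model.predict(test_images[:1])
predicted_class = probabilities.argmax()   # index (0-9) of the most likely class
print("Predicted class index:", predicted_class)
Evaluating the model and predicting a single image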

Now that you know how to build a basic CNN, take the next step with this TensorFlow project to classify images of cats and dogs using a CNN.

Real-world applications of CNNs

CNNs are widely used across industries today, powering applications that make a real-world impact. Here are a few notable examples:

Convolutional neural network applications

Image classification

One of the most well-known applications of CNNs is image classification. Have you ever wondered how search engines are able to categorize millions of images with ease? That’s where CNNs come in. They help these platforms automatically sort, tag, and organize images into categories based on their content. Whether you’re searching for cats, cars, or cupcakes, CNNs make sure you get relevant results in no time.

Facial recognition

Facial recognition technology, a feature many of us use daily, also relies on CNNs. Whether you’re unlocking your phone or passing through airport security, CNNs are behind the facial recognition systems that scan and verify identities. You’ve probably also noticed that your phone’s gallery automatically organizes photos based on the people in them. That’s also CNNs at work. These systems analyze unique facial features and compare them with stored data, providing a secure and efficient way to authenticate users.

Object detection

CNNs also play a crucial role in object detection, as seen in Amazon Go stores, where they track items customers pick up or return, allowing for a seamless shopping experience without the need for checkout lines. CNNs are especially vital for self-driving cars. These neural networks help vehicles detect and differentiate between objects on the road, such as pedestrians, stop signs, obstacles, and other vehicles. By processing visual data from cameras, CNNs enable cars to make real-time decisions, contributing to a safer and smarter autonomous driving experience.

Object detection in images

Medical imaging

In the healthcare sector, CNNs are making a big difference in medical imaging. They assist doctors by analyzing scans, such as MRIs or CT scans, to identify abnormalities like tumors, often with higher accuracy than traditional methods. CNNs can help detect subtle patterns in the images, offering an extra layer of precision in diagnosing diseases and potentially saving lives by catching issues earlier.

Curious to explore how CNNs are applied in medical imaging? Build a project that automates diagnosis by classifying medical images.

Famous CNN models powering AI magic: LeNet, AlexNet, and beyond

Several famous CNN architectures have paved the way for advancements in AI. From LeNet and AlexNet to more recent innovations, these models have revolutionized areas like facial recognition and autonomous driving. Let’s look at some of these models and how they’ve influenced AI:

| Model | Year Introduced | Key Features | Famous For |
| --- | --- | --- | --- |
| LeNet | 1989 | Shallow network, 5 layers | Handwritten digit recognition |
| AlexNet | 2012 | 8 layers, ReLU activation, large image dataset | Winning the ImageNet competition in 2012 |
| VGGNet | 2014 | 16–19 layers, small 3x3 filters | Simple yet effective for image classification |
| ResNet | 2015 | Residual connections, over 100 layers | Solving the vanishing gradient problem |
| Inception | 2014 | Inception modules (multi-scale feature extraction) | Efficient image classification with less computation |
| MobileNet | 2017 | Lightweight architecture for mobile devices | Real-time mobile applications, facial recognition |
| DenseNet | 2017 | Dense connections between layers | Reduced parameters and improved efficiency |
| EfficientNet | 2019 | Scalable CNN architecture (EfficientNet-B0 to B7) | SOTA (state-of-the-art) accuracy with fewer resources |
| ConvNeXt | 2022 | Modernized CNN architecture with large kernel sizes | Competing with Vision Transformers (ViTs) in vision tasks |
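Many of these architectures are available with pretrained weights. As a minimal sketch (assuming TensorFlow 2.x, which bundles several of them under tf.keras.applications), the snippet below loads MobileNetV2 pretrained on ImageNet and runs it on a placeholder image; in practice you would load a real 224x224 photo instead of a random tensor.

# Illustrative sketch: running a pretrained CNN from tf.keras.applications
import tensorflow as tf

# Load MobileNetV2 with weights pretrained on ImageNet
pretrained = tf.keras.applications.MobileNetV2(weights='imagenet')

# A placeholder 224x224 RGB image (replace with a real photo in practice)
image = tf.random.uniform((1, 224, 224, 3)) * 255.0
image = tf.keras.applications.mobilenet_v2.preprocess_input(image)

predictions = pretrained.predict(image)
print(tf.keras.applications.mobilenet_v2.decode_predictions(predictions, top=3))
Loading a pretrained MobileNetV2 (illustrative sketch)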

Advantages and limitations of CNNs

CNNs have impressive capabilities when it comes to images, but they also come with their own set of challenges.

Advantages:

  • Automated feature extraction: CNNs handle feature detection without manual input.

  • Scalability: They excel at processing large, complex images.

  • High performance: CNNs automatically learn complex spatial patterns, making them highly effective for image analysis.

Limitations:

  • Resource intensive: Training CNNs can be computationally expensive, requiring powerful hardware.

  • Data hungry: CNNs perform best with large datasets.

  • Overfitting: Without careful tuning, CNNs may struggle to generalize beyond their training data (a common countermeasure, data augmentation, is sketched below).
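One common way to push back against overfitting in image models is data augmentation. Here is a minimal sketch (assuming TensorFlow 2.x with its built-in preprocessing layers) of an augmentation pipeline that could be placed in front of a CNN like the one built earlier.

# Illustrative sketch: a simple data-augmentation pipeline
import tensorflow as tf
from tensorflow.keras import layers

# Random flips and rotations create extra training variety from the same images
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip('horizontal'),
    layers.RandomRotation(0.1),
])

# It can be prepended to the earlier model, for example:
# model = models.Sequential([data_augmentation, layers.Conv2D(32, (3, 3), ...), ...])
A simple data-augmentation pipeline (illustrative sketch)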

When to use CNNs and when not to

CNNs are powerful, but they’re not the solution for every task. Let’s explore when they shine and when other methods may be more appropriate.

When to use CNNs:

  • Image data: CNNs are ideal for image classification, object detection, and segmentation.

  • Video recognition: CNNs can handle tasks such as tracking objects across video frames.

  • Complex computer vision tasks: For tasks requiring deep visual understanding, CNNs are a top choice.

When not to use CNNs:

  • Tabular data: Traditional models like decision trees or logistic regression are more suitable for structured data.

  • Natural language processing (NLP): While CNNs can handle text, models like transformers (e.g., BERT or GPT) are more effective for language-related tasks.

  • Small datasets: CNNs need large datasets to perform well. For smaller datasets, simpler models often yield better results.

Why CNNs matter in modern machine learning

CNNs are at the frontier of AI-driven innovation, enabling everything from medical diagnoses to autonomous driving. Now that you’ve explored the basics, dive deeper, experiment, and build—who knows, the next big breakthrough in CNNs could be yours!

If you feel ready to take the next step, why not dive into graph convolutional networks (GCNs)?

Frequently Asked Questions

Is CNN deep learning?

Deep learning refers to models with many layers that automatically learn patterns from data, and CNNs learn image features like edges and shapes through multiple such layers. So yes, CNNs are a form of deep learning.

Do CNNs need a lot of data to perform well?

Generally, yes. CNNs have many parameters and tend to overfit on small datasets, although transfer learning from pretrained models and data augmentation can substantially reduce how much data is needed.

How is CNN different from other neural networks?

CNNs add convolutional and pooling layers that share filter weights across the image, so they learn spatial patterns directly from grid-like data instead of treating every pixel as an independent input to fully connected layers.

What is the difference between ANN and CNN?

A standard artificial neural network (ANN) consists of fully connected layers operating on flat feature vectors, while a CNN also uses convolutional and pooling layers that exploit the spatial structure of images, making it far more efficient for visual data.

What is the difference between RNN and CNN?

Recurrent neural networks (RNNs) are designed for sequential data such as text or time series and process inputs step by step while keeping a memory of previous steps, whereas CNNs are designed for spatial, grid-like data such as images.


Written By:
Hamna Waseem