
Introduction to convolutional neural networks (CNN)

Saif Ali
Jun 24, 2024
20 min read


Convolutional neural networks (CNNs) have emerged as a powerful tool in the field of artificial intelligence and machine learning. Specifically designed for processing and analyzing data with a grid-like structure, CNNs excel in tasks such as image recognition, object detection (used in autonomous vehicles like Tesla’s Autopilot system), and image segmentation (applied in medical imaging for identifying tumors or abnormalities in X-ray images). Inspired by the human visual system, CNNs are capable of interpreting visual information with remarkable accuracy. This comprehensive blog aims to provide a thorough understanding of CNNs, covering their key operations and practical implementation for multi-class image classification.

Motivation

The motivation behind the development of convolutional neural networks (CNNs) can be summarized as follows:

  • Addressing limitations: Traditional neural networks struggle with capturing spatial dependencies in image data. This is because they treat each pixel in an image as an independent feature, disregarding the spatial relationships between neighboring pixels. For instance, traditional networks might fail to recognize objects in image classification tasks if their spatial arrangement differs from the training data. In contrast, CNNs overcome this limitation with specialized layers, such as convolutional and pooling layers. These layers efficiently capture local patterns and spatial hierarchies within images.

  • Leveraging spatial information: CNNs exploit inherent spatial characteristics in images through local connections, preserving spatial relationships.

  • Automatic feature learning: CNNs automatically learn relevant features during training, adapting to diverse image variations without hand-crafted features.

  • Weight sharing for efficiency: CNNs employ weight sharing across the image, reducing parameters and enhancing generalization and computational efficiency. For example, in a convolutional layer, a filter/kernel is applied across the entire input image, and the same set of weights is used for each local receptive field. This sharing of weights allows the network to learn spatial hierarchies of features while significantly reducing the number of parameters needed to be learned, therefore improving efficiency and generalization.

  • Revolutionizing computer vision: CNNs have transformed computer vision, enabling accurate recognition of objects, patterns, and features in images, powering tasks like image recognition and object detection.

Overview of CNN architecture

CNNs have become the cornerstone of image processing and computer vision tasks, achieving remarkable success in various domains. Understanding the architecture of a CNN is crucial for harnessing its power in tasks such as image classification, object detection, and segmentation. The following example illustrates how CNNs work:

[Animated illustration: how CNNs work]

Here’s an overview of the key components that constitute a typical CNN architecture:

  1. Input layer: The input layer represents the raw pixel values of an image, and its dimensions are determined by the size of the input image. Each pixel serves as a feature for further processing.

  2. Convolutional layers: These layers consist of filters that slide over the input, extracting local patterns and features. They enable the capture of hierarchical representations, starting from simple edges to more complex structures.

  3. Pooling layers: Following the convolutional layers, pooling layers downsample the feature maps to reduce computational complexity. Techniques like max pooling and average pooling are commonly used to retain essential information.

  4. Flatten layer: The flatten layer takes the outputs from the convolutional and pooling layers and flattens them into a 1D vector. This step prepares the data for the fully connected layers.

  5. Fully connected layers: These layers are densely connected and process the flattened features for making predictions. For classification tasks, a softmax activation function is often used to produce class probability distributions.

  6. Output layer: The output layer provides the final predictions based on the learned information. The number of neurons in this layer corresponds to the number of classes in the classification task.

Now, let’s delve into a detailed exploration of each component of the CNN architecture one by one.

1. Input layer

The input layer in a CNN is the entry point for data. It receives raw information, usually images, and prepares it for processing within the network. Its main functions include:

  • Data reception: Accepting data typically represented as 3D tensors for images. Each element corresponds to a pixel value, with dimensions indicating width, height, and color channels (e.g., RGB).

  • Preprocessing: Performing tasks like normalization, resizing, and data augmentation (e.g., cropping, flipping, adding noise) to enhance training data diversity.

  • Forwarding data: Passing preprocessed data to the first convolutional layer. This layer applies filters to generate feature maps contributing to the desired output, such as image classification or object detection.

Key points to remember:

  • The input layer itself is not trainable and doesn’t have adjustable parameters during training.

  • The specific size and format of the input layer depend on the chosen CNN architecture and the data type being processed.

Understanding the role of the input layer is crucial for comprehending how CNNs receive and process information, enabling their remarkable capabilities in computer vision tasks. The input layer preserves the spatial structure of the input data. It sets the stage for subsequent layers to extract features and perform classification tasks. Preprocessing within the input layer helps prepare the data for efficient learning and representation by the neural network.
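
A minimal sketch of these input-layer steps in TensorFlow; the file name sample.jpg and the 64 x 64 target size are illustrative assumptions, not fixed requirements:

import tensorflow as tf

# Data reception: decode an image file into a 3D tensor (height, width, channels)
image = tf.io.decode_jpeg(tf.io.read_file("sample.jpg"), channels=3)
# Preprocessing: resize to a fixed size and normalize pixel values to [0, 1]
image = tf.image.resize(image, [64, 64]) / 255.0
# Data augmentation: e.g., a random horizontal flip
image = tf.image.random_flip_left_right(image)
print(image.shape)  # (64, 64, 3), ready to forward to the first convolutional layer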

2. Convolutional layer

The convolutional layer is a fundamental building block in CNNs. It’s a specialized type of neural network designed for processing and analyzing visual data, such as images. The convolutional layer plays a crucial role in feature extraction and hierarchical representation learning.

The following illustration represents the convolution operation:

[Animated illustration: the convolution operation]

In a convolutional layer:

  • Filters (kernels): The layer consists of filters (also known as kernels), which are small, learnable matrices. These filters slide or convolve across the input data, performing a mathematical operation called convolution.

  • Convolution operation: The convolution operation involves element-wise multiplication and summation between the filter and a local region of the input data. As the filter slides across the entire input, it captures different patterns and features.

  • Feature maps: The output of the convolution operation is a feature map. Each filter in the convolutional layer produces its own feature map, highlighting specific patterns present in the input data.

  • Parameter sharing: One key advantage of convolutional layers is parameter sharing. The same filter is used across the entire input, reducing the number of parameters in the model. This parameter sharing enhances the network’s ability to generalize and recognize patterns efficiently.

  • Hierarchical representation: By stacking multiple convolutional layers with varying filter sizes, the network learns a hierarchical representation of features. Early layers capture simple features like edges and textures. The deeper layers combine these features to recognize more complex structures.

Convolutional layers are effective in image recognition, object detection, and other tasks involving spatial relationships in data. They enable CNNs to automatically learn and extract meaningful features from input images, making them well-suited for a wide range of computer vision applications.
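
The operation itself is easy to sketch. Below is a minimal NumPy implementation of an unpadded, stride-1 convolution (strictly speaking, the cross-correlation that CNN frameworks actually compute); the 5 x 5 input and the vertical-edge filter are illustrative choices, and the loops are written for clarity rather than speed:

import numpy as np

def conv2d(image, kernel):
    # Slide the filter over every valid position of the input
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise multiplication of the local region with the filter, then summation
            output[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return output

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5 x 5 "image"
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                 # vertical-edge filter
print(conv2d(image, kernel).shape)                 # (3, 3) feature map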

Why padding in convolution?

Let’s explore the issues that highlight why we often need padding:

  • Edge information loss: When we use a filter to analyze our image, it tends to focus more on the pixels in the middle and less on the ones at the edges. This overlooking of edge pixels can result in losing valuable information and intricate details located at the boundaries of the image.

  • Reduced feature map dimensions: Imagine our image as a puzzle, and the filter is trying to understand each piece. Without padding, the puzzle gets smaller with each filter application. This reduction in size is a bit like zooming in too much, potentially causing us to miss significant details around the edges of the picture.

Padding

Padding acts like a protective border added around our image before applying the filter. This additional border ensures that all pixels, including those at the edges, receive adequate attention. By preventing the loss of details, padding enables the network to comprehend the entirety of the image, not just its central part.

Here are some common padding types used in CNNs, along with short definitions:

  • Zero padding: Zero padding adds extra rows and columns of zeros around the input image, preserving its spatial dimensions. It helps maintain spatial resolution and prevents information loss at the edges of the image.

  • Same padding: The same padding pads the input image with zeros in such a way that the output feature map has the same spatial dimensions as the input image. It ensures that the convolution operation does not reduce the size of the input.

  • Valid padding: Valid padding, also known as no padding, means no padding is added to the input image. As a result, the spatial dimensions of the output feature map are reduced after the convolution operation, which can lead to information loss at the edges of the image.

These padding types are essential for controlling the spatial dimensions of feature maps and ensuring effective feature extraction during the convolutional layers of a CNN.
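
As a quick sanity check of these padding modes, the following sketch (assuming an arbitrary 28 x 28 input and a 3 x 3 kernel) compares the resulting shapes using NumPy and Keras:

import numpy as np
import tensorflow as tf

x = np.random.rand(1, 28, 28, 1).astype("float32")  # one 28 x 28 single-channel image

# Zero padding applied manually: a one-pixel border of zeros around the image
padded = np.pad(x[0, :, :, 0], pad_width=1)
print(padded.shape)                                 # (30, 30)

# 'same' padding preserves spatial dimensions; 'valid' shrinks them by kernel_size - 1
print(tf.keras.layers.Conv2D(8, (3, 3), padding="same")(x).shape)   # (1, 28, 28, 8)
print(tf.keras.layers.Conv2D(8, (3, 3), padding="valid")(x).shape)  # (1, 26, 26, 8)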

The following illustration demonstrates the zero-padded convolution operation:

[Animated illustration: the zero-padded convolution operation]

Let’s further explore how padding becomes a helpful ally in optimizing the convolution process.

  • Preserves edge details: Padding guarantees that the edges of our image are not neglected, preserving crucial details that might otherwise be overlooked.

  • Maintains information at peripheries: It prevents our image from becoming excessively small during the filtering process, aiding in capturing valuable information at the outer regions.

Keep in mind:

  • Computational cost: Introducing extra border pixels means additional work for the computer. However, the computational cost is considered a worthy trade-off for preserving the completeness and understanding of our image.

  • Balance is key: While we strive to ensure all pixels receive attention, excessive padding may introduce its own set of challenges. Striking the right balance is essential to achieve optimal results in image processing.

Stride

Stride in convolution refers to the step size at which the convolutional filter moves across the input data. It determines how much the filter shifts or strides as it scans the input, affecting the spatial dimensions of the resulting feature maps.

The following illustration represents the strided convolution operation:

[Animated illustration: the strided convolution operation]

Now, let’s look at the advantages and drawbacks associated with the stride parameter:

Pros of stride:

  • Dimension reduction: Larger strides reduce the spatial dimensions of feature maps, potentially saving computational resources and memory.

  • Increased computational efficiency: By skipping over some input pixels, strides reduce the number of operations needed during convolution, leading to faster computation.

  • Enhanced feature diversity: Strides can help diversify the features extracted by a convolutional layer by ensuring that each filter covers different portions of the input, capturing a wider range of patterns.

Cons of stride:

  • Information loss: The larger the stride, the more likely the filter is to skip over fine-grained details. This can impact the network’s ability to accurately recognize subtle features in the input data.

  • Decreased spatial resolution: Larger strides shrink the feature maps, reducing spatial resolution. This can affect the network’s ability to precisely localize objects or features in the input.

  • Increased sensitivity to translation: Larger strides may make the network more sensitive to translations in the input data, making it less robust to variations in object position or orientation.

Selecting the perfect stride in a CNN is like finding the right balance. It’s about adjusting the pace of the filter movement across the image to efficiently extract features while managing computational resources. Aim for a stride that captures essential details without overburdening the system, considering the specific needs of our task and the characteristics of our data. The goal is a stride that’s adaptable, optimizing both performance and efficiency across the layers of the network.
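
Before moving on, a short sketch makes the effect of stride on output dimensions concrete; the 32 x 32 RGB input and 3 x 3 kernel below are arbitrary choices:

import numpy as np
import tensorflow as tf

x = np.random.rand(1, 32, 32, 3).astype("float32")

# Output size per dimension = floor((input - kernel) / stride) + 1
print(tf.keras.layers.Conv2D(8, (3, 3), strides=1, padding="valid")(x).shape)  # (1, 30, 30, 8)
print(tf.keras.layers.Conv2D(8, (3, 3), strides=2, padding="valid")(x).shape)  # (1, 15, 15, 8)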

3. Pooling layers

Pooling layers are crucial components of CNNs that are utilized primarily to reduce the dimensions (width and height, not depth) of the input volumes they process. By performing this dimensionality reduction, pooling layers help in decreasing the computational load for the network, reducing the number of parameters. Pooling layers also aid in making the detection of features somewhat invariant to scale and orientation changes. Another key benefit of pooling is that it helps in preventing overfitting by providing an abstracted form of the representation.

The most common types of pooling are:

  • Max pooling: This is the most frequently used form of pooling, where the pooling operation partitions the input image into a set of nonoverlapping rectangles and, for each such subregion, outputs the maximum value. For example, with a 2 x 2 filter and stride 2 (common settings for max pooling), each max-pooling operation will select the maximum element from 4 (2 x 2) elements.

  • Average pooling: Instead of taking the maximum value from the part of the image covered by the filter, average pooling takes the arithmetic mean of the values. This method is less common than max pooling in practice for deep networks, but it is still useful in certain contexts. Average pooling distributes the contribution of all pixels in the window equally to the output, which can have a smoothing effect on the input features.

The following illustration represents how the max pooling operation works:

[Animated illustration: the max pooling operation]
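
Here is a minimal NumPy sketch of max pooling over nonoverlapping 2 x 2 windows; swapping window.max() for window.mean() would give average pooling:

import numpy as np

def max_pool(feature_map, size=2, stride=2):
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Take the maximum value within each window
            window = feature_map[i * stride:i * stride + size, j * stride:j * stride + size]
            output[i, j] = window.max()
    return output

feature_map = np.array([[1., 3., 2., 4.],
                        [5., 6., 1., 2.],
                        [7., 2., 9., 0.],
                        [1., 8., 3., 4.]])
print(max_pool(feature_map))   # [[6. 4.], [8. 9.]]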

4. Flatten layer

The flatten layer in a neural network architecture is a simple yet crucial component, especially in the context of CNNs. The flatten layer transforms its multidimensional input into a single, long linear vector. This process is necessary when transitioning between convolutional/pooling layers and fully connected (dense) layers.

Purpose of the flatten layer

The flatten layer serves two main purposes:

  • Transition between convolutional and dense layers: In CNN architectures, it bridges the gap between multidimensional convolutional layers and one-dimensional dense layers by flattening the spatially organized feature maps into a one-dimensional form.

  • Preparing for classification or regression: It prepares the feature data for final tasks like classification or regression by reshaping them into a format suitable for processing by fully connected layers.

How the flatten layer works

Suppose that we have the output from a convolutional or a pooling layer, and this output is a two-dimensional grid (for instance, 9 x 9 in the case of an image). When this output is passed through a flatten layer, the layer reshapes this 2D output into a 1D vector. It does so by arranging each row of the grid sequentially into a long linear vector. The resulting vector for the given example would have a size of 9 × 9 = 81 elements.

Representing the operation of the flatten layer
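
In code, flattening is a single reshape. A small NumPy sketch using the 9 x 9 grid from the example above:

import numpy as np

feature_map = np.arange(81).reshape(9, 9)   # a 9 x 9 feature map
flattened = feature_map.reshape(-1)         # rows arranged sequentially into a 1D vector
print(flattened.shape)                      # (81,)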

5. Fully connected layers

Fully connected layers, also known as dense layers, are a fundamental component in neural networks, especially in the architecture of CNNs. After the input data has been processed by a series of convolutional and pooling layers, fully connected layers are typically utilized toward the end of the network. Their primary role is to perform high-level reasoning and make predictions based on the features extracted and transformed by the preceding layers.

Structure of fully connected layers

A fully connected layer is so named because each of its neurons is connected to every neuron in the previous layer. This comprehensive connectivity ensures that the layer has a global understanding of the input features. In the context of a CNN, after convolutional and pooling layers have detected features and reduced the dimensionality of the input data, the role of the fully connected layers is to map these detected patterns to the desired output format, such as class scores in classification tasks.

How fully connected layers work

Fully connected layers receive input as vectors or matrices, typically flattened from convolutional or pooling layers.

  • Weight matrix: Neurons in fully connected layers possess weights and biases, constituting a weight matrix whose size is determined by input and neuron count.

  • Matrix multiplication and activation: Input undergoes matrix multiplication with weights, followed by bias addition. Nonlinear activation functions introduce complexity for pattern learning.

  • Output: Fully connected layer output varies by position. It may feed subsequent layers or produce final predictions like class scores or regression values.

Visual representation of a fully connected layer
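
The computation of a fully connected layer can be sketched in a few lines of NumPy; the sizes here (81 inputs, 10 neurons) are arbitrary illustrative choices:

import numpy as np

rng = np.random.default_rng(0)

x = rng.random(81)          # flattened input vector (e.g., from a flatten layer)
W = rng.random((10, 81))    # weight matrix: 10 neurons, each connected to all 81 inputs
b = rng.random(10)          # one bias per neuron

z = W @ x + b               # matrix multiplication followed by bias addition
a = np.maximum(0, z)        # nonlinear activation (ReLU)
print(a.shape)              # (10,)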

Importance of fully connected layers

Fully connected layers play a crucial role in the functionality and adaptability of CNNs. They serve as the bridge between the extracted features and the final decision-making process. Let’s delve into their significance further:

  • Decision making: Fully connected layers in CNNs perform the final decision-making process, combining high-level features extracted by preceding layers.

  • Flexibility in output: Fully connected layer architecture can be customized for various output tasks, such as classification or regression, enabling tailored outputs.

  • Learning nonlinear combinations: Nonlinear activation functions in fully connected layers enable the network to learn intricate patterns, essential for capturing complex relationships in data.

6. Output layer

The output layer in a CNN is the final stage responsible for generating predictions based on learned features.

  • Prediction generation: Produces outputs corresponding to the network’s predictions tailored to the task (e.g., class probabilities for classification).

  • Activation function application: Applies appropriate activation functions (e.g., softmax for classification, linear for regression) to transform raw outputs.

  • Loss calculation: Computes the error between predictions and ground truth, guiding training via backpropagation and gradient descent.

The output layer’s design impacts the network’s performance and effectiveness in fulfilling its task.
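
As a concrete example, here is a minimal sketch of a classification output layer’s final steps, applying softmax to raw outputs and computing the cross-entropy loss for a hypothetical 3-class problem:

import numpy as np

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw network outputs for 3 classes
probs = softmax(logits)
print(probs)                         # approx. [0.659, 0.242, 0.099]; class 0 is predicted

# Cross-entropy loss against the ground truth (class 0 here)
y_true = np.array([1.0, 0.0, 0.0])
loss = -np.sum(y_true * np.log(probs))
print(loss)                          # approx. 0.417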

Convolutions on RGB images

In a CNN, an image is convolved using different filters to extract its features. This can be summarized as follows:

  1. An input RGB image is represented as a matrix of size (width × height × 3), for example, 64 × 64 × 3.

  2. The convolution operation is performed on this image with $n$ filters of size (filter_width × filter_height × 3), stride 1, and no padding. The formula to calculate the size of the output feature map is: $\left(\frac{\text{Width} - \text{Filter\_Width}}{\text{Stride}} + 1\right) \times \left(\frac{\text{Height} - \text{Filter\_Height}}{\text{Stride}} + 1\right) \times \text{No\_of\_Filters}$. For example, with 8 filters of size 5 × 5, each output dimension is $\frac{64 - 5}{1} + 1 = 60$, so the output size is 60 × 60 × 8.

  3. This output (feature maps) is passed to a max pooling layer with a 2 × 2 filter size and a stride of 2. This process reduces feature map dimensions, aiding computational efficiency and mitigating overfitting.

  4. After passing through a second convolution layer followed by another pooling layer, the output size becomes 13 × 13 × 16.

  5. Next, a flattening operation is done by unrolling the output into a 1D vector. The size of this vector is calculated as $\text{Width} \times \text{Height} \times \text{Depth}$. For the given example, the flattened vector would have a size of 13 × 13 × 16 = 2704 units.

  6. Finally, this flattened vector is fed to a classifier (e.g., a fully connected layer in a neural network) to make predictions. The output of the classifier is calculated using a softmax function, which gives the probability distribution over the class labels. The class with the highest probability is selected as the final output. The softmax formula is: $\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$, where $x$ is the input vector to the softmax function, and $i$ and $j$ are the indexes of the elements in this vector.

The inner workings of a CNN architecture demonstrated in the context of RGB images
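
The shape arithmetic in these steps can be verified with a short Keras sketch; the second convolution’s 16 filters of size 5 x 5 are an assumption chosen to match the 13 × 13 × 16 output quoted in step 4:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

x = np.random.rand(1, 64, 64, 3).astype("float32")   # one 64 x 64 RGB image

x = layers.Conv2D(8, (5, 5))(x)       # 8 filters of size 5 x 5  -> (1, 60, 60, 8)
x = layers.MaxPooling2D((2, 2))(x)    # 2 x 2 pooling, stride 2  -> (1, 30, 30, 8)
x = layers.Conv2D(16, (5, 5))(x)      # assumed second conv      -> (1, 26, 26, 16)
x = layers.MaxPooling2D((2, 2))(x)    # second pooling           -> (1, 13, 13, 16)
x = layers.Flatten()(x)               # 13 * 13 * 16             -> (1, 2704)
print(x.shape)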

CNN for handwritten digit recognition

We are implementing a CNN using TensorFlow/Keras to classify handwritten digits from the MNIST dataset. Below is the implementation code:

# Import necessary libraries
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
# Load and preprocess the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
# Expand dimensions to add a channel dimension (for CNN)
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))
# One-hot encode the labels
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
# Implementation of sequential model
# Step 1: Create a sequential model
model = models.Sequential()
# Step 2: Add a convolutional layer with 32 filters, a 3x3 kernel, and ReLU activation function
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
# Step 3: Add a max pooling layer with 2x2 pool size
model.add(layers.MaxPooling2D((2, 2)))
# Step 4: Add another convolutional layer with 64 filters and a 3x3 kernel
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
# Step 5: Add another max pooling layer
model.add(layers.MaxPooling2D((2, 2)))
# Step 6: Flatten the output to a 1D array
model.add(layers.Flatten())
# Step 7: Add a dense layer with 64 units and ReLU activation
model.add(layers.Dense(64, activation='relu'))
# Step 8: Add the output layer with 10 units (for 10 classes) and softmax activation
model.add(layers.Dense(10, activation='softmax'))
# Step 9: Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# Step 10: Train the model
model.fit(train_images, train_labels, epochs=5, batch_size=64, validation_data=(test_images, test_labels))
# Step 11: Evaluate the model on the test set
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'Test accuracy: {test_acc}')

The following is an explanation of the above implementation:

  • Lines 2–5: Necessary libraries, including TensorFlow, are imported for building the CNN model, along with components like layers and datasets from Keras.

  • Lines 6–15: The MNIST dataset is loaded and preprocessed. Images are normalized to have pixel values between 0 and 1, the dimensions are expanded to include a channel dimension for CNN compatibility, and labels are one-hot encoded.

  • Lines 16–32: Sequential model implementation steps:

    • Step 1: Creation of a sequential model.

    • Step 2: Addition of a convolutional layer with 32 filters, a 3 x 3 kernel, and ReLU activation.

    • Step 3: Addition of a max pooling layer with a 2 x 2 pool size.

    • Step 4: Addition of another convolutional layer with 64 filters and a 3 x 3 kernel.

    • Step 5: Addition of another max pooling layer.

    • Step 6: Flattening the output to a 1D array.

    • Step 7: Addition of a dense layer with 64 units and ReLU activation.

    • Step 8: Addition of the output layer with 10 units (for 10 classes) and softmax activation.

  • Lines 33–36: Compiling the model with the Adam optimizer, categorical cross-entropy loss function, and accuracy metric.

  • Lines 37–38: Training the model for 5 epochs with a batch size of 64, using the training data and validating on the test data.

  • Lines 39–41: Evaluating the trained model on the test set and printing the test accuracy.

CNN applications

From classifying images to enabling autonomous vehicles, CNNs find diverse applications across various domains:

  • Image classification: CNNs excel at hierarchical representation learning, aiding in precise object identification within images.

  • Object detection: Employing techniques like region proposal networks, CNNs accurately locate and classify objects in images.

  • Facial recognition: CNNs robustly capture facial features, facilitating accurate identification of individuals across images or video frames.

  • Autonomous vehicles: Integral to autonomous vehicles, CNNs interpret sensor input to detect objects, pedestrians, and road signs, ensuring safe navigation.

  • Medical imaging: In medical imaging, CNNs assist in tumor detection, disease diagnosis, and organ segmentation, enhancing diagnostic accuracy.

  • Natural language processing: Beyond image processing, CNNs support NLP tasks like text classification, sentiment analysis, and language translation, effectively capturing contextual information.

  • Video analysis: CNNs process sequential data for action recognition, object tracking, and surveillance, extracting temporal dynamics from video frames.

  • Recommendation systems: Utilized in recommendation systems, CNNs analyze user preferences from diverse data types, offering personalized suggestions.

  • Robotics: In robotics, CNNs aid in object manipulation, scene understanding, and navigation, empowering robots to make informed decisions based on visual inputs.

Classical pretrained models

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual competition where research teams assess their algorithms on the task of detecting objects and categories in images from the large-scale ImageNet database. The competition has been influential in fostering the development of deep learning and CNNs for computer vision tasks.

Here’s an overview of the common models that have emerged as significant in this space:

  • LeNet-5 (Yann LeCun, 1998): A 7-level convolutional network with a simple architecture for digit classification. Limitation: limited depth.

  • AlexNet (Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012): An 8-layer network that won the 2012 ILSVRC and pioneered the use of deep CNNs for image recognition. Limitation: requires significant computational resources.

  • VGG16 (Visual Geometry Group at Oxford, 2014): A deep network with 16 weight layers and a uniform architecture that is easy to understand and implement. Limitation: high computational cost.

  • Inception-v3 (GoogLeNet family, Google, 2015): A complex architecture that achieves high accuracy with efficient performance. Limitation: its complexity may lead to overfitting.

  • ResNet50 (Microsoft, 2015): Introduces residual connections to solve the vanishing gradient problem, effectively training deep networks with 50 layers. Limitation: requires careful tuning to avoid degradation of performance.

  • YOLO (You Only Look Once) (Joseph Redmon, 2016): Performs real-time object detection with a single network application, making it efficient for real-time applications. Limitation: lower accuracy compared to slower, multistage detectors.

  • MobileNet (Google, 2017): Designed for mobile and embedded devices, it is lightweight and efficient for low-power hardware. Limitation: may sacrifice some accuracy for efficiency.

  • EfficientNet (Google, 2019): Optimizes efficiency with compound coefficient scaling, providing better performance with fewer parameters. Limitation: training may be slower due to increased complexity.

These models have played a significant role in advancing the field of computer vision, each with its own strengths and weaknesses.

Conclusion

CNNs are a cornerstone in the field of artificial intelligence, offering unparalleled proficiency in tasks ranging from image recognition to complex pattern detection across varied domains. Inspired by the human visual system, their hierarchical structure enables effective feature extraction and analysis, leading to significant advancements in computer vision and beyond. The practical success of applications such as autonomous driving, medical diagnostics, and facial recognition underlines their transformative potential. As technology evolves, CNNs continue to push the boundaries of what machines can understand and achieve from visual data.

Next steps

If you want to expand your knowledge and learn more about CNNs, the following courses are an excellent starting point for you:

A Beginner's Guide to Deep Learning

This beginner-level, highly comprehensive course is intended for learners who are familiar with Python programming. You will become familiar with the fundamental concepts and terminologies used in deep learning and understand the importance of deep learning techniques. You will examine simple models like the perceptron before learning more complex yet powerful deep learning models. The course provides hands-on practical knowledge of how to code simple and complex deep learning models in NumPy, a powerful Python library, and Keras, a cutting-edge deep learning library for Python. You can test your knowledge with the quizzes at the end of every lesson and coding challenges that help deepen your understanding. By the end of the course, you will have a general understanding of the basics of deep learning and be equipped with the right tools to learn more advanced concepts.

20 hrs · Beginner · 13 Challenges · 18 Quizzes

Introduction to Deep Learning & Neural Networks

This course is an accumulation of well-grounded knowledge and experience in deep learning. It provides you with the basic concepts you need to start working with and training various machine learning models. You will cover both basic and intermediate concepts, including, but not limited to, convolutional neural networks, recurrent neural networks, generative adversarial networks, and transformers. After completing this course, you will have a comprehensive understanding of the fundamental architectural components of deep learning. Whether you’re a data or computer scientist, computer or big data engineer, solution architect, or software engineer, you will benefit from this course.

4 hrs 30 mins · Intermediate · 11 Challenges · 8 Quizzes

Natural Language Processing with TensorFlow

Deep learning has revolutionized natural language processing (NLP): problems that once required extensive hand-crafted feature design and model tuning can now be solved efficiently with neural networks. In this course, you will learn the fundamentals of TensorFlow and Keras, a Python-based interface for TensorFlow. Next, you will build embeddings and other vector representations, including the skip-gram model, continuous bag-of-words, and Global Vector (GloVe) representations. You will then learn about convolutional neural networks, recurrent neural networks, and long short-term memory networks, and use them to solve NLP tasks like named entity recognition, text generation, and machine translation. Lastly, you will learn transformer-based architectures and perform question answering (using BERT) and caption generation. By the end of this course, you will have a solid foundation in NLP and the skills to build TensorFlow-based solutions for a wide range of NLP problems.

15 hrs · Intermediate · 33 Playgrounds · 10 Quizzes

Frequently Asked Questions

What is a Convolutional Neural Network?

A Convolutional Neural Network (CNN) is a specialized type of neural network designed for processing and analyzing visual data like images. It uses a hierarchical structure of layers to automatically learn and extract features from input images, making it effective for tasks such as image classification and object detection.
