From artificial intelligence to deep learning

Artificial intelligence, one of the research areas attracting a great deal of attention today, introduces several terms we often hear. Knowing what these terms mean and how the subcategories differ helps us understand which category this course works within.

  • Artificial intelligence: The theory and development of computer systems that can perform tasks usually requiring human intelligence.

  • Machine learning: This is a subcategory of artificial intelligence that allows computers to learn without being explicitly programmed.

  • Deep learning: This is a subcategory of machine learning. It involves algorithms with brain-like logical structures called artificial neural networks.

Deep neural networks can be used to build projects in several different application areas.

If a deep neural network is built to process images or videos, the project belongs to the computer vision category. Text processing and audio processing are other areas where deep neural networks are used.

[Figure: Artificial intelligence]

General structure of a fully connected neural network

A neural network is a computing system inspired by the biological neural networks that make up the human brain. A neural network is based on a collection of connected nodes called artificial neurons, which loosely model the neurons in a physical brain. An artificial neuron receives a signal, processes it, and can pass a signal on to the other neurons connected to it.

Basics of a neural network

The basics of a neural network are as follows:

  • Neuron: A node carrying the signal.

  • Weight: A coefficient that determines the effect of the signal.

  • Activation function: A particular function used to calculate the output signal from the input signals.

  • Bias: A coefficient to be added to the sum of input signals before going to the activation function. Each output neuron has its own bias.

Visualization of a neural network

Let’s visualize an example of a basic neural network that has two layers.

[Figure: Basics of a fully connected neural network]

This simple neural network has three input nodes, each with its own weight, one bias for the output node, and an activation function. Together they automatically calculate the price of a house. The first input node represents the size of the house, the second represents the location score out of 10, and the third represents the age of the house. Let's consider the following house properties and trained network parameters:

  • $n_1 = 160$
  • $n_2 = 8$
  • $n_3 = 12$
  • $w_1 = 3$
  • $w_2 = 2$
  • $w_3 = -4$
  • $\text{bias} = 15$
  • $f(x) = x^2$

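The output node is calculated as follows:

$$\text{output} = f(n_1 w_1 + n_2 w_2 + n_3 w_3 + \text{bias}) = f(160 \cdot 3 + 8 \cdot 2 + 12 \cdot (-4) + 15) = f(463) = 463^2 = 214{,}369$$

So the trained model tells us that the optimal sale price of this house is 214,369 dollars (about 214k dollars).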
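The arithmetic above can be checked with a short Python sketch. The helper names `weighted_sum` and `neuron_output` are our own and do not come from any library. The numbers are the example's inputs, weights, bias, and activation $f(x) = x^2$. The sketch prints both the pre-activation sum (463) and the output node's value after the activation is applied.

```python
def weighted_sum(inputs, weights, bias):
    """Multiply each input signal by its weight, add them up, then add the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def neuron_output(inputs, weights, bias, activation):
    """Pass the weighted sum through the activation function."""
    return activation(weighted_sum(inputs, weights, bias))

def square(x):
    # The example's activation function f(x) = x^2
    return x ** 2

inputs = [160, 8, 12]    # size, location score (out of 10), age
weights = [3, 2, -4]     # w1, w2, w3
bias = 15

print(weighted_sum(inputs, weights, bias))                # 463
print(neuron_output(inputs, weights, bias, square))       # 214369
```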

Whether or not this prediction is accurate, we have now seen a simple neural network structure and its components. If each neuron in one layer is connected to every neuron in the next layer, that layer is called a fully connected layer. Models built from fully connected layers in this way are called fully connected neural networks. A simple fully connected neural network with four layers is shown below:

[Figure: Layers in a fully connected neural network]

Deep neural networks are more complex neural networks with many more hidden layers than this simple example. They can contain fully connected layers, convolutional layers, or both.
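A fully connected layer can be sketched in plain Python. Each output neuron takes a weighted sum over every input neuron, adds its own bias, and applies an activation. This is a minimal illustration, not a framework implementation. The `W[i][j]` layout (weight from input neuron `i` to output neuron `j`), the ReLU activation, the 3-2-1 layer sizes, and all numeric values are assumptions made up for the example.

```python
def relu(x):
    # A common activation: keep positive signals, zero out negative ones
    return max(0.0, x)

def identity(x):
    # Linear activation, used here for the output layer
    return x

def dense_forward(inputs, W, b, activation):
    """One fully connected layer: every input neuron feeds every output neuron."""
    outputs = []
    for j in range(len(b)):
        # Weighted sum over ALL inputs for output neuron j, plus its own bias
        z = sum(inputs[i] * W[i][j] for i in range(len(inputs))) + b[j]
        outputs.append(activation(z))
    return outputs

# Hypothetical network: 3 input nodes -> 2 hidden nodes -> 1 output node
W1 = [[0.5, -1.0],
      [1.0, 0.5],
      [-0.5, 1.0]]
b1 = [0.0, 1.0]
W2 = [[2.0],
      [1.0]]
b2 = [0.5]

x = [1.0, 2.0, 3.0]
hidden = dense_forward(x, W1, b1, relu)
output = dense_forward(hidden, W2, b2, identity)
print(hidden, output)  # [1.0, 4.0] [6.5]
```

Adding more hidden layers means chaining more `dense_forward` calls. Deep learning frameworks perform the same computation with matrix multiplications.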

Different training methods

Apart from the structure of the model, another aspect divides neural network models into two subcategories: the training method. In the example above, we used a trained model that predicts a house price from its weights and bias. Training is the process in which the model itself learns these weights and biases. The model can then use the learned parameters for inference, which means using the prepared model in real life.

Supervised and unsupervised learning are the two main ways to train a model. If we train a neural network by showing it the correct answers, it is a supervised neural network, and the learning type is supervised learning. If we train a neural network without showing it the correct answers, it is an unsupervised neural network, and the learning type is unsupervised learning.

Although there are other, more complex training methods, this course applies supervised learning, using images and their labels as ground-truth answers. It is therefore necessary to understand the main logic behind supervised learning, explained below.

Supervised learning fundamentals

Suppose we give the model the true answer, let it calculate the difference between that answer and its prediction, and then use that difference to update its weights and bias. This is supervised learning, and we call such a model a supervised neural network.

There are various cost (loss) functions for calculating the difference between the prediction and the ground truth (true answer). Gradient descent then uses this difference to compute, for each weight and bias, the value to subtract from it during the update.

Cost function

The cost function calculates the difference between the prediction and the ground truth (true answer). There are various types, and the cost function is chosen according to the kind of prediction the model makes. For example:

  • For a regression problem, mean squared error is commonly used.
  • For a classification problem, cross-entropy loss applied to the output of a softmax activation function is preferable.
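Both cost functions mentioned above are short formulas. The following Python sketch uses only the standard library, and its function names are our own. Subtracting the largest score inside `softmax` is a standard numerical-stability trick that does not change the resulting probabilities.

```python
import math

def mean_squared_error(predictions, targets):
    """Regression cost: average of the squared differences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probabilities, true_class):
    """Classification cost: negative log of the probability given to the correct class."""
    return -math.log(probabilities[true_class])

print(mean_squared_error([2.5, 0.0], [3.0, -0.5]))   # 0.25

probs = softmax([2.0, 1.0, 0.1])                     # scores for three classes
print(probs, cross_entropy(probs, true_class=0))
```

The cross-entropy is small when the model assigns a high probability to the correct class and grows quickly as that probability approaches zero.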

Gradient descent

The next step is to decide how to use the loss calculated by the cost function to update our weights and biases. A standard answer for neural networks is gradient descent. It calculates the derivative (gradient) of the loss with respect to each weight and bias, and then moves each parameter a small step in the direction opposite to its gradient, which reduces the loss. Finding optimal weights and biases during training is therefore a minimization problem, in which we try to find a local minimum of the error.
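The idea can be seen on a toy problem with a single parameter. This is a minimal sketch that assumes the made-up cost $J(w) = (w - 3)^2$, whose derivative is $dJ/dw = 2(w - 3)$ and whose minimum is at $w = 3$. The starting point, learning rate, and number of steps are arbitrary choices for illustration.

```python
def cost(w):
    # Toy cost with its minimum at w = 3
    return (w - 3.0) ** 2

def cost_gradient(w):
    # Derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate, steps):
    for _ in range(steps):
        # Step against the gradient, i.e. downhill on the cost curve
        w = w - learning_rate * cost_gradient(w)
    return w

w_final = gradient_descent(w=0.0, learning_rate=0.1, steps=100)
print(w_final, cost(w_final))   # w_final is very close to 3, the cost very close to 0
```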

[Figure: Gradient descent reaching the minimum cost]

Backpropagation

Feeding the model with input data and moving forward from the input nodes to the output nodes is called forward propagation. In contrast, to update the weights and biases, we have to go backward, from the output layer toward the input layer, and calculate the gradient for each weight and bias in our layers along the way. This process is called backpropagation.
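To see what "going backward" means, here is a minimal sketch of backpropagation for a hypothetical two-neuron network: one hidden neuron with a sigmoid activation feeds one linear output neuron, and the cost is the squared error. The parameter names (`w1`, `b1`, `w2`, `b2`), their values, and the input/target pair are made up. The chain rule carries the error from the output back to the hidden neuron. A finite-difference estimate serves only as a sanity check on the resulting gradients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    """Forward propagation: input -> hidden neuron (sigmoid) -> output neuron (linear)."""
    h = sigmoid(params["w1"] * x + params["b1"])
    y = params["w2"] * h + params["b2"]
    return h, y

def cost(y, target):
    """Squared-error cost J for one example."""
    return (y - target) ** 2

def backward(x, target, params):
    """Backpropagation: apply the chain rule from the output back to the hidden neuron."""
    h, y = forward(x, params)
    dJ_dy = 2.0 * (y - target)        # dJ/dy for the squared error
    dJ_dw2 = dJ_dy * h                # y = w2 * h + b2, so dy/dw2 = h
    dJ_db2 = dJ_dy                    # dy/db2 = 1
    dJ_dh = dJ_dy * params["w2"]      # error passed back to the hidden neuron
    dJ_dz = dJ_dh * h * (1.0 - h)     # sigmoid'(z) = h * (1 - h)
    dJ_dw1 = dJ_dz * x                # z = w1 * x + b1, so dz/dw1 = x
    dJ_db1 = dJ_dz                    # dz/db1 = 1
    return {"w1": dJ_dw1, "b1": dJ_db1, "w2": dJ_dw2, "b2": dJ_db2}

def numerical_gradient(x, target, params, name, eps=1e-6):
    """Central finite-difference estimate of dJ/d(params[name]), used only as a check."""
    plus = dict(params, **{name: params[name] + eps})
    minus = dict(params, **{name: params[name] - eps})
    return (cost(forward(x, plus)[1], target) - cost(forward(x, minus)[1], target)) / (2 * eps)

params = {"w1": 0.5, "b1": 0.1, "w2": -1.2, "b2": 0.3}
grads = backward(2.0, 1.0, params)
for name in params:
    # The two columns should agree to several decimal places
    print(name, grads[name], numerical_gradient(2.0, 1.0, params, name))
```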

The update operation subtracts the gradient multiplied by a learning rate from the weight and assigns the result as the new value of that weight.

The learning rate is a coefficient that determines the size of the step we take while moving toward the minimal cost.

Let $\partial J / \partial W_{ij}$ denote the gradient of the cost $J$ with respect to the weight of node $j$ in layer $i$ (for example, $W_{12}$ is the weight of the second node in the first layer), and let $\alpha$ be the learning rate. This weight is updated as follows:

$$W_{ij} \leftarrow W_{ij} - \alpha \frac{\partial J}{\partial W_{ij}}$$

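It's simple, right? The update is nothing more than subtracting the gradient of the loss with respect to this specific weight, multiplied by the learning rate. Similarly, the bias $b_{ij}$ of the same node is updated as follows:

$$b_{ij} \leftarrow b_{ij} - \alpha \frac{\partial J}{\partial b_{ij}}$$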

After updating all the weights and biases in the network in the same way, we are ready to run the second forward pass with the new weights and biases.
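The full cycle (forward pass, cost, gradients, and the update $W \leftarrow W - \alpha \, \partial J / \partial W$) can be sketched by training a single linear neuron $y = w x + b$ with a squared-error cost on one made-up example. The function name, the data, the learning rate, and the number of passes are all assumptions chosen for illustration.

```python
def train_single_neuron(x, target, w, b, learning_rate, passes):
    """Repeat forward pass -> cost -> gradients -> update for one linear neuron y = w*x + b."""
    history = []
    for _ in range(passes):
        y = w * x + b                      # forward pass
        J = (y - target) ** 2              # squared-error cost
        history.append(J)
        dJ_dy = 2.0 * (y - target)
        dJ_dw = dJ_dy * x                  # chain rule: dy/dw = x
        dJ_db = dJ_dy                      # dy/db = 1
        w = w - learning_rate * dJ_dw      # W <- W - alpha * dJ/dW
        b = b - learning_rate * dJ_db      # b <- b - alpha * dJ/db
    return w, b, history

w, b, history = train_single_neuron(x=2.0, target=5.0, w=0.0, b=0.0,
                                    learning_rate=0.05, passes=20)
print(history[0], history[-1], w * 2.0 + b)   # cost falls from 25.0 toward 0; prediction nears 5
```

With these particular values, each pass halves the prediction error, so the cost drops by a factor of four per pass.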

Imagine that you have just started training your model and your cost is far from the minimum. You could choose a large learning rate and take bigger steps to move faster along the curve in the graph above. On the other hand, the closer you get to the minimal cost, the better a smaller learning rate works, because it helps you avoid overshooting the minimum.
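This effect can be reproduced on the toy cost $J(w) = w^2$, whose gradient is $2w$, so each update multiplies $w$ by $(1 - 2\alpha)$. In this sketch, with arbitrary values throughout, a tiny learning rate converges slowly, a moderate one converges quickly, and one above 1 overshoots the minimum by more each step and diverges. The second function shrinks the learning rate as training progresses using a simple $\alpha_0 / (1 + \text{decay} \cdot t)$ schedule. This schedule is one common choice among many and is used here only as an illustration.

```python
def run(learning_rate, steps=20, w=5.0):
    """Gradient descent on the toy cost J(w) = w**2 (gradient 2w); returns the final |w|."""
    for _ in range(steps):
        w = w - learning_rate * 2.0 * w
    return abs(w)

def run_with_decay(initial_lr, decay, steps=20, w=5.0):
    """Same toy cost, but the learning rate gets smaller as training progresses."""
    for step in range(steps):
        lr = initial_lr / (1.0 + decay * step)   # large steps first, smaller steps later
        w = w - lr * 2.0 * w
    return abs(w)

for lr in (0.01, 0.4, 1.1):
    # 0.01: slow progress; 0.4: fast convergence; 1.1: overshoots and diverges
    print(lr, run(lr))

print("decaying", run_with_decay(initial_lr=0.9, decay=0.5))
```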

[Figure: The effect of different learning rates]

Training cycles

We have covered the main steps and methods applied during training. It's also essential to know the terms used to describe when, and how many times, these steps are applied during training.

One epoch is one fully completed training cycle: during one epoch, every sample in the training data is fed to the network once. We can feed the data one sample at a time or in batches, and this is where the terms iteration and batch come in. One iteration is the process of feeding one batch to the network, calculating the mean loss over the data in that batch, and updating the weights and biases with backpropagation. When all the epochs are completed, training is done.

Let's say we train a classification model on a dataset of 1,000 different images and decide to feed the network with a batch size of 5. This means we give five images to the network one by one, calculate the loss for each, and take the average of these losses to calculate the gradients and finally apply backpropagation. One iteration is completed when these five images have been used, and then we move on to the next iteration. Since one epoch requires giving all the data to the network, five images at a time, one epoch consists of 1000 / 5 = 200 iterations.
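The bookkeeping in this example can be sketched as two nested loops: an outer loop over epochs and an inner loop over batches, where each batch is one iteration. This is a minimal sketch in which integers stand in for the 1,000 images and a counter stands in for the real forward pass, loss, and backpropagation. The function names and the choice of three epochs are hypothetical.

```python
def make_batches(samples, batch_size):
    """Split the dataset into consecutive batches of batch_size samples."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

def train(dataset, epochs, batch_size):
    """Count iterations; a real loop would do forward pass, mean loss, and backpropagation per batch."""
    iterations = 0
    for _ in range(epochs):                          # one epoch = every sample seen once
        for batch in make_batches(dataset, batch_size):
            # Placeholder for: loss of each sample in `batch`, their mean,
            # backpropagation, and one weight/bias update
            iterations += 1                          # one batch = one iteration
    return iterations

images = list(range(1000))                           # stand-ins for 1,000 labeled images
print(len(make_batches(images, 5)))                  # 200 iterations per epoch
print(train(images, epochs=3, batch_size=5))         # 600 iterations over 3 epochs
```

If the dataset size is not divisible by the batch size, this sketch simply makes the last batch smaller.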

“How many epochs should we train our model?” and “What should our batch size be?” are model- and dataset-specific questions, and we have to tune these settings experimentally to find good answers. We will see some examples of how to fine-tune a trained model.

[Figure: Training cycle: epochs, iterations, and batches]

Take-away vocabulary

Input layer

It consists of the input nodes, which hold the data we want to process.

Hidden layer

Hidden layers sit between the input layer and the output layer and carry the signals from one to the other. A network can have anywhere from one to a large number of hidden layers.

Output layer

It is the last layer, which holds the final result.

Supervised learning

A training method in which the ground truth is shown to the model along with the data.

True answer (ground truth)

The expected prediction (label) from the model for the given data.


Fully connected neural network

A network in which every node in a layer is connected to every node in the following layer.

Cost function (loss function)

The function that calculates the error between the true answer and the prediction.

Gradient descent

The method that calculates the derivatives of the cost with respect to the weights and biases and moves them in the direction that reduces the cost.


Backpropagation

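The backward pass, from the output layer toward the input layer, that calculates the gradients so the weights and biases can be updated with gradient descent.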

Epoch

One complete training cycle, in which all the training data is fed to the network once.

Iteration

One step inside an epoch: feeding the network one batch of images (as many images as the batch size) and updating the weights and biases once.

Batch size

The number of images sent through the network in one iteration. Their mean loss is used to update the weights and biases.