...

From Logistic Regression to Neural Networks

Learn the evolution from logistic regression to neural networks.

Neuron

A neuron, in the context of neural networks and artificial intelligence, is a fundamental computational unit that mimics the behavior of biological neurons found in the human brain. Neurons are the building blocks of artificial neural networks, which are used for various machine learning tasks, including image recognition, natural language processing, and more.

Components of a neuron

Let’s discuss the key components and functions of an artificial neuron:

  • Input: Neurons receive input signals from other neurons.

  • Weights: Each input is associated with a weight that determines its influence on the neuron’s output. These weights are learnable parameters that are adjusted during the training process to optimize the neuron’s performance.

  • Summation: The weighted input signals are summed together, often with an additional bias term, to produce a single value. This weighted sum represents the net input to the neuron.

  • Activation function: The net input is then passed through an activation function. The activation function introduces nonlinearity into the neuron’s computation.

  • Output: The result of the activation function is the output of the neuron, which can be passed to other neurons in subsequent layers of the neural network.

Here’s an illustration of the components of a neuron:

[Figure: Components of a neuron]
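
To make these components concrete, here is a minimal sketch of a single neuron in NumPy. The input values, weights, bias, and the choice of a sigmoid activation are illustrative assumptions, not values from this lesson:

```python
import numpy as np

def sigmoid(z):
    # Activation function: squashes the net input into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical numbers, chosen only to illustrate the five components.
x = np.array([0.5, -1.2, 3.0])   # input: signals from the previous layer
w = np.array([0.8, 0.1, -0.4])   # weights: one learnable parameter per input
b = 0.25                         # bias term

z = np.dot(w, x) + b             # summation: weighted sum plus bias
a = sigmoid(z)                   # activation: nonlinearity applied to z
print(a)                         # output: passed on to neurons in the next layer
```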

Neural network

In a neural network, we combine distinct neurons, each with its own set of parameters $\bold{w}$ and an activation function. The figure below shows a typical neural network. The first layer is the input layer, with input values labeled $x_i$. Unlike the other layers, the input layer doesn’t contain neurons; it simply copies the input to the subsequent layer. The last layer is the output layer, with outputs labeled $\hat{y}_j$. All layers situated between the input and output layers are referred to as hidden layers. The hidden layers use labels of the form $a_k^l$, where $k$ denotes the neuron index in layer $l$. Both the hidden layers and the output layer are computational, meaning they’re composed of neurons serving as computational units.

[Figure: An example of a neural network]
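
To tie this notation to something concrete, here is a small sketch describing the layers of such a network; the particular layer sizes are an assumption for illustration, not taken from the figure:

```python
layer_sizes = [3, 4, 4, 2]   # hypothetical: 3 inputs, two hidden layers, 2 outputs

for l, n_l in enumerate(layer_sizes):
    if l == 0:
        # The input layer just holds the values x_i; it has no neurons.
        print(f"layer {l}: input layer with {n_l} values")
    elif l == len(layer_sizes) - 1:
        # The output layer's neurons produce the predictions y_hat_j.
        print(f"layer {l}: output layer with {n_l} neurons")
    else:
        # Hidden layers produce the intermediate values a_k^l.
        print(f"layer {l}: hidden layer with {n_l} neurons")
```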

A neural network is composed of multiple units, with each unit resembling a logistic regression unit at its core. However, the key distinction lies in the flexibility of these units, which can employ various nonlinear functions on top of the weighted sum $\bold{x}^T\bold{w}$. This freedom allows the units to go beyond the limitations of the sigmoid function used in logistic regression.

[Figure: A neuron can be viewed as a general logistic regression unit]
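
As a rough sketch of this idea, the code below keeps the weighted sum $\bold{x}^T\bold{w}$ fixed and only swaps the nonlinearity applied on top of it. The specific activations shown (sigmoid, tanh, ReLU) and the sample numbers are assumptions for illustration:

```python
import numpy as np

# Hypothetical input and weights; the bias is omitted for brevity.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.8, 0.1, -0.4])
z = x @ w                        # same weighted sum as in logistic regression

activations = {
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),  # the logistic regression choice
    "tanh":    np.tanh,                             # a zero-centered alternative
    "relu":    lambda z: np.maximum(0.0, z),        # a common choice in hidden layers
}

for name, g in activations.items():
    print(name, g(z))            # same net input, different neuron outputs
```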

Forward pass

In a neural network, each neuron takes as input a vector consisting of the outputs of all neurons in the previous layer, and it produces a single real number as output. For a layer $l$ containing $n_l$ neurons, the output vector of this layer has $n_l$ components, which serve as the input for every neuron in the subsequent layer $l+1$. Consequently, each neuron in layer $l+1$ has $n_l$ parameters: although each neuron has its own set of parameters, all neurons in the same layer share the same number of parameters.

Let $\bold{w}_k^{l}$ represent the parameter vector of the $k^{th}$ neuron in layer $l$. The parameter matrix $\bold{W}^l$ can then be defined as $\begin{bmatrix}\bold{w}_1^{l} & \bold{w}_2^{l} & \dots & \bold{w}_{n_l}^{l}\end{bmatrix}^T$. If all neurons in layer $l$ employ the same activation function $g^l$, and $\bold{a}^{l-1}$ denotes the output vector of layer $l-1$, the relationship between them can be expressed as:

$$\bold{a}^l = g^l(\bold{W}^l \bold{a}^{l-1})$$

The input vector $\bold{x}$ is typically denoted as $\bold{a}^0$, and the output vector of the last layer is denoted as $\hat{\bold{y}}$.
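
Putting the pieces together, here is a minimal NumPy sketch of the forward pass $\bold{a}^l = g^l(\bold{W}^l\bold{a}^{l-1})$. The layer sizes, the random initialization, and the use of tanh for hidden layers and sigmoid for the output layer are assumptions made for illustration, and bias terms are omitted to match the formula above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Layer sizes: 3 inputs, two hidden layers, 2 outputs (an illustrative choice).
sizes = [3, 4, 4, 2]

# W^l has shape (n_l, n_{l-1}); row k of W^l is the parameter vector w_k^l.
weights = [rng.normal(size=(n_l, n_prev))
           for n_prev, n_l in zip(sizes[:-1], sizes[1:])]

# Assumed activations: tanh in the hidden layers, sigmoid in the output layer.
activations = [np.tanh] * (len(weights) - 1) + [sigmoid]

def forward(x):
    a = x                                  # a^0 is the input vector
    for W, g in zip(weights, activations):
        a = g(W @ a)                       # a^l = g^l(W^l a^{l-1})
    return a                               # output vector of the last layer

print(forward(np.array([0.5, -1.2, 3.0])))
```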