Neural Networks

Learn about neural networks, a popular model for both regression and classification.

Why neural networks?

Neural networks have become very popular in recent years. This is due to several factors:

  • Universal approximation theorem: For any continuous function on a compact domain, there exists a neural network that approximates it to arbitrary precision (a sketch of the statement follows this list).
  • Big Data: Neural networks are data-hungry; they need large amounts of data to estimate their parameters reliably. The choice of neural networks has become easier to justify with the availability of Big Data: larger, more complex, and rapidly growing amounts of information from new sources.
  • Software and hardware: More mature software packages (TensorFlow, PyTorch, MXNet), as well as specialized hardware for efficient and customized computation, make the choice of neural networks more practical.
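To make the first point concrete, a common form of the result (in the spirit of Cybenko's 1989 theorem for sigmoidal activations) says that any continuous function $f$ on a compact set can be approximated to arbitrary precision by a finite sum of simple units:

$$\hat{f}(\bold{x}) = \sum_{k=1}^{N} v_k\, \sigma\!\left(\bold{w}_k^T \bold{x} + b_k\right),$$

provided the number of units $N$ is allowed to be large enough. The theorem guarantees that such a network exists; it says nothing about how to find its parameters, which is what training is for.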

A neural network is a combination of several units, each of which is similar to a logistic regression unit. The difference, however, is that these units can have any non-linear function on top of $\bold{x}^T\bold{w}$ and aren’t restricted to the sigmoid. These units are known as neurons or perceptrons, and the non-linear functions are known as activation functions.

A typical logistic regression unit with 2 inputs
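As a minimal sketch of such a unit, the NumPy snippet below computes $\sigma(\bold{x}^T\bold{w} + b)$ for a two-input neuron and then swaps the sigmoid for a ReLU activation. The input, weight, and bias values are illustrative assumptions, not values from this lesson.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, as used in a logistic regression unit
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # A popular alternative activation function
    return np.maximum(0.0, z)

def neuron(x, w, b, activation=sigmoid):
    # A single unit: affine combination of the inputs followed by a non-linearity
    return activation(np.dot(x, w) + b)

x = np.array([0.5, -1.2])          # two inputs, as in the figure above
w = np.array([0.8, 0.3])           # illustrative weights
b = 0.1                            # bias term
print(neuron(x, w, b, sigmoid))    # logistic-regression-style unit
print(neuron(x, w, b, relu))       # same unit with a different activation
```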

What’s a neural network?

In a neural network, we combine different neurons, each having a possibly different set of parameters $\bold{w}$ and a possibly different activation function. The figure below is a typical example of a neural network. The first layer consists of inputs to the network and is known as the input layer. The input layer has labels $x_i$. Unlike other layers, the input layer doesn’t have neurons. So, no computation happens in the input layer other than copying the input to the layer. The last layer consists of the outputs and is known as the output layer. The output layer has labels $\hat{y}_j$. All layers between the input layer and the output layer are known as hidden layers. The hidden layers have labels of the form $a_k^l$, where $k$ is the neuron index in layer $l$. Both the hidden layer and the output layer are computational, that is, they consist of neurons that are computational units.

A typical neural network with an input layer having five features, an output layer having three outputs, and four hidden layers
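The sketch below builds a small network of this shape in NumPy, with five input features, four hidden layers, and three outputs, mirroring the figure above. The hidden-layer widths, random weights, and ReLU activation are assumptions for illustration only; the input layer does no computation, it simply supplies the vector consumed by the first hidden layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# Layer sizes following the figure: 5 inputs, 4 hidden layers, 3 outputs.
# The hidden-layer widths (4 neurons each) are an illustrative assumption.
layer_sizes = [5, 4, 4, 4, 4, 3]

# One weight matrix and one bias vector per computational layer
# (hidden layers and the output layer); the input layer has none.
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    # The input layer only copies the input; every later layer is computational.
    a = x
    for W, b in zip(weights, biases):
        # Each neuron sees the outputs of all neurons in the previous layer.
        a = relu(a @ W + b)
    return a  # the outputs of the output layer

x = rng.normal(size=5)   # five input features
print(forward(x))        # three outputs
```

In practice the output layer would typically use a task-specific activation (for example, softmax for classification), but a single non-linearity keeps the sketch short.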

Forward pass

In a neural network, the input to every neuron is a vector that consists of the outputs of all the neurons in the previous layer. Each neuron outputs a real number. If layer $l$ has $n_l$ ...