Neural Network Construction
Learn about TensorFlow’s sequential model building, from initializing layers to model fitting.
TensorFlow provides three simple-to-implement approaches to constructing models:
- Sequential
- Functional
- Model subclassing
They are listed in decreasing order of ease of use. Most modeling requirements are covered by the sequential and functional approaches.
The sequential approach
Sequential is the simplest approach. It constructs models with a linear stack of layers, in which each layer communicates only with the layers directly before and after it. Models in which layers communicate non-sequentially (for example, residual networks) cannot be built with the sequential approach; the functional or model subclassing approach is used in such cases.
Multi-layer Perceptrons (MLPs) are sequential models. Therefore, a sequential model is initialized, as shown below.
model = Sequential()
The initialization here creates a Sequential object. Sequential inherits from the Model class in TensorFlow and, in that way, inherits all of its training and inference features.
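For reference, the snippets in this lesson assume imports along these lines (a minimal sketch; exact import paths can vary slightly across TensorFlow 2.x versions):
# Imports assumed by the snippets in this lesson (TensorFlow 2.x / Keras)
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Dense

# Sequential inherits from Model, so fit(), evaluate(), and predict()
# are available on the object created below.
model = Sequential()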
Input layer
The model starts with an input layer. No computation is performed at this layer, but it still plays an important role.
An input layer can be imagined as a gate to the model. The gate has a defined shape, and this shape must be consistent with the shape of an input sample. This layer has two functions:
- To reject a sample whose shape is not consistent with the defined shape
- To communicate the shape of the incoming batch to the next layer
model.add(Input(shape=(N_FEATURES,)))
The input layer is added to the model as shown above. It takes a shape argument, which is a tuple describing the shape of the input. The tuple contains the length of each axis of the input sample, and its last element is left empty.
Here, the input has only one axis, the feature axis, with length N_FEATURES (defined in the previous lesson). In the case of multi-axis inputs, such as images and videos, the tuple has more elements.
The last (empty) element corresponds to the batch size. The batch size is defined during model fitting and is picked up automatically by the model. The empty element in the tuple can be seen as a placeholder.
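To see the placeholder in action, the short sketch below continues the snippet above and prints the model summary; the undefined batch axis shows up as None in every output shape (N_FEATURES = 69 is taken from this lesson's dataset):
N_FEATURES = 69  # number of input features in this lesson's dataset

model = Sequential()
model.add(Input(shape=(N_FEATURES,)))
model.add(Dense(32, activation='relu'))

# Each output shape in the summary begins with None, the batch placeholder,
# e.g., the dense layer reports an output shape of (None, 32).
model.summary()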
Explicitly defining the input layer is optional. In fact, it is common to define the input shape in the first computation layer. For example:
Dense(..., input_shape=(N_FEATURES,))
The above line represents a dense layer in a neural network. The ellipsis (...) typically includes the number of neurons and the activation function. input_shape=(N_FEATURES,) specifies the shape of the input data, where N_FEATURES is the number of features in each input.
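As an illustration, the first hidden layer of this lesson's model could declare the input shape itself, making a separate Input layer unnecessary (a sketch equivalent to the construction above):
# Equivalent construction: the first computation layer declares the
# expected input shape, so no explicit Input layer is added.
model = Sequential()
model.add(Dense(32, activation='relu', input_shape=(N_FEATURES,)))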
Dense layer
A dense layer is one of the primary layers in deep learning. It’s used in MLPs and most other deep learning architectures.
Its importance can be attributed to its simplicity. A linearly activated dense layer is simply an affine transformation of the inputs.
Moreover, unlike most other layers, a dense layer (with linear or nonlinear activation) provides a simple structure for finding a relationship between the features and the response in the same space.
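The affine-transformation claim can be checked directly. The sketch below (with illustrative sizes, not this lesson's configuration) builds a linearly activated dense layer and verifies that its output equals xW + b:
import numpy as np
import tensorflow as tf

# A linearly activated dense layer is an affine transformation: y = xW + b
layer = tf.keras.layers.Dense(3, activation='linear')
x = np.random.rand(4, 5).astype('float32')  # a batch of 4 samples with 5 features
y = layer(x)                                # weights are created on the first call

W, b = layer.get_weights()                  # kernel W and bias b
assert np.allclose(y.numpy(), x @ W + b, atol=1e-5)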
An MLP is a stack of dense layers. That is, every layer from the hidden layers to the output is a dense layer.
The number of hidden layers is a model configuration. As a general principle, it’s recommended to begin with two hidden layers as a baseline.
They’re added to the model as shown below.
model.add(Dense(32, activation='relu', name='hidden_layer_1'))
model.add(Dense(16, activation='relu', name='hidden_layer_2'))
The size of a layer is the first argument. The number of nodes (denoted as units in TensorFlow) in the layer is the same as its size.
The size is a configuration property. It should be set to around half the number of input features. As a convention, the size should be taken from the geometric series of 2, i.e., a number in {2, 4, 8, 16, 32, 64, ...}.
The input sample has 69 features; therefore, the first dense layer is given a size of 32. This also means the input to the second layer has 32 features, and therefore its size is set to 16.
Following these conventions is optional but helps streamline model construction. They are based on the observation that deep learning models are generally insensitive to minor changes in layer size, so it is easier to follow a general principle for configuring layer sizes than to tune each one individually.
Activation is the next argument. It is an important argument because the model is generally sensitive to a poor choice of activation.
Note: An appropriate choice of activation is essential because models are sensitive to it.
relu activation is a good default choice for hidden layers.
The name argument, at the end, is optional. It's added for better readability in the model summary.
Output layer
The output layer in most deep learning networks is a dense layer. This is due to the dense layer’s affine transformation property, which is usually required at the last layer. In an MLP, it’s a dense layer by design.
The output layer should be consistent with the response’s size just like the input layer must be consistent with the input sample’s size.
In a classification problem, the size of the output layer is equal to the number of classes/responses. Therefore, the output dense layer in a binary classifier has a size of one (size=1), as shown below.
model.add(Dense(1, activation='sigmoid', name='output_layer'))
Also, the activation on this layer is dictated by the problem. For regression, if the response is in ...
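Putting the pieces of this lesson together, a minimal end-to-end sketch of the binary-classification MLP might look as follows (the compile settings are illustrative assumptions, not prescribed by this lesson):
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Dense

N_FEATURES = 69

model = Sequential()
model.add(Input(shape=(N_FEATURES,)))
model.add(Dense(32, activation='relu', name='hidden_layer_1'))
model.add(Dense(16, activation='relu', name='hidden_layer_2'))
model.add(Dense(1, activation='sigmoid', name='output_layer'))

# Illustrative compile settings for a binary classifier; the loss and
# optimizer choices here are assumptions, not part of this lesson.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()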