Convolutional neural networks (CNNs) transformed computer vision by allowing machines to learn visual patterns. An essential element in CNNs is padding, which refers to adding extra pixels/values around the input images (data) before applying operations. This Answer delves into padding, its significance, and its types in CNNs.
Padding in CNNs has two essential advantages, described below:
Preserving spatial information: Padding prevents the spatial dimensions from shrinking as the input passes through successive layers. By preserving the original spatial size, padding retains essential information at the edges.
Mitigating border effects: Without padding, a filter overlaps edge pixels fewer times than interior pixels, so the edges contribute less to the output. This leads to unwanted border effects and less focus on the edges. Padding addresses this by introducing extra pixels that allow the filter to align fully over border regions.
Primarily, there are four types of padding, as discussed below:
Valid padding (or no padding): This type involves no additional pixels, which reduces the spatial dimensions. While it is computationally efficient, it may discard information at the edges.
Same padding: It involves adding zeros around the input data so that the output spatial dimensions match the input (at stride 1). This preserves spatial information at the edges.
Reflective padding: It involves mirroring the values at the input edges, creating a reflection of the border. This mitigates border effects while keeping the padded values consistent with the image content.
Replicate padding: It involves duplicating values at input edges, which reduces border effects by extending the input with replicated border values.
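The padding types above can be compared directly with NumPy's `np.pad` function, which supports constant (zero), reflective, and replicate modes. The sketch below uses NumPy's mode names (`'constant'`, `'reflect'`, `'edge'`); deep learning frameworks expose the same ideas under slightly different names.

```python
import numpy as np

# A tiny 3x3 "image"
x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Zero padding: surround the input with zeros
zero_padded = np.pad(x, pad_width=1, mode='constant', constant_values=0)

# Reflective padding: mirror the values at the edges
reflect_padded = np.pad(x, pad_width=1, mode='reflect')

# Replicate padding: repeat the border values outward
replicate_padded = np.pad(x, pad_width=1, mode='edge')

print(zero_padded)
print(reflect_padded)
print(replicate_padded)
```

Each call grows the 3x3 input to 5x5; only the values placed in the new border differ between modes.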
The illustration below displays the types of padding:
Let’s implement zero and valid padding for demonstration purposes to see how they work.
Zero padding, the scheme used by same padding, adds zeros around the border of an input image. The illustration below depicts how padding is applied to an image’s pixels so that the output has the same size as the input:
The following code demonstrates how zero padding is applied through the TensorFlow library:
```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    # Convolutional layer
    tf.keras.layers.Conv2D(32, (3, 3), padding='same', activation='relu', input_shape=(28, 28, 1)),
    # Pooling layer
    tf.keras.layers.MaxPooling2D((2, 2)),
    # Flatten the output to feed into a dense layer
    tf.keras.layers.Flatten(),
    # Dense layer
    tf.keras.layers.Dense(128, activation='relu'),
    # Output layer
    tf.keras.layers.Dense(7, activation='softmax')  # Assuming 7 classes for classification
])

# Compiling the model
model.compile(optimizer='Adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
```
The code above demonstrates a CNN classifier whose Conv2D layer uses same padding, meaning that the layer's output has the same spatial dimensions as its input.
Valid padding means that no padding is added to the input. The following code demonstrates how to implement valid padding through the TensorFlow library:
tf.keras.layers.Conv2D(32, (3, 3), padding='valid', activation='relu', input_shape=(28, 28, 1))
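The difference between the two schemes shows up in the output size. At stride 1, a valid convolution with an f x f kernel over an n x n input produces (n - f + 1) outputs per dimension, while same padding keeps n. A quick sketch (plain Python, no framework needed; the helper name `conv_output_size` is our own) illustrates this for the 28x28 input and 3x3 kernel used above:

```python
def conv_output_size(n, f, padding, stride=1):
    """Spatial output size of a convolution along one dimension."""
    if padding == 'valid':
        # No padding: the filter only visits fully overlapping positions
        return (n - f) // stride + 1
    elif padding == 'same':
        # Enough zeros are added so that output size = ceil(n / stride)
        return -(-n // stride)
    raise ValueError(f"Unknown padding: {padding}")

print(conv_output_size(28, 3, 'valid'))  # 26
print(conv_output_size(28, 3, 'same'))   # 28
```

So the valid layer above outputs 26x26 feature maps, while its same-padded counterpart outputs 28x28.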
So far, we have seen how padding works on images (imagery data). Let's look at how padding is performed on text (text data).
Processing sentences can be difficult as they can come in varying lengths. Hence, padding can be applied to the start or end of the text so all input sequences remain of the same length. The illustration below depicts the same concept:
Note: The padding in text is applied after tokenization and encoding in which sentences are converted to smaller parts and numerical values, respectively.
The pad_sequences function from TensorFlow is used for this purpose:
```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# A list of tokenized sentences
sequences = [
    [1, 2, 3, 4],
    [1, 2],
    [1, 2, 3, 4, 5, 6]
]

# Pad sequences to the length of 10
padded_sequences = pad_sequences(sequences, padding='post', maxlen=10, value=0)
print(padded_sequences)
```
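For intuition, post-padding can be reproduced in plain Python without TensorFlow. The sketch below (the helper name `pad_post` is our own) mirrors pad_sequences with padding='post' and value=0: shorter sequences are filled at the end with the pad value, and longer ones are cut to maxlen. Note that pad_sequences truncates from the start by default; this sketch truncates from the end for simplicity.

```python
def pad_post(sequences, maxlen, value=0):
    # Cut each sequence to maxlen, then fill the remainder with the pad value
    return [seq[:maxlen] + [value] * (maxlen - len(seq[:maxlen]))
            for seq in sequences]

sequences = [[1, 2, 3, 4], [1, 2], [1, 2, 3, 4, 5, 6]]
print(pad_post(sequences, maxlen=10))
```

Every padded sequence now has length 10, so the batch can be fed to a model as a single rectangular array.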
Solve the following quiz to evaluate your understanding of padding:
What is the primary role of padding in CNNs?
Reducing spatial dimensions
Enhancing computational efficiency
Preserving spatial information
Mitigating filter misalignment