Sentence Classification with CNNs
Learn to implement the CNN-based model for sentence classification.
We’re now ready to implement the model in TensorFlow 2. As a prerequisite, let’s import several necessary modules from TensorFlow:
import tensorflow.keras.backend as K
import tensorflow.keras.layers as layers
import tensorflow.keras.regularizers as regularizers
from tensorflow.keras.models import Model
Clear the running session to make sure previous runs are not interfering with the current run:
K.clear_session()
We'll use the functional API from Keras here, because the model contains parallel pathways that can't be expressed with the sequential API. Let's start off by creating an input layer:
# Input layer takes word IDs as inputs
word_id_inputs = layers.Input(shape=(max_seq_length,), dtype='int32')
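Every row fed to this layer must hold exactly max_seq_length word IDs. As a minimal sketch of how variable-length sequences could be brought to that length, the snippet below uses a hypothetical helper, pad_or_truncate, which is not part of the lesson's code. It assumes that ID 0 is reserved for padding and that both padding and truncation happen at the end of the sequence; the actual preprocessing used for this model may differ.

```python
def pad_or_truncate(sequences, max_seq_length, pad_id=0):
    """Return a list of rows, each exactly max_seq_length IDs long.

    Hypothetical helper for illustration: long sequences are cut at the end,
    short ones are filled at the end with pad_id (assumed to be 0 here).
    """
    padded = []
    for seq in sequences:
        row = list(seq[:max_seq_length])                 # truncate if too long
        row += [pad_id] * (max_seq_length - len(row))    # pad if too short
        padded.append(row)
    return padded


# Two sequences of different lengths become a rectangular batch.
batch = pad_or_truncate([[5, 8, 2], [7, 1, 9, 4, 3, 6]], max_seq_length=4)
print(batch)  # [[5, 8, 2, 0], [7, 1, 9, 4]]
```

A rectangular batch like this can then be converted to an int32 array and passed to the model.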
The input layer simply takes a batch of max_seq_length word IDs, that is, a batch of sequences where each sequence is padded or truncated to a fixed maximum length. We specify the dtype as int32 since they are word IDs. Next, we define an embedding layer from which we'll look up the embeddings corresponding to the word IDs coming through the word_id_inputs layer:
# Get the embeddings of the inputs / out [batch_size, sent_length, output_dim]
embedding_out = layers.Embedding(input_dim=n_vocab, output_dim=64)(word_id_inputs)
This is a randomly initialized embedding layer. It contains a large matrix of size [n_vocab, 64], where each row represents the word vector of the word indexed by that row number. The embeddings are learned jointly with the rest of the model while it is trained on the supervised task. Next, we'll define three one-dimensional convolution layers with kernel (filter) sizes of 3, 4, and 5, each having 100 feature maps:
# For all layers: in [batch_size, sent_length, emb_size] / out [batch_size, sent_length, 100]
conv1_1 = layers.Conv1D(100, kernel_size=3, strides=1, padding='same', activation='relu')(embedding_out)
conv1_2 = layers.Conv1D(100, kernel_size=4, strides=1, padding='same', activation='relu')(embedding_out)
conv1_3 = layers.Conv1D(100, kernel_size=5, strides=1, padding='same', activation='relu')(embedding_out)
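To make the shapes in the comments above concrete, the following NumPy sketch traces a small batch through an embedding lookup and a 'same'-padded one-dimensional convolution for each kernel size. It is a stand-in for illustration, not how Keras implements these layers: the sizes (n_vocab = 50, batch of 2, sequence length 10) are made up, the weights are random, and conv1d_same_relu is a hypothetical helper. The zero padding is assumed to follow TensorFlow's convention of placing the extra zero on the right when the kernel size is even.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; the lesson defines n_vocab and max_seq_length elsewhere.
n_vocab, emb_size, n_filters = 50, 64, 100
batch_size, max_seq_length = 2, 10

# Embedding lookup: each word ID selects one row of the [n_vocab, emb_size] matrix.
embedding_matrix = rng.normal(size=(n_vocab, emb_size))
word_ids = rng.integers(0, n_vocab, size=(batch_size, max_seq_length))
embedded = embedding_matrix[word_ids]  # [batch_size, max_seq_length, emb_size]


def conv1d_same_relu(x, kernel, bias):
    """Stride-1 1D convolution with zero 'same' padding followed by ReLU.

    x: [batch, length, in_channels]; kernel: [k, in_channels, out_channels].
    """
    k = kernel.shape[0]
    pad_left = (k - 1) // 2
    pad_right = (k - 1) - pad_left  # extra zero goes on the right for even k
    x_padded = np.pad(x, ((0, 0), (pad_left, pad_right), (0, 0)), mode="constant")
    length = x.shape[1]
    # At each position t, combine a window of k word vectors into out_channels values.
    out = np.stack(
        [
            np.tensordot(x_padded[:, t:t + k, :], kernel, axes=([1, 2], [0, 1]))
            for t in range(length)
        ],
        axis=1,
    )
    return np.maximum(out + bias, 0.0)


conv_shapes = {}
for k in (3, 4, 5):
    kernel = rng.normal(size=(k, emb_size, n_filters))
    bias = np.zeros(n_filters)
    conv_shapes[k] = conv1d_same_relu(embedded, kernel, bias).shape

print(embedded.shape)  # (2, 10, 64)
print(conv_shapes)     # every kernel size gives (2, 10, 100)
```

Because the padding keeps the sequence length unchanged at stride 1, all three layers produce outputs of the same shape regardless of kernel size.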