LSTM Variants and Convolutions for Text
Learn about two popular variants of the single-layer LSTM network: stacked and bidirectional LSTMs.
RNNs are extremely useful when it comes to handling sequential datasets. A simple model can effectively learn to generate text based on what it learned from the training dataset.
Over the years, there have been a number of enhancements in the way we model and use RNNs. In this section, we’ll discuss two widely used variants of the single-layer LSTM network we discussed earlier: stacked and bidirectional LSTMs.
Stacked LSTMs
We are well aware of how the depth of a neural network helps it learn complex and abstract concepts when it comes to computer vision tasks. Along the same lines, a stacked LSTM architecture, which has multiple layers of LSTMs stacked one after the other, has been shown to give considerable improvements. Stacked LSTMs were first presented by Graves et al. in their work “Speech Recognition with Deep Recurrent Neural Networks.”
Though there isn’t any theoretical proof to explain this performance gain, empirical results help us understand the impact. These enhancements can be attributed to the model’s capacity to learn complex features and even abstract representations of inputs. Since there is a time component associated with LSTMs and RNNs in general, deeper networks gain the ability to operate at different time scales as well.
As we are using the high-level Keras API, we can easily extend the architecture we used in the previous section to add additional LSTM layers. The following snippet modifies the build_model function to do just that:
def build_model(vocab_size, embedding_dim, rnn_units, batch_size,
                is_bidirectional=False):
    """Utility to create a model object.

    Parameters:
        vocab_size: number of unique characters
        embedding_dim: size of embedding vector. This is typically in
            powers of 2, i.e. 64, 128, 256 and so on
        rnn_units: number of LSTM units to be used
        batch_size: batch size for training the model
    Returns:
        tf.keras model object
    """
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Embedding(vocab_size, embedding_dim,
                                        batch_input_shape=[batch_size, None]))
    if is_bidirectional:
        model.add(tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(rnn_units, return_sequences=True,
                                 stateful=True,
                                 recurrent_initializer='glorot_uniform')))
    else:
        model.add(tf.keras.layers.LSTM(rnn_units, return_sequences=True,
                                       stateful=True,
                                       recurrent_initializer='glorot_uniform'))
    model.add(tf.keras.layers.LSTM(rnn_units, return_sequences=True,
                                   stateful=True,
                                   recurrent_initializer='glorot_uniform'))
    model.add(tf.keras.layers.Dense(vocab_size))
    return model
Line 14: We initialize a sequential model using tf.keras.Sequential().
Lines 15–16: We add an embedding layer to the model using tf.keras.layers.Embedding(). It takes the vocab_size, embedding_dim, and batch_input_shape as parameters. The batch_input_shape is set to [batch_size, None] to allow for variable sequence lengths in the input.
Lines 18–28: We add the LSTM layers to the model. If is_bidirectional is True, a bidirectional LSTM layer is added using tf.keras.layers.Bidirectional(); if it is False, two LSTM layers are added sequentially. Finally, a dense layer of size vocab_size is added.
The dataset, training loop, and even the inference utilities remain as-is; for brevity, we have skipped presenting those code snippets again. We will discuss the is_bidirectional argument that we introduce here shortly.
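Before moving on to generation, here is a minimal sketch of how the stacked model defined above might be instantiated and compiled. The hyperparameter values (vocab_size, embedding_dim, rnn_units, batch_size), the Adam optimizer, and the from-logits sparse categorical cross-entropy loss are illustrative assumptions rather than the exact settings used in the earlier training code.

import tensorflow as tf

# Illustrative hyperparameters -- assumed values, not necessarily those used earlier
vocab_size = 65        # number of unique characters in the corpus
embedding_dim = 256
rnn_units = 1024
batch_size = 64

# is_bidirectional defaults to False, so two LSTM layers are stacked
model = build_model(vocab_size, embedding_dim, rnn_units, batch_size)

# The final Dense layer emits raw logits, so the loss is computed from logits
def loss_fn(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True)

model.compile(optimizer='adam', loss=loss_fn)
model.summary()

Once trained on the prepared character dataset, the same generate_text utility from earlier can be used to sample from this deeper model: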
# Greedy decoding
print('Greedy Decoding')
print(generate_text(model,
                    context_string=u"It was in July, 1805",
                    num_generate=100,
                    mode="greedy"))
print()

print('Sampled @ 0.3')
# Sampled decoding with different temperature settings
print(generate_text(model,
                    context_string=u"It was in July, 1805",
                    num_generate=100,
                    mode="sampling",
                    temperature=0.3))
print()

print('Sampled @ 0.9')
print(generate_text(model,
                    context_string=u"It was in July, 1805",
                    num_generate=100,
                    mode="sampling",
                    temperature=0.9))
Now, let’s see how the results look for this deeper LSTM-based language model. The output of the snippet above shows the text it generates.
We can clearly see how the generated text picks up the writing style of the book, including capitalization, punctuation, and other aspects, better than the outputs from the single-layer model shown earlier. This highlights some of the advantages we discussed regarding deeper RNN architectures.
Bidirectional LSTMs
The second variant that’s very widely used nowadays is the bidirectional LSTM. We have already discussed how LSTMs, and RNNs in general, condition their outputs on previous timesteps. When it comes to text, or any sequence data, this means that the LSTM can make use of past context to predict future timesteps. While this is a very useful property, it is not the best we can achieve. Let’s illustrate why this is a limitation through an example: consider the sentences “Teddy bears are on sale” and “Teddy Roosevelt was a great President.” Reading left to right and stopping at the word “Teddy,” a model has no way of knowing whether the word refers to a toy or a person.
As is evident from this example, without looking at what is to the right of the target word, “Teddy,” the model would not pick up the context properly. To handle such scenarios, bidirectional LSTMs were introduced. The idea behind them is pretty simple and straightforward. A bidirectional LSTM (or BiLSTM) is a combination of two LSTM layers that work simultaneously: the first is the usual forward LSTM, which processes the input sequence in its original order, while the second is a backward LSTM, which processes the sequence in reverse. The outputs of the two layers are then combined, typically by concatenation, at each timestep.
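To make the idea concrete, here is a minimal, self-contained sketch of a bidirectional LSTM layer in Keras. The vocabulary size, embedding dimension, and number of units are illustrative assumptions, not the configuration used elsewhere in this lesson.

import tensorflow as tf

# Assumed toy sizes for illustration only
VOCAB_SIZE = 1000
EMBEDDING_DIM = 64
RNN_UNITS = 128

# Variable-length sequences of integer token ids
inputs = tf.keras.Input(shape=(None,), dtype='int32')
x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM)(inputs)

# The Bidirectional wrapper runs one LSTM left-to-right and another
# right-to-left, then combines their outputs at every timestep,
# so each timestep sees both past and future context.
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(RNN_UNITS, return_sequences=True))(x)

# Per-timestep logits over the vocabulary
outputs = tf.keras.layers.Dense(VOCAB_SIZE)(x)

model = tf.keras.Model(inputs, outputs)
model.summary()   # the BiLSTM output dimension is 2 * RNN_UNITS per timestep

By default, Keras concatenates the forward and backward outputs, which is why the wrapped layer's output dimension doubles.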