...


Generating New Text with the Model

Learn to generate new text with the pretrained LSTM.

Defining the inference model

During training and evaluation, we fed our model full sequences of bigrams. This works because the complete text is available to us. When generating new text, however, there is no input text to start from. Therefore, we have to adjust our trained model so that it can generate text from scratch.

The way we do this is by defining a recursive model that feeds the prediction from the current time step back in as the input to the next time step. This way, we can keep predicting words or bigrams for as many steps as we like. As the initial seed, we provide a random word or bigram picked from the corpus (or even a sequence of bigrams).
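The sketch below shows the shape of this loop. Here, predict_next is a hypothetical placeholder for the inference model we build later in this lesson: given the current token and the recurrent state, it returns the next token and the updated state.

# A minimal sketch of the recursive generation loop. `predict_next` is a
# hypothetical stand-in for the inference model defined below.
def generate(seed_token, predict_next, n_steps=50):
    tokens = [seed_token]
    state = None  # the LSTM state is threaded through the loop
    for _ in range(n_steps):
        next_token, state = predict_next(tokens[-1], state)
        tokens.append(next_token)
    return tokens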

The figure below illustrates how the inference model works.

Figure: The operational view of the inference model we'll be building from our trained model

Our inference model is more sophisticated than the training model because we need an iterative process that uses previous predictions as inputs. Therefore, we'll use Keras's functional API to implement it:

import tensorflow as tf

# Define the inputs to the model: one string token per time step, plus the
# recurrent states of the two LSTM layers
inp = tf.keras.layers.Input(dtype=tf.string, shape=(1,))
text_vectorized_out = lm_model.get_layer('text_vectorization')(inp)
inp_state_h_lstm = tf.keras.layers.Input(shape=(512,))
inp_state_c_lstm = tf.keras.layers.Input(shape=(512,))
inp_state_h_lstm_1 = tf.keras.layers.Input(shape=(256,))
inp_state_c_lstm_1 = tf.keras.layers.Input(shape=(256,))
# Reuse the trained embedding layer and get its output
emb_layer = lm_model.get_layer('embedding')
emb_out = emb_layer(text_vectorized_out)
# Define LSTM layers that expose their states; note that Keras returns
# (and expects initial_state in) the order [state_h, state_c]
lstm_layer = tf.keras.layers.LSTM(512, return_state=True, return_sequences=True)
lstm_out, lstm_state_h, lstm_state_c = lstm_layer(emb_out, initial_state=[inp_state_h_lstm, inp_state_c_lstm])
lstm_1_layer = tf.keras.layers.LSTM(256, return_state=True, return_sequences=True)
lstm_1_out, lstm_1_state_h, lstm_1_state_c = lstm_1_layer(lstm_out, initial_state=[inp_state_h_lstm_1, inp_state_c_lstm_1])
# Reuse the trained Dense layer and get its output
dense_out = lm_model.get_layer('dense')(lstm_1_out)
# Reuse the final trained Dense layer and get its output
final_out = lm_model.get_layer('dense_1')(dense_out)
# Copy the trained weights into the newly created LSTM layers
lstm_layer.set_weights(lm_model.get_layer('lstm').get_weights())
lstm_1_layer.set_weights(lm_model.get_layer('lstm_1').get_weights())
# Define the final model: current token and states in, prediction and
# updated states out
infer_model = tf.keras.models.Model(
    inputs=[inp, inp_state_h_lstm, inp_state_c_lstm, inp_state_h_lstm_1, inp_state_c_lstm_1],
    outputs=[final_out, lstm_state_h, lstm_state_c, lstm_1_state_h, lstm_1_state_c]
)

We start by defining an input layer that accepts a single time step of string input.

Note that we’re defining the ...
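With the model defined, generating text comes down to running the recursive loop. Below is a minimal sketch of that loop; the seed bigram, the number of steps, and the greedy argmax decoding are illustrative choices, not part of the model definition above. The state sizes match the 512- and 256-unit LSTM layers we defined.

import numpy as np

# Zero-initialize the states of both LSTM layers
state_h, state_c = np.zeros((1, 512)), np.zeros((1, 512))
state_h_1, state_c_1 = np.zeros((1, 256)), np.zeros((1, 256))
# Hypothetical seed: a single bigram, shaped (batch=1, time=1)
seed = np.array([['th']])
vocabulary = lm_model.get_layer('text_vectorization').get_vocabulary()

generated = []
for _ in range(100):
    out, state_h, state_c, state_h_1, state_c_1 = infer_model.predict(
        [seed, state_h, state_c, state_h_1, state_c_1]
    )
    # out has shape (1, 1, vocab_size); greedily pick the most likely token
    token_id = int(np.argmax(out[0, 0]))
    word = vocabulary[token_id]
    generated.append(word)
    seed = np.array([[word]])  # feed the prediction back as the next input

print(''.join(generated))  # bigrams join directly; use ' '.join for words

Greedy decoding tends to repeat itself after a while; sampling from the output distribution instead (e.g., with np.random.choice weighted by the predicted probabilities) usually produces more varied text.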