Generating New Text with the Model
Learn to generate new text with the pretrained LSTM.
Defining the inference model
During training, we trained and evaluated our model on sequences of bigrams. This works because, at training and evaluation time, the full text is available to us. When generating new text, however, we have no input text to start from. Therefore, we have to adjust the trained model so that it can generate text from scratch.
The way we do this is by defining a recursive model that feeds the model's output at the current time step back in as the input to the next time step. This way, we can keep predicting words or bigrams for as many steps as we like. We provide the initial seed as a word or bigram picked at random from the corpus (or even a sequence of bigrams).
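Before looking at the full model, the recursive idea can be illustrated with a toy example. The lookup table below is purely hypothetical and stands in for the LSTM's prediction step; the point is only that each output becomes the next input:

```python
# Toy illustration of recursive generation: each predicted bigram is fed
# back as the next input. The lookup table stands in for the trained model.
next_bigram = {"th": "e ", "e ": "ca", "ca": "t ", "t ": "sa", "sa": "t "}

def generate(seed, n_steps):
    out = [seed]
    for _ in range(n_steps):
        seed = next_bigram[seed]  # previous output becomes the next input
        out.append(seed)
    return "".join(out)

print(generate("th", 3))  # → "the cat "
```

The real inference model works the same way, except that the prediction step is the LSTM forward pass and we must also carry the LSTM states from one step to the next.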
The figure below illustrates how the inference model works.
Our inference model is going to be more involved than the training model because we need to design an iterative process that uses previous predictions as inputs. Therefore, we'll use Keras's functional API to implement the model:
```python
import tensorflow as tf

# Define inputs to the model: a single (string) time step and the
# initial states for the two LSTM layers
inp = tf.keras.layers.Input(dtype=tf.string, shape=(1,))
text_vectorized_out = lm_model.get_layer('text_vectorization')(inp)
inp_state_c_lstm = tf.keras.layers.Input(shape=(512,))
inp_state_h_lstm = tf.keras.layers.Input(shape=(512,))
inp_state_c_lstm_1 = tf.keras.layers.Input(shape=(256,))
inp_state_h_lstm_1 = tf.keras.layers.Input(shape=(256,))

# Define the embedding layer and output
emb_layer = lm_model.get_layer('embedding')
emb_out = emb_layer(text_vectorized_out)

# Define the LSTM layers and outputs. Note that Keras expects
# initial_state=[h, c] and, with return_state=True, returns
# (output, h, c) in that order
lstm_layer = tf.keras.layers.LSTM(512, return_state=True, return_sequences=True)
lstm_out, lstm_state_h, lstm_state_c = lstm_layer(
    emb_out, initial_state=[inp_state_h_lstm, inp_state_c_lstm]
)
lstm_1_layer = tf.keras.layers.LSTM(256, return_state=True, return_sequences=True)
lstm_1_out, lstm_1_state_h, lstm_1_state_c = lstm_1_layer(
    lstm_out, initial_state=[inp_state_h_lstm_1, inp_state_c_lstm_1]
)

# Define a Dense layer and output
dense_out = lm_model.get_layer('dense')(lstm_1_out)

# Define the final Dense layer and output
# (no softmax is applied here, so final_out contains logits)
final_out = lm_model.get_layer('dense_1')(dense_out)

# Copy the weights from the original model
lstm_layer.set_weights(lm_model.get_layer('lstm').get_weights())
lstm_1_layer.set_weights(lm_model.get_layer('lstm_1').get_weights())

# Define the final model
infer_model = tf.keras.models.Model(
    inputs=[inp, inp_state_c_lstm, inp_state_h_lstm,
            inp_state_c_lstm_1, inp_state_h_lstm_1],
    outputs=[final_out, lstm_state_c, lstm_state_h,
             lstm_1_state_c, lstm_1_state_h]
)
```
We start by defining an input layer that accepts a single time step of string input.
Note that we’re defining the ...
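Once `infer_model` is defined, generation is a loop: feed in a seed token with zero initial states, take the highest-probability prediction, and pass both the prediction and the returned states into the next step. The sketch below mirrors that loop; because it must run standalone, the call to `infer_model.predict(...)` is replaced by a stub `predict_step` function (an assumption for illustration, along with the toy `vocab` list), and the state sizes match the 512- and 256-unit LSTMs above:

```python
# Sketch of the recursive decoding loop driven by the inference model.
# predict_step is a stand-in for infer_model.predict(...); in practice you
# would pass the token plus the four state vectors, in the same order as
# the Model's inputs, and receive logits plus four updated states back.
import numpy as np

vocab = ["th", "e ", "qu", "ic", "k "]  # toy bigram vocabulary (assumption)

def predict_step(token_id, state_c, state_h, state_c_1, state_h_1):
    """Stub for the model call: returns logits and the (unchanged) states."""
    rng = np.random.default_rng(token_id)        # deterministic toy logits
    logits = rng.standard_normal(len(vocab))
    return logits, state_c, state_h, state_c_1, state_h_1

def generate(seed_id, n_steps=10):
    # Zero initial states, matching the two LSTM sizes (512 and 256)
    s_c, s_h = np.zeros(512), np.zeros(512)
    s_c1, s_h1 = np.zeros(256), np.zeros(256)
    token_id, out = seed_id, [vocab[seed_id]]
    for _ in range(n_steps):
        logits, s_c, s_h, s_c1, s_h1 = predict_step(
            token_id, s_c, s_h, s_c1, s_h1
        )
        token_id = int(np.argmax(logits))        # greedy decoding
        out.append(vocab[token_id])              # prediction becomes next input
    return "".join(out)

print(generate(0, n_steps=5))
```

Greedy `argmax` decoding is only one choice; sampling from the softmax of the logits typically produces more varied text.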