Project Creation: Part Two

Explore how to preprocess and pad text sequences for character-level RNNs, then build and understand recurrent neural network models with Keras components such as GRU and TimeDistributed layers. This lesson guides you through preparing numeric data representations and assembling a basic yet extendable NLP model architecture.

Padding

In the previous lesson, we preprocessed our data and created a numeric representation of the test sentences. We will be using the same function to work with our original dataset.

First, we will create the padding functionality.

Python 3.5
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

def pad(x, length=None):
    if length is None:
        length = max([len(sentence) for sentence in x])  # default: longest sequence
    return pad_sequences(x, maxlen=length, padding='post')  # append 0s at the end

test_pad = pad(text_tokenized)  # text_tokenized comes from the previous lesson
for sample_i, (token_sent, pad_sent) in enumerate(zip(text_tokenized, test_pad)):
    print('Sequence {} in x'.format(sample_i + 1))
    print(' Input: {}'.format(np.array(token_sent)))
    print(' Output: {}'.format(pad_sent))

Explanation:

  • First, we imported the required packages.

  • From line 4 to line 7, we defined the pad() function. If no length is given, it uses the length of the longest sequence, so no sequence is truncated. It then calls pad_sequences() with maxlen=length and padding='post', which appends 0s to the end of every shorter sequence until all sequences have the same length.

  • On line 9, we called the pad() function on the sequences that we created in the previous lesson.

  • Finally, we printed the sequence without padding and the sequence again after padding. Take a look at the output for one of the sequences below.

    Sequence 1 in x
    Input:  [ 4  7  2  1 16 10  5 11 17  1 18  8  3 19 12  1 20  3 21  1 22 10 23 14
    6  1  3 24  2  8  1  4  7  2  1 25 13 26  9  1 27  3 28  1 15]
    Output: [ 4  7  2  1 16 10  5 11 17  1 18  8  3 19 12  1 20  3 21  1 22 10 23 14
    6  1  3 24  2  8  1  4  7  2  1 25 13 26  9  1 27  3 28  1 15  0  0  0
    0  0  0  0  0  0]
    

    You can see that ...
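To make the padding step concrete, here is a minimal pure-Python sketch of what post-padding does. It is not the Keras implementation: pad_post and toy_sequences are hypothetical names introduced only for this illustration, and the sketch assumes that 0 is reserved as the padding value, as in the output above. When a sequence is longer than length, it keeps the last length items, which mirrors the default truncating='pre' behavior of pad_sequences().

```python
def pad_post(sequences, length=None, value=0):
    """Hypothetical helper: right-pad every sequence with `value` up to `length`."""
    if length is None:
        # Default to the longest sequence so that nothing is truncated.
        length = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        seq = list(seq)
        if len(seq) > length:
            # Keep the last `length` items (drop items from the front).
            seq = seq[len(seq) - length:]
        # Append the padding value until the sequence reaches `length`.
        padded.append(seq + [value] * (length - len(seq)))
    return padded


# Toy token IDs, made up for illustration only.
toy_sequences = [[4, 7, 2], [1, 16], [10, 5, 11, 17, 1]]
toy_padded = pad_post(toy_sequences)
for row in toy_padded:
    print(row)
# [4, 7, 2, 0, 0]
# [1, 16, 0, 0, 0]
# [10, 5, 11, 17, 1]
```

Every row now has the same length, so the batch can be stacked into a single rectangular array, which is the shape a recurrent model expects as input.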