Project Creation: Part Two
In this lesson, we will perform some preprocessing on our dataset.
Padding
In the previous lesson, we preprocessed our data and created a numeric representation of the test sentences. We will be using the same function to work with our original dataset.
First, we will create the padding functionality.
import numpy as np
from tensorflow.python.keras.preprocessing.sequence import pad_sequences

def pad(x, length=None):
    if length is None:
        length = max([len(sentence) for sentence in x])
    return pad_sequences(x, maxlen=length, padding='post')

test_pad = pad(text_tokenized)

for sample_i, (token_sent, pad_sent) in enumerate(zip(text_tokenized, test_pad)):
    print('Sequence {} in x'.format(sample_i + 1))
    print('  Input:  {}'.format(np.array(token_sent)))
    print('  Output: {}'.format(pad_sent))
Explanation:
- First, we imported the required packages.
- From line 4 to line 7, we defined a pad() function. It first finds the length of the longest sequence in the data, then calls pad_sequences() with padding='post', which appends 0's to the end of every shorter sequence, and with maxlen set to that longest length, so every sequence comes out the same length and nothing is truncated. (A standalone sketch of this behavior on a toy input appears after the sample output below.)
- On line 9, we called the pad() function on the tokenized sequences that we created in the previous lesson.
- Finally, we printed each sequence before and after padding. Take a look at the output for one of the sequences below.
Sequence 1 in x
  Input:  [ 4 7 2 1 16 10 5 11 17 1 18 8 3 19 12 1 20 3 21 1 22 10 23 14 6 1 3 24 2 8 1 4 7 2 1 25 13 26 9 1 27 3 28 1 15]
  Output: [ 4 7 2 1 16 10 5 11 17 1 18 8 3 19 12 1 20 3 21 1 22 10 23 14 6 1 3 24 2 8 1 4 7 2 1 25 13 26 9 1 27 3 28 1 15 0 0 0 0 0 0
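
To see what pad_sequences() with padding='post' does on its own, here is a minimal, self-contained sketch. The toy sequences below are made up purely for illustration and are not part of the project's dataset; the import path is the same one used in the code above.

from tensorflow.python.keras.preprocessing.sequence import pad_sequences

# Three toy sequences of different lengths (hypothetical data for illustration only).
toy_sequences = [[4, 7, 2], [1, 16], [10, 5, 11, 17]]

# padding='post' appends zeros at the end of each sequence; maxlen is the length of
# the longest sequence, so nothing is truncated and every row has the same length.
longest = max(len(seq) for seq in toy_sequences)
padded = pad_sequences(toy_sequences, maxlen=longest, padding='post')

print(padded)
# Shorter rows are filled with trailing zeros:
# [[ 4  7  2  0]
#  [ 1 16  0  0]
#  [10  5 11 17]]

The explicit length parameter of the pad() function above serves the same purpose: when it is provided, the data is padded to that length instead of the length of the longest sequence.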