...
/CNNs for Sentence Classification: Building a Tokenizer
CNNs for Sentence Classification: Building a Tokenizer
Learn to build a tokenizer for sentence classification.
We'll cover the following...
Implementation: Building a tokenizer
Now it’s time to build a tokenizer that can map words to numerical IDs:
from tensorflow.keras.preprocessing.text import Tokenizer# Define a tokenizer and fit on train datatokenizer = Tokenizer()tokenizer.fit_on_texts(train_df["question"].tolist())
Here, we simply create a Tokenizer
object and use the fit_on_texts()
function to train it on the training corpus. In this process, the tokenizer will map words in the vocabulary to IDs. We’ll convert all of the train, validation, and test inputs ...