NER with Character and Token Embeddings
Learn to implement NER with character and token embeddings.
Nowadays, recurrent models used to solve the NER task are much more sophisticated than a single embedding layer followed by an RNN. They use more advanced recurrent models such as long short-term memory (LSTM) networks and gated recurrent units (GRUs). We’ll set those advanced models aside here and focus on a technique that provides the model with embeddings at multiple scales, enabling it to understand language better: instead of relying only on token embeddings, we also use character embeddings. A token embedding is then generated from the character embeddings by shifting a convolutional window over the characters in the token.
Using convolution to generate token embeddings
A combination of character embeddings and a convolutional kernel can be used to generate token embeddings. The method will be as follows:
Pad each token (e.g., word) to a predefined length.
Look up the character embeddings for the characters in the token from an embedding layer.
Shift a convolutional kernel over the sequence of character embeddings to generate a token embedding.
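To make these steps concrete, here’s a minimal sketch of a character-to-token embedding branch in Keras. The layer choices, the kernel size of 3, the max-pooling step, and names such as char_vocab_size and max_token_length are assumptions made for this illustration, not the exact configuration used later in the lesson:

import tensorflow as tf
from tensorflow.keras import layers

# Assumed hyperparameters for the sketch (not from the lesson).
char_vocab_size = 100    # number of distinct characters (plus padding)
max_token_length = 12    # characters per token; we'll pick this from corpus statistics below
char_embedding_dim = 16
token_embedding_dim = 32

# Input: one token represented as a sequence of character IDs (already padded; step 1).
char_ids = layers.Input(shape=(max_token_length,), dtype="int32")

# Step 2: look up a character embedding for each character in the token.
char_embeddings = layers.Embedding(
    input_dim=char_vocab_size, output_dim=char_embedding_dim
)(char_ids)  # -> (batch, max_token_length, char_embedding_dim)

# Step 3: shift a convolutional kernel over the character embeddings.
conv_out = layers.Conv1D(
    filters=token_embedding_dim, kernel_size=3, padding="same", activation="relu"
)(char_embeddings)  # -> (batch, max_token_length, token_embedding_dim)

# Pool over the character dimension to collapse the per-character outputs
# into a single embedding vector per token.
token_embedding = layers.GlobalMaxPooling1D()(conv_out)  # -> (batch, token_embedding_dim)

char_to_token_model = tf.keras.Model(inputs=char_ids, outputs=token_embedding)
char_to_token_model.summary()

The Conv1D kernel sliding over the character embeddings is the "convolutional window" described above; pooling over the character dimension turns the per-character outputs into one vector per token.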
The very first thing we need to do is analyze the statistics of token lengths (in characters) in our corpus. As we did previously, we can do this with pandas:
vocab_ser = pd.Series(pd.Series(train_sentences).str.split().explode().unique())
vocab_ser.str.len().describe(percentiles=[0.05, 0.95])
In computing vocab_ser, the first part (i.e., pd.Series(train_sentences).str.split()) results in a pandas Series object whose elements are lists of tokens (each token in the sentence is an item of that list). Next, explode() converts the Series of token lists into a Series of tokens by turning each token into a separate item in the Series. Finally, we take only the unique tokens in that Series. We end up with a pandas Series object where each item is a unique token.
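To see what each step produces, here’s a tiny illustration on a made-up two-sentence corpus (the sentences are assumptions for this example, not the lesson’s dataset):

import pandas as pd

# Toy corpus, made up for illustration.
train_sentences = ["the cat sat", "the dog barked"]

tokenized = pd.Series(train_sentences).str.split()
# 0       [the, cat, sat]
# 1    [the, dog, barked]

exploded = tokenized.explode()
# 0       the
# 0       cat
# 0       sat
# 1       the
# 1       dog
# 1    barked

vocab_ser = pd.Series(exploded.unique())
# 0       the
# 1       cat
# 2       sat
# 3       dog
# 4    barked

print(vocab_ser.str.len())
# 0    3
# 1    3
# 2    3
# 3    3
# 4    6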
We’ll now use the str.len() function to get the length of each token (i.e., the number of characters) and look at the 95th percentile of those lengths. We’ll get the following:
count    23623.000000
mean         6.832705
std          2.749288
min          1.000000
5%           3.000000
50%          7.000000
95%         12.000000
max         61.000000
dtype: float64
We can see that around 95% of our words have 12 or fewer characters.
Next, we’ll write a function that pads shorter sentences (and truncates longer ones) to a fixed length:
def prepare_corpus_for_char_embeddings(tokenized_sentences, max_seq_length):
    """ Pads each sequence to a maximum length """
    proc_sentences = []
    for tokens in tokenized_sentences:
        if len(tokens) >= max_seq_length:
            # Truncate sentences that are too long
            proc_sentences.append([[t] for t in tokens[:max_seq_length]])
        else:
            # Pad shorter sentences with empty strings
            proc_sentences.append([[t] for t in tokens + ['']*(max_seq_length - len(tokens))])
    return proc_sentences
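For instance, on a small made-up tokenized corpus with a maximum sequence length of 5, the function pads the shorter sentence with empty strings and truncates the longer one; this is a usage sketch, not the lesson’s data:

# Toy tokenized corpus, assumed for this example.
tokenized_sentences = [
    ["EU", "rejects", "German", "call"],
    ["Peter", "Blackburn", "visited", "Brussels", "last", "Friday", "evening"],
]

padded = prepare_corpus_for_char_embeddings(tokenized_sentences, max_seq_length=5)
print(padded[0])
# [['EU'], ['rejects'], ['German'], ['call'], ['']]              <- padded to length 5
print(padded[1])
# [['Peter'], ['Blackburn'], ['visited'], ['Brussels'], ['last']] <- truncated to length 5

Note that each token is wrapped in its own single-element list in the output.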
The function takes a set of tokenized sentences (i.e., each sentence as a list of tokens, not a string) and a maximum sequence length. Note that this is the maximum sequence length we used previously, not the new token length we discussed. ...