Nowadays, recurrent models used to solve the NER task are considerably more sophisticated than a single embedding layer followed by an RNN. They use more advanced recurrent models such as long short-term memory (LSTM) networks and gated recurrent units (GRUs). We’ll set aside the discussion of these advanced models. Here, we’ll focus on a technique that provides the model with embeddings at multiple scales, enabling it to understand language better: instead of relying only on token embeddings, the model also uses character embeddings. A token embedding is then generated from the character embeddings by shifting a convolutional window over the characters in the token.

Using convolution to generate token embeddings

A combination of character embeddings and a convolutional kernel can be used to generate token embeddings. The method is as follows (a code sketch follows the list):

  1. Pad each token (e.g., word) to a predefined length.

  2. Look up the character embeddings for the characters in the token from an embedding layer.

  3. Shift a convolutional kernel over the sequence of character embeddings to generate a token embedding.
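The sketch below puts these three steps together as a small Keras model. It assumes TensorFlow/Keras is available; the layer sizes, the character vocabulary size, and the final max-pooling step (one common way to collapse the per-position convolution outputs into a single vector) are illustrative choices rather than the lesson’s exact configuration.

```python
import tensorflow as tf

# Hypothetical sizes for illustration only
MAX_TOKEN_LEN = 10     # step 1: every token is padded/truncated to this many characters
CHAR_VOCAB_SIZE = 30   # e.g., lowercase letters plus padding/unknown IDs
CHAR_EMB_DIM = 16      # size of each character embedding
TOKEN_EMB_DIM = 32     # size of the resulting token embedding

# Input: a batch of tokens, each given as MAX_TOKEN_LEN character IDs
char_ids = tf.keras.layers.Input(shape=(MAX_TOKEN_LEN,), dtype="int32")

# Step 2: look up a character embedding for every character in the token
# -> shape (batch, MAX_TOKEN_LEN, CHAR_EMB_DIM)
char_emb = tf.keras.layers.Embedding(CHAR_VOCAB_SIZE, CHAR_EMB_DIM)(char_ids)

# Step 3: shift a convolutional kernel over the character embeddings ...
conv = tf.keras.layers.Conv1D(
    filters=TOKEN_EMB_DIM, kernel_size=3, padding="same", activation="relu"
)(char_emb)

# ... and pool over the character positions to get one vector per token
# -> shape (batch, TOKEN_EMB_DIM)
token_emb = tf.keras.layers.GlobalMaxPooling1D()(conv)

model = tf.keras.Model(inputs=char_ids, outputs=token_emb)

# Example: embed two already padded tokens given as character IDs
dummy = tf.constant([[3, 1, 20, 0, 0, 0, 0, 0, 0, 0],
                     [8, 9, 0, 0, 0, 0, 0, 0, 0, 0]])
print(model(dummy).shape)  # (2, 32): one token embedding per input token
```

These token embeddings can then be concatenated with ordinary word embeddings before being fed to the recurrent layer, giving the model both word-level and character-level views of each token.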
