Embedding Layer: Word Representation

In this lesson, we are going to learn about embedding layers.

Problem with previous word representations

We already saw that one simple way to create a numeric representation of the text is to create a vocabulary of our dataset and assign a number to each word. However, this has one major problem: there will be too many sparse vectors. This means that, after padding, we could have many vectors that will contain almost more than half of the values as 0s.

Therefore, we will need a way to represent the vectors in a compact form; this representation is called Embedding Distributed Representation.

  • This representation is very powerful (if you have a large dataset).
  • Using the vocabulary approach is not a good method to use for deep learning based models.

In the next lesson, we are going to start our project. There we will be using embedding layers and will see how we can convert the vocabulary based vectors to compact vectors.

What is an embedding layer?

It is basically a big matrix of size (vocabulary, K) where K represents the length of the vector. For example, we discussed GloVe in previous chapters; here the value of K would be 300.

How does an embedding layer work?

An embedding layer is just a matrix as we discussed above. Initially, we passed the input directly to the RNN cell, but now we will pass our input to the embedding layer, and the output from this layer will be fed into the RNN cell. Look at the simple architecture below to get a better understanding.

Get hands-on with 1300+ tech skills courses.