...


Generating Embeddings with ELMo

Learn to generate embeddings with ELMo and other embedding techniques.

Once the input is prepared, generating embeddings is straightforward. First, we'll transform the inputs into the format expected by the ELMo layer. Here, we use a few example titles from the BBC dataset:

# Titles of 001.txt - 005.txt in bbc/business
elmo_inputs = format_text_for_elmo([
    "Ad sales boost Time Warner profit",
    "Dollar gains on Greenspan speech",
    "Yukos unit buyer faces loan claim",
    "High fuel prices hit BA's profits",
    "Pernod takeover talk lifts Domecq"
])
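
If the helper from the earlier setup isn't at hand, here is a minimal sketch of what format_text_for_elmo could look like. It assumes the ELMo module's tokens signature, which takes a dict of padded token strings plus the true sequence lengths; apart from the function name used above, the details (whitespace tokenization, empty-string padding) are assumptions:

import tensorflow as tf

def format_text_for_elmo(texts, pad_token=""):
    # Whitespace-tokenize each title and record its true length
    tokenized = [t.split() for t in texts]
    lengths = [len(tokens) for tokens in tokenized]
    max_len = max(lengths)
    # Pad every sequence to the batch maximum with an empty-string token
    padded = [
        tokens + [pad_token] * (max_len - len(tokens)) for tokens in tokenized
    ]
    return {
        "tokens": tf.constant(padded),
        "sequence_len": tf.constant(lengths, dtype=tf.int32),
    }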

Next, pass elmo_inputs to elmo_layer and retrieve the result:

# Get the result from ELMo
elmo_result = elmo_layer(elmo_inputs)
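
Here, elmo_layer refers to the ELMo module loaded earlier. In case that setup isn't available, a minimal sketch of how such a layer could be created, assuming the TF-Hub ELMo v3 module and the hub.KerasLayer API (the URL and keyword arguments are assumptions, not taken from this section):

import tensorflow_hub as hub

# Hypothetical setup: load ELMo with the "tokens" signature so it accepts
# pre-tokenized input, and return all named outputs as a dict rather than
# only the "default" sentence embedding
elmo_layer = hub.KerasLayer(
    "https://tfhub.dev/google/elmo/3",
    signature="tokens",
    signature_outputs_as_dict=True,
)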

Let’s now print the results and their shapes with the following line:

# Print the result
for k, v in elmo_result.items():
    print("Tensor under key={} is a {} shaped Tensor".format(k, v.shape))

This will print out:

Tensor under key=sequence_len is a (5,) shaped Tensor
Tensor under key=elmo is a (5, 6, 1024) shaped Tensor
Tensor under key=default is a (5, 1024) shaped Tensor
Tensor under key=lstm_outputs1 is a (5, 6, 1024) shaped Tensor
Tensor under key=lstm_outputs2 is a (5, 6, 1024) shaped Tensor
Tensor under key=word_emb is a (5, 6, 512) shaped Tensor

As we can see, the model returns six different outputs. Let’s go through them one by one:

  • sequence_len: The lengths of the individual sequences, echoed back exactly as we provided them in the input.

  • word_emb: The token embeddings obtained from the character-level CNN layer in the ELMo model. We get a vector of size 512 for each of the 6 sequence positions in each of the 5 rows of the batch.

  • lstm_outputs1 ...