...


Understanding Neural Machine Translation


Learn the workings of neural machine translation.

Now that we have an appreciation for how MT has evolved over time, let’s try to understand how state-of-the-art NMT works. First, we’ll take a look at the model architecture used by neural machine translators and then move on to understanding the actual training algorithm.

Intuition behind NMT systems

First, let’s understand the intuition underlying an NMT system’s design. Say we’re fluent in both English and German and are asked to translate the following sentence into German:

I went home.

This sentence translates to the following:

Ich ging nach Hause.

Although a fluent speaker might need only a few seconds to translate this, a certain process produces the translation. First, we read the English sentence; then, we form a thought or concept in our mind about what the sentence represents or implies; and finally, we translate the sentence into German. The same idea is used for building NMT systems (see the figure below). The encoder reads the source sentence (similar to reading the English sentence). Then, the encoder outputs a context vector (which corresponds to the thought or concept we formed after reading the sentence). Finally, the decoder takes in the context vector and outputs the translation in German:

Conceptual architecture of an NMT system
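This read–summarize–generate flow can be sketched as two functions, where `encode` and `decode` stand in for the encoder and decoder networks described above. The names and the toy "context vector" below are purely illustrative, a minimal sketch of the pipeline rather than a real model:

```python
def encode(source_sentence):
    """Stand-in for the encoder: reads the source sentence and
    summarizes it into a single context ("thought") vector."""
    # A real encoder would be a neural network; here we return a
    # placeholder summary so the pipeline runs end to end.
    return {"summary_of": source_sentence}

def decode(context_vector):
    """Stand-in for the decoder: generates the target-language
    sentence word by word from the context vector."""
    return f"<German translation of: {context_vector['summary_of']}>"

def translate(source_sentence):
    # The three conceptual steps: read -> form a thought -> generate.
    return decode(encode(source_sentence))

print(translate("I went home."))
```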

NMT architecture

Now, we’ll look at the architecture in more detail. The sequence-to-sequence approach was originally proposed by Sutskever, Vinyals, and Le in their paper “Sequence to Sequence Learning with Neural Networks,” Proceedings of the 27th International Conference on Neural Information Processing Systems, Volume 2, pp. 3104–3112.

From the diagram in the figure above, we can see that there are two major components in the NMT architecture. These are called the encoder and decoder. In other words, NMT can be seen as an encoder-decoder architecture. The encoder converts a sentence from a given source language into a thought vector (i.e., a contextualized representation), and the decoder decodes or translates the thought into a target language.

As we can see, this shares some features with the interlingual machine translation method we briefly talked about. This is illustrated in the figure below. The part to the left of the context vector denotes the encoder, which consumes the source sentence word by word as a time-series model. The part to the right denotes the decoder, which outputs the corresponding translation of the source sentence word by word, using the previously generated word as the current input. We’ll also use embedding layers (for both the source and target languages) where the semantics of the individual tokens will be learned and fed as inputs to the models:

Unrolling the source and target sentences over time
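As a rough illustration of this architecture, the snippet below wires up an encoder and a decoder with embedding layers in TensorFlow/Keras. The vocabulary sizes, layer dimensions, and the choice of LSTM cells are assumptions made for the sketch, not details specified in this lesson:

```python
import tensorflow as tf

# Illustrative sizes (assumed, not from the lesson).
src_vocab, tgt_vocab = 8000, 8000
embed_dim, hidden_dim = 128, 256

# Encoder: embeds the source tokens and summarizes them into a context (thought) vector.
encoder_tokens = tf.keras.Input(shape=(None,), dtype="int32", name="source_ids")
enc_embedded = tf.keras.layers.Embedding(src_vocab, embed_dim)(encoder_tokens)
_, state_h, state_c = tf.keras.layers.LSTM(hidden_dim, return_state=True)(enc_embedded)
context_vector = [state_h, state_c]  # the encoder's summary of the source sentence

# Decoder: starts from the context vector and predicts the translation word by word,
# receiving the previous target word as its current input during training.
decoder_tokens = tf.keras.Input(shape=(None,), dtype="int32", name="target_ids")
dec_embedded = tf.keras.layers.Embedding(tgt_vocab, embed_dim)(decoder_tokens)
dec_outputs = tf.keras.layers.LSTM(hidden_dim, return_sequences=True)(
    dec_embedded, initial_state=context_vector)
logits = tf.keras.layers.Dense(tgt_vocab)(dec_outputs)  # scores over the target vocabulary

model = tf.keras.Model([encoder_tokens, decoder_tokens], logits)
model.summary()
```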

With a basic understanding of what NMT looks like, let’s formally define the objective of NMT. The ultimate objective of an NMT system is to maximize the log likelihood of a target sentence $y_t$ given its corresponding source sentence $x_s$. That is, to maximize the following:

$$\frac{1}{N}\sum_{i=1}^{N} \log P\left(y_t^{(i)} \mid x_s^{(i)}\right)$$

Here, $N$ refers to the number of source and target sentence pairs we have as training data.
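To make the objective concrete, here is one way the training loss could be computed in TensorFlow: the cross-entropy of the correct target token at each position is exactly $-\log P(y_t \mid x_s)$, so minimizing its mean maximizes the average log likelihood above. The function and tensor names are illustrative assumptions, not part of the lesson:

```python
import tensorflow as tf

def nmt_loss(target_ids, decoder_logits):
    """target_ids: [batch, tgt_len] integer IDs of the correct target tokens.
    decoder_logits: [batch, tgt_len, tgt_vocab] unnormalized scores from the decoder."""
    # Per-position negative log likelihood of the correct token: -log P(y_t | x_s).
    per_token_nll = tf.keras.losses.sparse_categorical_crossentropy(
        target_ids, decoder_logits, from_logits=True)
    # Minimizing the mean NLL is equivalent to maximizing the average
    # log likelihood over the N training pairs.
    return tf.reduce_mean(per_token_nll)
```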

Then, during inference, for a given source sentence $x_S^{infer}$ ...