Convolutional neural networks (CNNs) are incredibly powerful models that have made immense waves in the realm of machine learning by taking on complex tasks such as image and video recognition and analysis.
Inspired by the organization of neurons in the animal visual cortex, CNNs learn a hierarchy of features by sliding convolutional filters over their input.
Figures 1 and 2 illustrate a CNN's pipeline, from input to output. While Figure 1 details the half responsible for feature extraction, Figure 2 details the classification half of the CNN, for an application that is fed images and then determines the presence of koala bears in them.
Now that we're well-versed with CNNs, let's briefly look at the domain we wish to apply them to.
Natural language processing (NLP), at its absolute minimum, is what happens when a computer is tasked with understanding words and sentences in a natural language. This is referred to as natural language understanding (NLU). Its counterpart, natural language generation (NLG), is the task of producing words and sentences in a natural language.
Combining NLU and NLG, NLP has a host of applications. Some of these include the following:
Note: For more information on NLP, refer to this Answer.
The CNN illustrated in Figures 1 and 2 for our koala detection application works incredibly well, but currently, inputs are provided in the form of images.
How do we go from CNNs that work for images to CNNs that work for NLP?
For starters, the inputs to the CNN for NLP will be in the form of sentences, not images. However, as Figure 1 shows, the input type does not affect the illustrated methodology, as long as we can convert the input into an array of numbers for feature extraction and classification.
Fortunately, sentences can be represented as arrays of vectors: each word of the sentence—known as a token—is mapped to a vector in a vector space learned over the entire training vocabulary. These vectors are known as word embeddings.
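As a minimal sketch of this idea—using a made-up four-word vocabulary and randomly initialized vectors in place of learned embeddings such as word2vec or GloVe—a sentence can be turned into a 2-D array by looking up each token's vector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training vocabulary; real embeddings would be learned,
# not drawn at random.
vocab = ["the", "koala", "eats", "leaves"]
embedding_dim = 5
embeddings = {word: rng.standard_normal(embedding_dim) for word in vocab}

def embed(sentence):
    """Map a whitespace-tokenized sentence to a (num_tokens, embedding_dim) array."""
    return np.stack([embeddings[token] for token in sentence.split()])

matrix = embed("the koala eats leaves")
print(matrix.shape)  # (4, 5): one row per token
```

The resulting matrix plays the same role for the CNN that a grid of pixel values plays for an image.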
Given this array of vectors, we can duplicate the pipeline in Figures 1 and 2, while making changes to the filters and pooling operations appropriate to the NLP application we'd like to perform.
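One common adaptation—sketched below with numpy and random stand-in values, where the filter width, embedding size, and max-over-time pooling are illustrative choices rather than anything prescribed by the figures—is to slide a filter over windows of consecutive tokens, spanning the full embedding width, and then pool each feature map down to a single value:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in input: a 7-token sentence with 5-dimensional embeddings.
sentence = rng.standard_normal((7, 5))
filter_width = 3  # the filter covers 3 consecutive tokens at a time
conv_filter = rng.standard_normal((filter_width, 5))

# Convolve: one activation per window of 3 adjacent tokens.
feature_map = np.array([
    np.sum(sentence[i:i + filter_width] * conv_filter)
    for i in range(sentence.shape[0] - filter_width + 1)
])
print(feature_map.shape)  # (5,): 7 - 3 + 1 windows

# Max-over-time pooling: keep only the strongest response, yielding one
# fixed-size feature per filter regardless of sentence length.
pooled = feature_map.max()
```

Because pooling collapses each feature map to a single number, sentences of different lengths all produce a fixed-size feature vector for the classification half of the pipeline.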
A diagram illustrating the pipeline for a CNN applied to an NLP task