Dataset in Document Classification with ELMo

Learn to classify documents with ELMo, where ELMo generates document embeddings used as inputs to a classification model.

We'll cover the following...

Document classification
Dataset preprocessing
- Train/test split
Try it yourself

Although Word2vec offers an elegant way to learn numerical representations of words, learning those representations alone does not show what word vectors can do in real-world applications.

Word embeddings serve as the feature representation of words in many tasks, such as image caption generation and machine translation. However, these tasks combine several learning models, such as a CNN with an LSTM, or two LSTMs. To see how word embeddings are used in practice, let's stick to a simpler task: document classification.

Document classification

Document classification is one of the most popular tasks in NLP. It is extremely useful for anyone handling massive collections of data, such as news websites, publishers, and universities. It's therefore interesting to see how word vectors can be adapted to a real-world task like this by embedding entire documents instead of individual words.
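
To make the idea of embedding a whole document concrete, here is a minimal sketch. It is not the ELMo pipeline this lesson builds. The word vectors are made-up 3-dimensional values, and the names `word_vectors`, `embed_document`, `fit_centroids`, and `predict` are hypothetical, chosen only for illustration. The sketch averages the word vectors of a document into a single vector and classifies it with a simple nearest-centroid rule, which stands in for the classification model.

```python
import numpy as np

# Hypothetical toy vocabulary with made-up 3-dimensional word vectors.
# In practice these would come from a pretrained model such as ELMo.
word_vectors = {
    "match": np.array([0.9, 0.1, 0.0]),
    "goal": np.array([0.8, 0.2, 0.1]),
    "team": np.array([0.7, 0.1, 0.2]),
    "market": np.array([0.1, 0.9, 0.1]),
    "stocks": np.array([0.0, 0.8, 0.2]),
    "bank": np.array([0.2, 0.7, 0.1]),
}


def embed_document(text, vectors, dim=3):
    """Mean-pool the vectors of all known words into one document vector."""
    tokens = [t for t in text.lower().split() if t in vectors]
    if not tokens:
        # No known words: fall back to a zero vector.
        return np.zeros(dim)
    return np.mean([vectors[t] for t in tokens], axis=0)


def fit_centroids(docs, labels, vectors):
    """Average the document embeddings of each class into a class centroid."""
    centroids = {}
    for label in set(labels):
        embs = [embed_document(d, vectors)
                for d, l in zip(docs, labels) if l == label]
        centroids[label] = np.mean(embs, axis=0)
    return centroids


def predict(text, centroids, vectors):
    """Return the label whose centroid is closest to the document embedding."""
    emb = embed_document(text, vectors)
    return min(centroids, key=lambda label: np.linalg.norm(emb - centroids[label]))


train_docs = ["team goal match", "market stocks bank"]
train_labels = ["sport", "business"]
centroids = fit_centroids(train_docs, train_labels, word_vectors)
prediction = predict("the team scored a goal", centroids, word_vectors)
print(prediction)
```

Mean pooling is only one simple way to turn word vectors into a document vector. It ignores word order, which is one reason contextual models such as ELMo are attractive for this task.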
