Dataset in Document Classification with ELMo

Learn to classify documents with ELMo, where ELMo generates document embeddings used as inputs to a classification model.

Although Word2vec gives a very elegant way of learning numerical representations of words, learning word representations alone is not convincing enough to realize the power of word vectors in real-world applications.

Word embeddings are used as the feature representation of words for many tasks, such as image caption generation and machine translation. However, these tasks involve combining different learning models, such as CNNs and LSTM models or two LSTM models. To understand the real-world usage of word embeddings, let’s stick to a simpler task—document classification.

Document classification

Document classification is one of the most popular tasks in NLP. Document classification is extremely useful for anyone who is handling massive collections of data, such as those for news websites, publishers, and universities. Therefore, it’s interesting to see how learning word vectors can be adapted to a real-world task such as document classification by means of embedding entire documents instead of words.

Get hands-on with 1200+ tech skills courses.