CNNs for Sentence Classification: Downloading and Preparing Data

Explore the process of downloading and preparing textual data for sentence classification with convolutional neural networks. Understand how to extract categories from raw data, convert labels to numerical IDs, shuffle datasets, and create validation splits to facilitate effective model training and evaluation.

Implementation: Downloading and preparing data

First, we’ll download the data from the web. The download functions are provided in the notebook at the end of this lesson. They simply download two files, one containing the training data and one containing the testing data, and the paths to these files are stored in train_filename and test_filename.
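To make the download step concrete, here is a minimal sketch of what such a helper might look like. It is not the notebook's actual function: the name maybe_download, the dest_dir argument, and the URLs in the usage comment are all hypothetical placeholders, and only train_filename and test_filename come from the lesson.

```python
import os
import urllib.request


def maybe_download(url, dest_dir):
    """Download `url` into `dest_dir` unless the file is already there.

    Hypothetical helper for illustration; the lesson's notebook ships its own.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Use the last path component of the URL as the local file name.
    filename = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)
    return filename


# Usage (placeholder URLs, substitute the real ones from the notebook):
# train_filename = maybe_download("https://example.com/train.label", "data")
# test_filename = maybe_download("https://example.com/test.label", "data")
```

Skipping the download when the file already exists keeps repeated notebook runs from fetching the same data again.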

If we open these files, we’ll see that they contain a collection of lines of text. Each line has the format:

<Category>: <sub-category> <question>
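As a rough illustration of this format, a single line could be split into its three parts as sketched below. The function name parse_line and the sample line are made up for this example (they are not taken from the dataset or the notebook), and the sketch tolerates an optional space after the colon.

```python
def parse_line(line):
    """Split one '<Category>: <sub-category> <question>' line into its parts."""
    # Everything before the first colon is the category.
    category, rest = line.strip().split(":", 1)
    # The first whitespace-separated token after the colon is the
    # sub-category; whatever remains is the question text.
    sub_category, question = rest.strip().split(None, 1)
    return category.strip(), sub_category, question


# Invented sample line, for illustration only.
example = "DESC: def What is a convolution ?"
print(parse_line(example))
# → ('DESC', 'def', 'What is a convolution ?')
```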

There are two pieces of meta-information for each question: a category and a subcategory. A category is a macro-level classification, whereas a subcategory is a finer-grained identification of the type of question. There are six categories available: DESC (description-related), ENTY ...