PROJECT
Text Classification Using PyTorch
In this project, we will learn how to build a deep-learning-based text classifier using PyTorch. We will cover text preprocessing, feature extraction, model selection, training, and evaluation. We will use classical Python NLP libraries such as NLTK and explore traditional machine learning algorithms such as XGBoost in addition to neural networks.
You will learn to:
Clean and extract features from text.
Build and train machine learning and deep learning models.
Use contextualized embeddings and pretrained language models.
Handle imbalanced data effectively.
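One common way to handle imbalanced data is to weight each class inversely to its frequency, so that errors on rare classes cost more during training. The sketch below is only an illustration under stated assumptions: the label names and counts are invented, the helper `inverse_frequency_weights` is hypothetical, and the project itself may use a different technique (for example, resampling).

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Hypothetical helper for illustration; rare classes get larger weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Invented toy labels: one frequent class, one rare class.
labels = ["location"] * 6 + ["person"] * 3 + ["number"] * 1
weights = inverse_frequency_weights(labels)

for label in sorted(weights):
    print(label, round(weights[label], 3))
```

In PyTorch, weights like these could be converted to a tensor (ordered by class index) and passed through the `weight` argument of `torch.nn.CrossEntropyLoss`.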
Skills
Natural Language Processing
Neural Networks
Machine Learning Fundamentals
Deep Learning
Transformer Models
Prerequisites
Intermediate knowledge of the Python programming language
Basic knowledge of the pandas library
Basic knowledge of machine learning paradigms and techniques
Basic knowledge of the PyTorch framework
Technologies
NLTK
pandas
XGBoost
PyTorch
Scikit-learn
Project Description
Text classification is a core task in natural language processing (NLP) that involves automatically assigning predefined categories to text documents. It powers applications like sentiment analysis, spam detection, topic classification, and AI-generated content detection.
In this project, you’ll learn how to preprocess text data, extract features, and build multiple machine learning models to classify questions into categories. You’ll start with traditional approaches like Logistic Regression and XGBoost, then move to deep learning models that use TF-IDF features, learned embeddings, pretrained GloVe vectors, and transformer-based architectures like MobileBERT.
By the end, you’ll compare all models side by side to evaluate their accuracy and understand the strengths of each approach, from classical ML to modern neural networks.
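As a rough idea of what the traditional baseline might look like, the sketch below chains TF-IDF feature extraction with Logistic Regression using scikit-learn. The questions, the category names, and the hyperparameters are invented for illustration and are not the project's dataset or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy data: a handful of questions with made-up categories.
questions = [
    "Who wrote Hamlet?",
    "Who discovered penicillin?",
    "Who was the first person on the moon?",
    "Where is the Eiffel Tower?",
    "Where is Mount Everest located?",
    "Where is the Sahara desert?",
]
labels = ["person", "person", "person", "location", "location", "location"]

# TF-IDF turns each question into a sparse weighted term vector;
# Logistic Regression then learns a linear decision boundary over it.
model = make_pipeline(
    TfidfVectorizer(lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(questions, labels)

new_questions = ["Who painted the Mona Lisa?", "Where is Lake Victoria?"]
predictions = model.predict(new_questions)
print(list(zip(new_questions, predictions)))
```

Wrapping both steps in one pipeline keeps the vectorizer fitted only on the training text, which helps avoid leaking vocabulary statistics from held-out data when the same object is later evaluated on a test split.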
Project Tasks
1. Introduction
Task 0: Get Started
Task 1: Import Libraries and Explore Datasets
2. Data Preparation and Basic Feature Engineering
Task 2: Preprocess Text
Task 3: Split the Data
Task 4: Extract Features (BoW)
Task 5: Extract Features (TF-IDF)
3. Linear and Tree Models
Task 6: Train a Linear Model
Task 7: Tune Hyperparameters
Task 8: Train an Ensemble Model
Task 9: Evaluate the Model
4. Neural Networks
Task 10: Define a Neural Network
Task 11: Create Datasets and DataLoaders
Task 12: Set Up Training
Task 13: Train and Evaluate the Neural Network
Task 14: Get Word Embeddings
Task 15: Set Up Training
Task 16: Train and Evaluate the Neural Network
Task 17: Get Embeddings from Pretrained Language Models
Task 18: Set Up Training
Task 19: Train and Evaluate the Neural Network
5. Data Imbalance
Task 20: Handle Imbalanced Data
Task 21: Train and Evaluate the Neural Network
Task 22: Compare Model Performance
Task 23: Save the Neural Network
Congratulations!
Atabek BEKENOV
Senior Software Engineer
Pradip Pariyar
Senior Software Engineer
Renzo Scriber
Senior Software Engineer
Vasiliki Nikolaidi
Senior Software Engineer
Juan Carlos Valerio Arrieta
Senior Software Engineer
Relevant Courses
Use the following content to review prerequisites or explore specific concepts in detail.