
Text Classification Using PyTorch

PROJECT



In this project, we will build a deep learning text classifier using PyTorch. Along the way, we will cover text preprocessing, feature extraction, model selection, training, and evaluation. We will use classical Python NLP libraries such as NLTK and, in addition to neural networks, explore traditional machine learning algorithms such as XGBoost.


You will learn to:

Clean and extract features from text.

Build and train machine learning and deep learning models.

Use contextualized embeddings and pretrained language models.

Handle imbalanced data effectively.

Skills

Natural Language Processing

Neural Networks

Machine Learning Fundamentals

Deep Learning

Transformer Models

Prerequisites

Intermediate knowledge of the Python programming language

Basic knowledge of the pandas library

Basic knowledge of machine learning paradigms and techniques

Basic knowledge of the PyTorch framework

Technologies

NLTK

Pandas


XGBoost

PyTorch

Scikit-learn

Project Description

Text classification is a fundamental task in natural language processing (NLP) that aims to automatically assign text documents to predefined classes or categories. It has numerous real-world applications, such as sentiment analysis, spam detection, topic classification, customer feedback analysis, and, more recently, detecting whether a text was generated by an AI model.

In this project, we’ll practice preprocessing text data, extracting meaningful features, and training machine learning models to perform classification. Specifically, we’ll build a question classifier. The project emphasizes neural networks, including pretrained language models, while also introducing traditional machine learning techniques. We’ll use popular Python NLP libraries and frameworks such as NLTK, scikit-learn, and PyTorch.
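As a rough illustration of the traditional part of this workflow, the sketch below chains TF-IDF feature extraction with a linear model in scikit-learn. The toy questions and the PERSON/LOCATION labels are invented for this example and are not the project's dataset; the project's own preprocessing and model choices may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data invented for illustration (assumption: not the project's dataset).
questions = [
    "Who wrote Hamlet?",
    "Who discovered penicillin?",
    "Who painted the Mona Lisa?",
    "Where is the Eiffel Tower?",
    "Where is Mount Everest located?",
    "Where was pizza invented?",
]
labels = ["PERSON", "PERSON", "PERSON", "LOCATION", "LOCATION", "LOCATION"]

# TF-IDF turns each question into a sparse weighted bag-of-words vector;
# logistic regression then learns one weight per vocabulary term.
classifier = make_pipeline(
    TfidfVectorizer(lowercase=True),
    LogisticRegression(max_iter=1000),
)
classifier.fit(questions, labels)

# Classify two unseen questions.
predictions = classifier.predict(["Who discovered gravity?", "Where is the Nile?"])
print(list(predictions))
```

Wrapping the vectorizer and the model in one pipeline means the vocabulary is learned only from the data passed to `fit`, which helps avoid leaking information from a held-out test split.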

Project Tasks

1

Introduction

Task 0: Get Started

Task 1: Import Libraries and Explore Datasets

2

Data Preparation and Basic Feature Engineering

Task 2: Preprocess Text

Task 3: Split the Data

Task 4: Extract Features (BoW)

Task 5: Extract Features (TF-IDF)

3

Linear and Tree Models

Task 6: Train a Linear Model

Task 7: Tune Hyperparameters

Task 8: Train an Ensemble Model

Task 9: Evaluate the Model

4

Neural Networks

Task 10: Define a Neural Network

Task 11: Create Datasets and DataLoaders

Task 12: Set Up Training

Task 13: Train and Evaluate the Neural Network

Task 14: Get Word Embeddings

Task 15: Set Up Training

Task 16: Train and Evaluate the Neural Network

Task 17: Get Embeddings from Pretrained Language Models

Task 18: Set Up Training

Task 19: Train and Evaluate the Neural Network

5

Data Imbalance

Task 20: Handle Imbalanced Data

Task 21: Train and Evaluate the Neural Network

Task 22: Save a Neural Network
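The last three tasks can be sketched roughly as follows. This is a minimal example under stated assumptions: it uses inverse-frequency class weights in the loss, which is one common way to handle imbalance and not necessarily the technique used in the project. The label counts, layer sizes, and random features are all hypothetical.

```python
import io

import torch
from torch import nn

# Hypothetical imbalanced labels: class 0 is nine times more frequent than class 1.
labels = torch.tensor([0] * 90 + [1] * 10)
num_classes = 2

# Inverse-frequency weights: n_samples / (n_classes * class_count).
counts = torch.bincount(labels, minlength=num_classes).float()
class_weights = labels.numel() / (num_classes * counts)

# A tiny feed-forward classifier over 16-dimensional features (sizes are arbitrary).
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, num_classes))

# Errors on the rare class now contribute more to the loss.
loss_fn = nn.CrossEntropyLoss(weight=class_weights)
features = torch.randn(labels.numel(), 16)  # random stand-in for real features
loss = loss_fn(model(features), labels)

# Save only the learned parameters. An in-memory buffer keeps the sketch
# self-contained; a file path such as "model.pt" works the same way.
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)

# Restore the parameters into a freshly built model with the same architecture.
buffer.seek(0)
restored = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, num_classes))
restored.load_state_dict(torch.load(buffer))
```

Saving the `state_dict` rather than the whole model object keeps the checkpoint independent of how the class is defined in code, at the cost of having to rebuild the architecture before loading.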

