You will learn to:
Load the dataset from the Hugging Face Hub.
Load the Whisper model and its preprocessor from the Hugging Face Hub.
Fine-tune the Whisper model.
Compute the word error rate (WER) metric to evaluate the model.
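To make the last item concrete, here is a minimal sketch of computing WER with the Hugging Face evaluate library. The toy predictions and references are made up purely for illustration:

import evaluate

wer_metric = evaluate.load("wer")

predictions = ["the cat sat on the mat"]
references = ["the cat sat on a mat"]

# WER = (substitutions + insertions + deletions) / number of reference words
wer = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {wer:.2%}")  # one substitution over six reference words, about 16.67%

Lower is better: a WER of 0 means a perfect transcription, and values above 1 are possible when the hypothesis contains many insertions.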
Skills
Deep Learning
Artificial Intelligence
Automatic Speech Recognition
Prerequisites
Basic knowledge of deep learning
Basic knowledge of Hugging Face
Basic theory of speech recognition
Technologies
Python
OpenAI
PyTorch
Hugging Face
Project Description
Automatic speech recognition (ASR) is the task of transcribing speech from audio. Most current speech recognition models perform well only on in-distribution data (data similar to that on which the model was trained). OpenAI introduced the Whisper model in September 2022. Trained on 680,000 hours of speech, it is highly robust to out-of-distribution data. The training data comprises transcribed speech in English plus 96 other languages. The model was also trained on speech translation, in which speech from these 96 other languages is transcribed directly into English text.
In this project, we will start by loading the Whisper model from the Hugging Face Hub. We will compute the word error rate (WER) of the default checkpoint, fine-tune it on a small subset of the LibriSpeech dataset, and then compute its WER again on the test set.
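Before any fine-tuning, the default checkpoint can already transcribe audio out of the box. A minimal sketch, assuming the openai/whisper-small checkpoint and a local sample.wav file (both placeholders; the project may use a different checkpoint and audio source):

from transformers import pipeline

# zero-shot transcription with a stock Whisper checkpoint
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
result = asr("sample.wav")  # path to any local audio file
print(result["text"])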
Project Tasks
1. Get Started
Task 0: Introduction
Task 1: Load the Libraries
Task 2: Prepare the Environment
Task 3: Load the Dataset
Task 4: Compute the WER of the Default Model
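The sketch below illustrates roughly what Tasks 3 and 4 might look like: loading a small LibriSpeech split and scoring the stock checkpoint. The checkpoint name, the librispeech_asr dataset id, the 50-example slice, and the crude lowercasing normalization are all illustrative assumptions, not the project's exact setup:

from datasets import load_dataset, Audio
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import evaluate
import torch

# a 50-example slice keeps the baseline evaluation quick (assumption)
dataset = load_dataset("librispeech_asr", "clean", split="test[:50]")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))  # Whisper expects 16 kHz

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

wer_metric = evaluate.load("wer")
predictions, references = [], []
for sample in dataset:
    inputs = processor(sample["audio"]["array"],
                       sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        ids = model.generate(inputs.input_features)
    # crude normalization: LibriSpeech transcripts are uppercase, Whisper output is cased
    predictions.append(processor.batch_decode(ids, skip_special_tokens=True)[0].lower().strip())
    references.append(sample["text"].lower())

print("baseline WER:", wer_metric.compute(predictions=predictions, references=references))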
2. Prepare for Training
Task 5: Display the Loaded Data
Task 6: Prepare the Dataset
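Task 6 typically maps each raw audio/text pair to the tensors the model expects: log-Mel input features for the encoder and token ids for the decoder labels. A sketch, reusing the processor and dataset objects from the previous snippet:

def prepare(batch):
    audio = batch["audio"]
    # log-Mel spectrogram features for the encoder
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # token ids of the transcript become the decoder labels
    batch["labels"] = processor.tokenizer(batch["text"].lower()).input_ids
    return batch

dataset = dataset.map(prepare, remove_columns=dataset.column_names)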
3. Training and Evaluation
Task 7: Define a Data Collator
Task 8: Define Evaluation Methods
Task 9: Define Training Configuration
Task 10: Train the Model
Task 11: Evaluate the Model
Task 12: Deploy with Gradio
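The sketch below compresses Tasks 7 through 12 into one script: a padding collator, a WER metric callback, a short Seq2SeqTrainer run, and a toy Gradio demo. It reuses the model, processor, and prepared dataset from the earlier snippets; every hyperparameter and the output directory name are illustrative assumptions, not the project's actual configuration:

from dataclasses import dataclass
from typing import Any
import evaluate
import gradio as gr
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer, pipeline

@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    processor: Any

    def __call__(self, features):
        # pad the audio features and the label ids separately
        input_features = [{"input_features": f["input_features"]} for f in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")
        label_features = [{"input_ids": f["labels"]} for f in features]
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")
        # replace padding with -100 so it is ignored by the loss
        batch["labels"] = labels_batch["input_ids"].masked_fill(
            labels_batch["attention_mask"].ne(1), -100)
        return batch

wer_metric = evaluate.load("wer")

def compute_metrics(pred):
    label_ids = pred.label_ids
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id
    pred_str = processor.batch_decode(pred.predictions, skip_special_tokens=True)
    label_str = processor.batch_decode(label_ids, skip_special_tokens=True)
    return {"wer": wer_metric.compute(predictions=pred_str, references=label_str)}

args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-librispeech",  # assumed name
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    max_steps=500,                 # a short demo run, not a full fine-tune
    predict_with_generate=True,    # decode with generate() during evaluation
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    eval_dataset=dataset,          # toy setup: a held-out split belongs here
    data_collator=DataCollatorSpeechSeq2SeqWithPadding(processor),
    compute_metrics=compute_metrics,
)
trainer.train()                    # Task 10
print(trainer.evaluate())          # Task 11: reports eval_wer

# Task 12: wrap the fine-tuned model in a minimal Gradio demo
demo_pipe = pipeline(
    "automatic-speech-recognition",
    model=trainer.model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)
gr.Interface(
    fn=lambda path: demo_pipe(path)["text"],
    inputs=gr.Audio(type="filepath"),
    outputs="text",
).launch()

The -100 label masking matters because PyTorch's cross-entropy loss ignores that index by convention, so padded positions do not contribute to the gradient.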
Congratulations!
Relevant Courses
Use the following content to review prerequisites or explore specific concepts in detail.