How to use open source Whisper ASR in Python

Whisper is an automatic speech recognition (ASR) system developed by OpenAI. It transcribes spoken language into text, which is useful for everything from transcription services to voice-controlled assistants. This Answer explains how to use the open-source version of Whisper in Python.

Setting up the environment

Before looking at the code, set up the environment. You need Python installed on your system, along with the Whisper Python package, which can be installed using pip:

pip install -U openai-whisper
Install Whisper

You also need FFmpeg, a command-line tool for handling multimedia files, including audio and video. Whisper uses it to decode audio. Install it with the package manager for your operating system:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on macOS using Homebrew
brew install ffmpeg
# on Windows using Chocolatey
choco install ffmpeg
# on Windows using Scoop
scoop install ffmpeg
Install FFmpeg
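
Once both installs have finished, a quick check from Python can confirm that the pieces are in place. The sketch below is only an illustration: the helper name check_whisper_environment is hypothetical, and it only tests that the whisper package is importable and that an ffmpeg executable is on the PATH, not that either one works correctly.

```python
import importlib.util
import shutil


def check_whisper_environment():
    """Report whether the whisper package and the ffmpeg binary can be found.

    Hypothetical helper for illustration; it is not part of Whisper.
    """
    return {
        # find_spec returns None when the package cannot be imported
        "whisper_installed": importlib.util.find_spec("whisper") is not None,
        # shutil.which returns None when the executable is not on the PATH
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
    }


status = check_whisper_environment()
for name, ok in status.items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```

If either entry is reported as MISSING, rerun the corresponding install command above before continuing.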

Whisper through the command line

Before moving on to Python, note that Whisper can also be run directly from the command line. This is a quick way to transcribe or translate audio files without writing any Python code. For example:

whisper "sample.mp3" --task translate --language es --model large

Code explanation

The command above runs Whisper’s translate task, which produces English text from speech in another language. The --model option selects the model (here, large), the --language option tells Whisper which language is spoken in the audio (es for Spanish), and the first argument is the path to the audio file.
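
If you want to launch the same command from a script, one option is to assemble the argument list in Python and pass it to subprocess. The following is a minimal sketch: build_whisper_command and its default values are hypothetical choices for this example, and it uses only the --task, --language, and --model options shown above.

```python
import shlex


def build_whisper_command(audio_path, task="transcribe", language=None, model="tiny"):
    """Assemble the argument list for the whisper command-line tool.

    Hypothetical helper; the defaults here are choices made for this sketch.
    """
    command = ["whisper", audio_path, "--task", task, "--model", model]
    if language is not None:
        # --language names the language spoken in the audio, not the output language
        command += ["--language", language]
    return command


command = build_whisper_command(
    "sample.mp3", task="translate", language="es", model="large"
)
# Show the command as it would be typed in a shell
print(shlex.join(command))
# To actually run it: import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.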

Whisper open source in Python

With the environment configured, you can use the open-source Whisper package in Python. Below is a basic example that transcribes an audio file:

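import whisper
import warnings
# Suppress library warnings to keep the output clean
warnings.simplefilter("ignore")
# Load the smallest model; its weights are downloaded on first use
model = whisper.load_model("tiny")
# Transcribe the audio file and print the recognized text
result = model.transcribe(audio="/assets/sample.mp3")
print(result["text"])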

Code explanation

In the code above, we first import the whisper package and load a model. We then call the model.transcribe method, passing in the path to the audio file we want to transcribe. Whisper decodes the file with FFmpeg, so common formats such as WAV, FLAC, and MP3 all work.

The input audio provided to the code snippet above can be found here.

The method returns a dictionary that holds the transcription of the audio file under the "text" key, which is then printed.
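
Besides "text", the returned dictionary also carries a "language" entry and a "segments" list whose items include "start", "end", and "text" fields. The sketch below shows one way to print timestamped lines from it. The format_segments helper is hypothetical, and sample_result is a hand-written stand-in rather than real Whisper output, so the example runs without loading a model.

```python
def format_segments(result):
    """Turn a Whisper-style result dictionary into timestamped lines.

    Hypothetical helper for illustration.
    """
    lines = []
    for segment in result.get("segments", []):
        start, end = segment["start"], segment["end"]
        # Segment text usually starts with a space, so strip it
        lines.append(f"[{start:.2f}s -> {end:.2f}s] {segment['text'].strip()}")
    return lines


# Hand-written stand-in for the dictionary returned by model.transcribe(...)
sample_result = {
    "text": " Hello there. How are you?",
    "language": "en",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello there."},
        {"start": 1.5, "end": 3.0, "text": " How are you?"},
    ],
}

for line in format_segments(sample_result):
    print(line)
```

With a real model, you would pass the value returned by model.transcribe in place of sample_result.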

Translating audio into English text

Whisper can also translate speech in other supported languages into English text. Here’s how:

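import whisper
import warnings
# Suppress library warnings to keep the output clean
warnings.simplefilter("ignore")
model = whisper.load_model("tiny")
# task="translate" makes Whisper output English text
result = model.transcribe(audio="/assets/sample.mp3", task="translate")
print(result["text"])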

Code explanation

Here, we load the tiny model and call the transcribe method with the task parameter set to translate. This instructs Whisper to output English text rather than a transcript in the spoken language.
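
When the spoken language is already known, it can be passed to transcribe through the language option so that Whisper does not have to detect it. The sketch below wraps this in a small function. Both translate_to_english and StandInModel are hypothetical names for this example; the stand-in class only mimics the shape of the transcribe call so the example runs without downloading a model.

```python
def translate_to_english(model, audio_path, source_language=None):
    """Run the translate task and return the English text.

    `model` is any object with a Whisper-style transcribe() method.
    Hypothetical helper for illustration.
    """
    options = {"task": "translate"}
    if source_language is not None:
        # Skip automatic language detection when the language is known
        options["language"] = source_language
    result = model.transcribe(audio_path, **options)
    return result["text"].strip()


class StandInModel:
    """Hypothetical stand-in that records the options it receives."""

    def transcribe(self, audio, **options):
        self.last_options = options
        # Canned text, not real Whisper output
        return {"text": " Hello, world."}


stand_in = StandInModel()
english = translate_to_english(stand_in, "sample.mp3", source_language="es")
print(english)
print(stand_in.last_options)
```

To use it with Whisper itself, pass the object returned by whisper.load_model("tiny") instead of StandInModel().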

Conclusion

Whisper converts speech into text, and its open-source Python package makes it easy to integrate into your applications. Whether you’re building a transcription service, a voice-activated assistant, or any other application that needs speech recognition, Whisper is a practical option.

Copyright ©2024 Educative, Inc. All rights reserved