Whisper is a state-of-the-art automatic speech recognition (ASR) system developed by OpenAI. It’s purpose-built to transcribe spoken language into written form, a capability with a multitude of uses, from transcription services to voice-controlled assistants. This Answer will shed light on how to use the open-source version of the Whisper ASR system, particularly in Python.
Before diving into the code, it’s important to set up a working environment. This entails having Python installed on your system, as well as the Whisper Python package. The latter can be installed using pip:
pip install -U openai-whisper
In addition, it’s necessary to install FFmpeg, a command-line utility designed to handle multimedia files, including audio and video. Depending on your operating system, it can be downloaded and installed using the corresponding package manager.
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on MacOS using Homebrew
brew install ffmpeg
# on Windows using Chocolatey
choco install ffmpeg
# on Windows using Scoop
scoop install ffmpeg
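Since Whisper relies on FFmpeg to decode audio, it’s worth confirming the binary is actually on your PATH before running anything. A minimal sketch of such a check (the helper name tool_available is our own, not part of Whisper):

```python
import shutil

def tool_available(name: str) -> bool:
    """Return True if an executable with this name is on the PATH."""
    return shutil.which(name) is not None

if tool_available("ffmpeg"):
    print("FFmpeg found")
else:
    print("FFmpeg missing -- install it with your package manager")
```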
Before we proceed to Python-based usage of Whisper, it’s worth acknowledging that Whisper can be operated directly through the command line as well. This can be a swift and effortless method to transcribe audio files, eliminating the need for writing any Python code. Here’s the command to accomplish this:
whisper "sample.mp3" --task translate --language es --model large
In the above command, we’re using Whisper’s translate task, specifying the model we wish to employ (in this instance, large) and the source language (es for Spanish), and providing the path to the audio file we intend to translate.
With your environment configured, you can use the open-source Whisper package in Python. Below is a basic example illustrating how it can be used to transcribe an audio file:
import whisper
import warnings

# Suppress non-critical warnings (e.g., the FP16 fallback notice on CPU)
warnings.simplefilter("ignore")

# Load the smallest model and transcribe the audio file
model = whisper.load_model("tiny")
result = model.transcribe(audio="/assets/sample.mp3")
print(result["text"])
In the provided code snippet, we first import the Whisper package and load the tiny model. We then invoke the model.transcribe method, passing in the path of the audio file we aim to transcribe. The audio file should be in a format that Whisper supports, such as WAV, FLAC, or MP3.
The method returns a dictionary that includes the transcription of the audio file, which is then printed.
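Beyond the full text, the dictionary returned by model.transcribe also contains a "segments" list with start and end timestamps for each chunk of speech. As a sketch, here is one way to turn those segments into timestamped lines; to keep the example self-contained, we use a hand-made stand-in dictionary rather than a real transcription:

```python
def format_segments(result: dict) -> list[str]:
    """Render each transcription segment as '[start -> end] text'."""
    lines = []
    for seg in result.get("segments", []):
        lines.append(f"[{seg['start']:.2f} -> {seg['end']:.2f}] {seg['text'].strip()}")
    return lines

# A hand-made stand-in for the shape of model.transcribe(...)'s return value.
sample_result = {
    "text": " Hello world. How are you?",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello world."},
        {"start": 1.5, "end": 3.0, "text": " How are you?"},
    ],
}

for line in format_segments(sample_result):
    print(line)
# [0.00 -> 1.50] Hello world.
# [1.50 -> 3.00] How are you?
```

This is handy for generating subtitles or reviewing where in a recording a phrase was spoken.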
Whisper can also facilitate audio translation in other supported languages into English text. Here’s the method to achieve this:
import whisper
import warnings

# Suppress non-critical warnings (e.g., the FP16 fallback notice on CPU)
warnings.simplefilter("ignore")

# Load the smallest model and translate the audio into English text
model = whisper.load_model("tiny")
result = model.transcribe(audio="/assets/sample.mp3", task="translate")
print(result["text"])
In this instance, we load the tiny model and invoke the transcribe method with the task parameter set to translate. This instructs Whisper to translate the audio into English text.
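Once single-file transcription works, a common next step is processing a whole folder of recordings. The sketch below is our own helper, not part of Whisper’s API: it gathers audio files by extension, and the commented-out loop shows where model.transcribe would plug in (running it requires the whisper package and a model download).

```python
from pathlib import Path

# Common extensions Whisper can decode via FFmpeg (not an exhaustive list)
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".m4a"}

def find_audio_files(folder: str) -> list[Path]:
    """Collect supported audio files in a folder, sorted by name."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.suffix.lower() in AUDIO_EXTS)

# Hypothetical usage -- requires whisper to be installed:
# import whisper
# model = whisper.load_model("tiny")
# for path in find_audio_files("recordings"):
#     result = model.transcribe(audio=str(path))
#     print(path.name, "->", result["text"][:60])
```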
Whisper ASR is a tool for the conversion of speech into text, and its open-source Python package facilitates easy integration into your applications. Regardless of whether you’re creating a transcription service, a voice-activated assistant, or any other application that necessitates speech recognition, Whisper ASR can prove to be a highly valuable resource.