Whisper is an automatic speech recognition (ASR) system developed by OpenAI. Its primary purpose is to transcribe spoken language into written text, a capability with a wide range of uses, from transcription services to voice-controlled assistants. This Answer shows how to use the Whisper ASR API in Python, giving you practical, hands-on experience with the tool.
Before you get to the coding part, it's important to set up an appropriate environment. Ensure that your system has Python installed, along with the OpenAI Python client library. The latter can be installed using pip:
pip install openai
Additionally, you'll need to obtain an API key from OpenAI, which is used to authenticate your requests to the Whisper ASR system.
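One common way to keep the key out of your source code is to store it in an environment variable and read it at runtime, which is what the snippets below do. On Linux or macOS, for example, you might export it before running your script; the variable name SECRET_KEY here is simply the one used later in this Answer, and the value is a placeholder:
export SECRET_KEY="your-api-key-here"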
With your environment ready, you can begin using the Whisper ASR API. The API provides two main functionalities: transcription and translation.
Below is a straightforward example demonstrating how it can be used to transcribe an audio file:
import openaiimport osopenai.api_key = openai.api_key = os.environ["SECRET_KEY"]audio_file= open("/assets/sample.mp3", "rb")transcript = openai.Audio.transcribe(model="whisper-1", file = audio_file, response_format = "srt")print(transcript)
In the code snippet above, we first import the openai and os libraries and set the API key from an environment variable. Next, we invoke the openai.Audio.transcribe method, passing in the audio file we want to transcribe along with the model name and the desired response format. The audio file must be in a format supported by Whisper, such as MP3, WAV, or FLAC.
The input audio used in this example, sample.mp3, is a short clip of spoken Spanish.
The method yields a response containing the transcription of the audio file, which is subsequently printed.
If you're unable to run the code yourself, for example because you don't have an API key, here is the output from a previous successful run:
[00:00.000 --> 00:06.000] ¿Dónde está la parada del autobús?
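The srt value used above asks Whisper for subtitle-style output with timestamps. If you only need the raw text, you can request a different response_format instead; below is a minimal sketch, assuming the same whisper-1 model, audio file, and SECRET_KEY environment variable as above:
import openai
import os

openai.api_key = os.environ["SECRET_KEY"]

# Request the transcription as plain text instead of SRT subtitles.
with open("/assets/sample.mp3", "rb") as audio_file:
    text = openai.Audio.transcribe(model="whisper-1", file=audio_file, response_format="text")

print(text)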
The Whisper ASR API also supports translating spoken language into English. The process mirrors transcription; the only difference is calling openai.Audio.translate instead of openai.Audio.transcribe. Here is an example using the same audio file as input:
import openaiimport osopenai.api_key = openai.api_key = os.environ["SECRET_KEY"]audio_file= open("/assets/sample.mp3", "rb")transcript = openai.Audio.translate(model="whisper-1", file = audio_file, response_format = "srt")print(transcript)
Below is the output from a previous successful run:
[00:00.000 --> 00:06.000] Where is the bus stop?
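Because the srt output is subtitle text, a common follow-up step is to write it to a file so it can be used as captions. Here is a minimal sketch, assuming transcript holds the response from the translate call above; the filename is purely illustrative:
# Write the returned subtitles to an .srt file. str() guards against the
# response being a wrapper object rather than a plain string.
with open("sample.en.srt", "w", encoding="utf-8") as srt_file:
    srt_file.write(str(transcript))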
If you're dealing with large audio files, it might be necessary to segment them into smaller pieces before feeding them to the Whisper ASR API. This is because the API limits the size of the audio file it can process in a single request, currently 25 MB. Audio-processing libraries such as PyDub or SoX can split your audio files into smaller chunks.
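For instance, here is a minimal sketch using PyDub (which requires ffmpeg for MP3 support); the input path, chunk length, and output filenames are illustrative, not requirements of the API:
from pydub import AudioSegment

# Load the long recording (path is illustrative).
audio = AudioSegment.from_file("long_recording.mp3")

# Slice the audio into 10-minute chunks; adjust the chunk length so that
# each exported file stays under the 25 MB request limit.
chunk_length_ms = 10 * 60 * 1000
for start in range(0, len(audio), chunk_length_ms):
    chunk = audio[start:start + chunk_length_ms]
    chunk.export(f"chunk_{start // chunk_length_ms}.mp3", format="mp3")
Each exported chunk can then be sent to the API with its own transcribe or translate call.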
Whisper ASR is a tool for converting speech into text, and its Python API makes it straightforward to integrate into your applications. Whether you're building a transcription service, a voice-controlled assistant, or any other application that needs speech recognition, Whisper ASR can be an invaluable resource. Be sure to handle large audio files appropriately and to keep your API key secure.