Integrating Speech-to-Text with Whisper v3

Make the chatbot more responsive by adding voice capabilities using OpenAI’s Whisper v3.

So far, we have a chatbot that works with both text and images. Another modality we can add is voice. First, let’s update the chatbot so that it can take voice input from the user.

Taking voice as input

Gradio provides a simple Audio component that allows us to take audio as input. Let’s add it to a simple demo.

Running this code might open a pop-up in the browser that requests access to the microphone. Please grant access so that the chatbot can hear our voice.

import gradio as gr

def process_audio(audio):
    # This is where we will process the audio
    return "Audio recorded"

with gr.Blocks() as demo:
    audio_input = gr.Audio(sources=["microphone"])
    text_output = gr.Textbox()

    btn = gr.Button("Process")
    btn.click(process_audio, inputs=audio_input, outputs=text_output)

demo.launch(server_name="0.0.0.0")
A simple audio input demo with Gradio

The code is simple and should be easy to understand now that we have used Gradio a few times. The only new addition is the gr.Audio component defined on line 8. It is ...
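
To make the placeholder in the demo a little more concrete, here is a minimal sketch of a handler that inspects the recording instead of returning a fixed string. It assumes gr.Audio is left at its default type="numpy", in which case Gradio passes the function a (sample_rate, samples) tuple, or None if nothing was recorded. The name describe_audio is hypothetical and not part of the lesson's code.

```python
def describe_audio(audio):
    """Summarize the value Gradio passes from gr.Audio (assumes type="numpy")."""
    # Gradio passes None when the user has not recorded anything yet.
    if audio is None:
        return "No audio recorded"

    # With type="numpy", the value is a (sample_rate, samples) tuple.
    sample_rate, samples = audio

    # len() counts samples along the first axis, so this works for both
    # mono (n,) and multi-channel (n, channels) recordings.
    duration = len(samples) / sample_rate
    return f"Recorded {duration:.2f} s of audio at {sample_rate} Hz"
```

Passing describe_audio to btn.click in place of process_audio should show the recording's length in the textbox, which is a quick way to confirm that the microphone input actually reaches the Python function.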