Integrating Speech-to-Text with Whisper v3
Make the chatbot more responsive by adding voice capabilities using OpenAI’s Whisper v3.
So far, we have a chatbot that works with both text and images. Another modality we can add is voice. First, let’s update our chatbot so that it can take voice input from the user.
Taking voice as input
Gradio provides a simple Audio component that allows us to take audio as input. Let’s add it to a simple demo.
Running this code might open a pop-up in the browser that requests access to the microphone. Please grant access so that the chatbot can hear our voice.
import gradio as gr

def process_audio(audio):
    # This is where we will process the audio
    return "Audio recorded"

with gr.Blocks() as demo:
    audio_input = gr.Audio(sources=["microphone"])
    text_output = gr.Textbox()
    btn = gr.Button("Process")
    btn.click(process_audio, inputs=audio_input, outputs=text_output)

demo.launch(server_name="0.0.0.0")
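The placeholder comment in process_audio raises the question of what the audio argument actually holds. The following is a minimal sketch, not part of the lesson's code. It assumes the Audio component is left at its default type="numpy", in which case Gradio passes the callback a (sample_rate, samples) tuple, or None if nothing was recorded. The helper name describe_audio is hypothetical, and a plain list stands in for the recorded sample array so the snippet runs without a microphone.

```python
def describe_audio(audio):
    """Summarize the value a Gradio Audio input passes to a callback.

    Assumption: the component uses its default type="numpy", so the value
    is a (sample_rate, samples) tuple, or None if nothing was recorded.
    """
    if audio is None:
        return "No audio recorded"

    sample_rate, samples = audio
    # len() counts frames for both mono (n,) and multi-channel (n, channels) data.
    duration = len(samples) / sample_rate
    return f"Recorded {duration:.2f} s of audio at {sample_rate} Hz"


# Stand-in for a real recording: two seconds of silence at 16 kHz.
print(describe_audio((16000, [0] * 32000)))
print(describe_audio(None))
```

A function like this could be passed to btn.click in place of process_audio to confirm that the microphone input is arriving. The demo above runs as-is without it.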
The code is simple and should be easy to understand now that we have used Gradio a few times. The only new addition is the gr.Audio component defined on line 8. It is ...