Adding Speech Recognition to the Chatbot Using Whisper v3

Make the chatbot more responsive by adding voice capabilities using OpenAI’s Whisper v3.

So far, we have a chatbot that works with both text and images. Another modality we can add is voice. First, let’s update the chatbot so that it can take voice input from the user.

Taking voice as input

Gradio provides an Audio component that lets us take audio as input. Let’s add it to a simple demo.
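The original demo code is not reproduced here, so the following is only a minimal sketch of what such a demo could look like. It assumes Gradio 4.x or later, where `gr.Audio` accepts a `sources` list (older releases used `source="microphone"` instead). The names `describe_recording` and `build_demo` are hypothetical helpers introduced for illustration. The handler only reports the recorded file and stands in for the transcription step.

```python
import os


def describe_recording(audio_path):
    """Placeholder handler: report what was recorded instead of transcribing it."""
    # With type="filepath", Gradio passes the path of a temporary audio file,
    # or None if the user submitted without recording anything.
    if audio_path is None:
        return "No audio received."
    size_kb = os.path.getsize(audio_path) / 1024
    return f"Received {os.path.basename(audio_path)} ({size_kb:.1f} KB)"


def build_demo():
    """Build a small Gradio interface with a microphone input (assumes Gradio 4.x+)."""
    # Imported here so the handler above can be used without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=describe_recording,
        # Record from the microphone and hand the handler a file path.
        inputs=gr.Audio(sources=["microphone"], type="filepath"),
        outputs="text",
    )


# To start the app, call: build_demo().launch()
```

Using `type="filepath"` keeps the handler simple, since a speech recognition model can later be pointed at the saved file directly.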

Running this code might open a pop-up in the browser that requests access to the microphone. Please grant access so that the chatbot can hear our voice.
