A Talking Pictionary: Integrating Text-to-Speech Features

Learn how to use Vertex AI to add text-to-speech capabilities to our pictionary application.

Text-to-speech is a handy capability that fits many situations. For us, it adds another layer of interactivity to our pictionary application: each response the model generates will now be spoken aloud as well. With generative AI, the options are endless!

Testing text-to-speech

Let’s head over to the text-to-speech section of Vertex AI to begin, where we will generate the voice. We can choose from three languages; let’s try “English: Female” for now. To get a feel for how the voice might sound in our application, let’s test it with a response that was generated in the game.

Hmmm, that's a very basic shape! I need more details. Perhaps a handle? Is it a boat? Keep drawing!

Copy the response into the text section of the page and use the “Submit” button to generate the output. Try out various speeds and choose the one that sounds best. We have set the speed to 0.75.
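When we later pass the chosen speed to the API as `speaking_rate`, it has to fall within the documented range of 0.25–4.0. As a small sketch, a hypothetical helper (the function name `clamp_speaking_rate` is our own, not part of the library) can keep any user-chosen speed within that range:

```python
def clamp_speaking_rate(rate: float) -> float:
    """Clamp a requested speed to the API's documented [0.25, 4.0] range."""
    return max(0.25, min(4.0, rate))

print(clamp_speaking_rate(0.75))  # 0.75 — our chosen speed is already valid
print(clamp_speaking_rate(10))    # 4.0  — too fast, clamped to the maximum
```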

Trying out the text-to-speech feature on Vertex AI Studio

Getting the code

Similar to the other Vertex AI Studio sections, we can get the code for the interactions we perform. Use the “Get code” button to reveal the code. We have added the code in the widget below:

"""Synthesizes speech from the input string of text."""
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

input_text = texttospeech.SynthesisInput(
    text="Hmmm, that's a very basic shape! I need more details. Perhaps a handle? Is it a boat? Keep drawing!"
)

# Note: The voice can also be specified by name.
# Names of voices can be retrieved with client.list_voices().
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Studio-O",
)

audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.LINEAR16,  # uncompressed WAV audio
    speaking_rate=0.75,
)

response = client.synthesize_speech(
    request={"input": input_text, "voice": voice, "audio_config": audio_config}
)

# The response's audio_content is binary. LINEAR16 output is WAV data,
# so we save it with a .wav extension.
with open("output.wav", "wb") as out:
    out.write(response.audio_content)
    print('Audio content written to file "output.wav"')

Let’s see what’s happening in the code: ...