A Talking Pictionary: Integrating Text-to-Speech Features

Learn how to use Vertex AI to add text-to-speech capabilities to our pictionary application.

Text-to-speech is a handy capability that fits many situations. For us, it adds another layer of interactivity to our pictionary application: each response the model generates will now be spoken aloud as well. With generative AI, the options are endless!

Testing text-to-speech

Let’s head over to the text-to-speech section of Vertex AI to begin, where we will generate the voice. We can choose from three languages; let’s try “English: Female” for now. To get a feel for how the voice might sound in our application, let’s test it with a response that was generated in the game.

Hmmm, that's a very basic shape! I need more details. Perhaps a handle? Is it a boat? Keep drawing!

Copy the response into the text section of the page and use the “Submit” button to generate the output. Try out various speeds and choose the one that sounds best. We have set the speed to 0.75.
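When we later pass the chosen speed to the API as `speaking_rate`, it has to fall within the documented range of 0.25–4.0. As a small sketch, a hypothetical helper (the function name `clamp_speaking_rate` is our own, not part of the library) can keep any user-chosen speed within that range:

```python
def clamp_speaking_rate(rate: float) -> float:
    """Clamp a requested speed to the API's documented [0.25, 4.0] range."""
    return max(0.25, min(4.0, rate))

print(clamp_speaking_rate(0.75))  # 0.75 — our chosen speed is already valid
print(clamp_speaking_rate(10))    # 4.0  — too fast, clamped to the maximum
```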

Trying out the text-to-speech feature on Vertex AI Studio

Getting the code

Similar to the other Vertex AI Studio sections, we can get the code for the interactions we perform. Use the “Get code” button to reveal the code. We have added the code in the widget below:

"""Synthesizes speech from the input string of text."""
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

input_text = texttospeech.SynthesisInput(
    text="Hmmm, that's a very basic shape! I need more details. Perhaps a handle? Is it a boat? Keep drawing!"
)

# Note: The voice can also be specified by name.
# Names of voices can be retrieved with client.list_voices().
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Studio-O",
)

audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.LINEAR16,  # uncompressed WAV audio
    speaking_rate=0.75,
)

response = client.synthesize_speech(
    request={"input": input_text, "voice": voice, "audio_config": audio_config}
)

# The response's audio_content is binary. LINEAR16 output is WAV data,
# so we save it with a .wav extension.
with open("output.wav", "wb") as out:
    out.write(response.audio_content)
    print('Audio content written to file "output.wav"')

Let’s see what’s happening in the code: ...