Speaker Diarization

Learn how to get tags for each recognized speaker.

Introduction

NOTE: This is a beta feature. It should be considered for testing purposes only.

This feature is helpful when the audio contains more than one speaker and you want to know which speaker said each part.

  1. Review the sample code below:
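from google.cloud import speech_v1p1beta1
from google.cloud.speech_v1p1beta1 import enums

# Create a client for the beta version of the API.
client = speech_v1p1beta1.SpeechClient()
language_code = "en-US"
sample_rate_hertz = 44100
encoding = enums.RecognitionConfig.AudioEncoding.MP3
# Recognition settings passed to the API.
config = {
    "language_code": language_code,
    "sample_rate_hertz": sample_rate_hertz,
    "encoding": encoding,
}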

Download the file below if you wish to hear the audio before processing it through the API.

small_two_speaker.mp3

Diarization configuration

Diarization settings are held under the key diarization_config. Its value is a dictionary with the following key-value pairs:

Key                          Value type
enable_speaker_diarization   bool
min_speaker_count            int
max_speaker_count            int
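
The table above can be sketched as a nested dictionary. This is a minimal illustration, not the tutorial's solution: the helper name build_diarization_config is hypothetical, the speaker counts of 2 are an assumption based on the sample file name, and the encoding entry is left out so the snippet runs without the client library installed.

```python
def build_diarization_config(min_speakers=2, max_speakers=2):
    """Return a recognition config dictionary with diarization enabled.

    Hypothetical helper for illustration; the speaker counts are assumptions.
    """
    if min_speakers < 1 or max_speakers < min_speakers:
        raise ValueError("expected 1 <= min_speakers <= max_speakers")
    return {
        "language_code": "en-US",
        "sample_rate_hertz": 44100,
        # Nested dictionary holding the three keys listed in the table.
        "diarization_config": {
            "enable_speaker_diarization": True,
            "min_speaker_count": min_speakers,
            "max_speaker_count": max_speakers,
        },
    }


config = build_diarization_config(2, 2)
```

Keeping the diarization keys in their own nested dictionary mirrors the layout described above, so the remaining recognition settings stay unchanged when diarization is switched on.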

Challenge

  1. Make a copy of speech_quickstart_beta.py and name the file
...