Speaker Diarization
Learn how to get tags for each recognized speaker.
Introduction
NOTE: This is a beta feature. It should be considered for testing purposes only.
This feature is helpful when a recording contains more than one speaker and you want to identify each one.
- Review the sample code below:
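# Assumes a google-cloud-speech release that still ships the enums module.
from google.cloud import speech_v1p1beta1
from google.cloud.speech_v1p1beta1 import enums

# Create a client for the beta Speech-to-Text API.
client = speech_v1p1beta1.SpeechClient()

# Describe the audio being sent: US English, 44.1 kHz, MP3-encoded.
language_code = "en-US"
sample_rate_hertz = 44100
encoding = enums.RecognitionConfig.AudioEncoding.MP3
config = {
    "language_code": language_code,
    "sample_rate_hertz": sample_rate_hertz,
    "encoding": encoding,
}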
Download the file below if you wish to hear the audio before processing it through the API.
Diarization configuration
Diarization settings are held within the key diarization_config. The value is a dictionary with the following key-value pairs:
Key | Value type |
---|---|
enable_speaker_diarization | bool |
min_speaker_count | int |
max_speaker_count | int |
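As a minimal sketch of how the three keys in the table might fit together, the snippet below builds a diarization_config dictionary and nests it in a recognition config. The helper name build_diarization_config and the speaker counts 2 and 6 are illustrative assumptions, not values from this tutorial. Placing diarization_config alongside language_code is also assumed here from the description above. The encoding entry is left out so the snippet runs without the client library installed.

```python
def build_diarization_config(min_speakers, max_speakers):
    """Return a diarization_config dict using the three keys from the table.

    Hypothetical helper for illustration only; it is not part of the
    Speech-to-Text client library.
    """
    if min_speakers < 1 or max_speakers < min_speakers:
        raise ValueError("expected 1 <= min_speakers <= max_speakers")
    return {
        "enable_speaker_diarization": True,   # bool
        "min_speaker_count": min_speakers,    # int
        "max_speaker_count": max_speakers,    # int
    }


# Assumed placement: diarization_config sits inside the recognition config.
config = {
    "language_code": "en-US",
    "sample_rate_hertz": 44100,
    "diarization_config": build_diarization_config(2, 6),
}
```

Validating the speaker range before sending a request is a design choice of this sketch; it catches a swapped minimum and maximum locally rather than relying on the API to reject the request.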
Challenge
- Make a copy of speech_quickstart_beta.py and name the file