Speech-to-Text (STT) is an easy-to-use API, powered by Google’s AI technologies, that converts speech into text.
Because Speech-to-Text is built on Google’s own deep learning models, you can expect high transcription accuracy. You can also customize recognition for domain-specific terms and rare words by providing hints, which boost the transcription accuracy of specific words or phrases.
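As a rough illustration of how hints might be supplied, the sketch below builds a request body for the v1 `speech:recognize` REST method using only the Python standard library. The field names (`speechContexts`, `phrases`, `boost`) follow the v1 REST reference as I understand it. The helper name `build_request_with_hints`, the bucket path, the example phrases, and the boost value are all assumptions made up for this example.

```python
import json


def build_request_with_hints(gcs_uri, phrases, boost=10.0, language_code="en-US"):
    """Return a JSON-serializable body for a v1 `speech:recognize` call.

    Hypothetical helper: it only assembles the request and does not call the API.
    """
    return {
        "config": {
            "encoding": "FLAC",
            "languageCode": language_code,
            # Hints: phrases the recognizer should favor, with an optional weight.
            "speechContexts": [{"phrases": list(phrases), "boost": boost}],
        },
        "audio": {"uri": gcs_uri},
    }


# Illustrative values only; the bucket and file name are placeholders.
hinted_body = build_request_with_hints(
    "gs://my-bucket/clinic-call.flac",
    ["tachycardia", "metoprolol"],
)
print(json.dumps(hinted_body, indent=2))
```

The resulting JSON would then be sent, with valid credentials, to the recognize endpoint. A suitable boost value depends on the audio and the vocabulary, so treat the number here as a starting point rather than a recommendation.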
Speech-to-Text can use one of several machine learning models to transcribe your audio file. The API currently offers speech recognition for a large number of languages and variants.
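Choosing a model can be sketched as setting the `model` field of the recognition config. The snippet below is a minimal example under stated assumptions: the use-case mapping and the helper name `build_model_config` are hypothetical, and the model identifiers (`command_and_search`, `phone_call`, `video`) are the v1 values as I recall them, so check the current documentation before relying on them.

```python
# Hypothetical mapping from an application use case to a v1 `model` value.
MODEL_BY_USE_CASE = {
    "voice_command": "command_and_search",
    "phone": "phone_call",
    "video": "video",
}


def build_model_config(use_case, language_code="en-US"):
    """Return a recognition config dict, adding `model` only for known use cases."""
    config = {"languageCode": language_code}
    model = MODEL_BY_USE_CASE.get(use_case)
    if model is not None:
        config["model"] = model
    # With no `model` field, the service is left to choose a model itself.
    return config


phone_config = build_model_config("phone")
fallback_config = build_model_config("podcast", language_code="en-GB")
print(phone_config)
print(fallback_config)
```

Leaving the field out for unknown use cases keeps the request valid without guessing at a model name.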
Other than the above-mentioned features, the STT API allows you to:
Speech-to-Text is priced based on the
If you’re interested in how to incorporate Speech-to-Text in your program, check out the course Google Cloud: AI Speech-to-Text with Python 3.