Success Is in Fine Details
In this lesson, we’ll cover the best practices that can lead to high-quality outcomes.
How to get the highest quality results
As someone who consults with Fortune 500 companies regularly, I notice that quality outcomes depend on a few best practices:
- Hardware and audio capture technique matter. Much of the transcription quality is decided before the audio ever reaches the API, at the point of capture. Businesses should consult with an audio engineer.
- Capture audio with a sampling rate of 16,000 Hz or higher.
- To help determine the best configuration, test with audio that represents real-world conditions.
- Invest time and money into configuration testing. Skipping this step can result in even more money and time wasted on poor transcription.
- Test with at least 1 hour of audio. 3 hours is better and 6 hours is great; beyond that, returns diminish.
- Pay for professional human transcriptions for WER calculation purposes. Unless you work for a company full of trained transcriptionists, do not roll your own human transcriptions. If professionals have a 5% WER, imagine the errors introduced by everyday workers at your company.
- The API models are trained with raw source audio. There is no need to upsample (converting an 8,000 Hz file to 16,000 Hz, for example).
- There is no payoff in converting the original audio from one encoding to another (MP3 to FLAC, for example).
- There is no need to pre-process the audio to reduce noise or background music, as the models are trained for these situations.
- If identifying separate speakers is critical, capture each speaker's audio on a separate channel.
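
To make the WER comparison above concrete, here is a minimal sketch of how word error rate can be computed from a professional reference transcript and an API transcript. The function name `word_error_rate` and the normalization it applies (lowercasing and splitting on whitespace) are assumptions for illustration; a real evaluation would likely also normalize punctuation and numbers consistently on both sides.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration; normalization here is only
    lowercasing and whitespace splitting (an assumption).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] is the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion (reference word missing)
                current[j - 1] + 1,      # insertion (extra hypothesis word)
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("turn the lights off", "turn the light off"))  # 0.25
```

Running the same function over the same test audio for each candidate configuration gives a single number per configuration to compare, which is why the quality of the reference transcript matters so much.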