The Web Speech API provides two distinct areas of functionality: speech recognition (speech to text) and speech synthesis (text to speech).
Both can be added to a web app directly from JavaScript.
With this API, we can issue voice commands to our web apps much as we do on Android via Google Speech or on Windows via Cortana.
Let’s look at a simple example of how to implement text-to-speech and speech-to-text using the Web Speech API:
<body>
  <header><h2>Web APIs</h2></header>
  <div class="web-api-cnt">
    <div id="error" class="close"></div>
    <div class="web-api-card">
      <div class="web-api-card-head">Demo - Text to Speech</div>
      <div class="web-api-card-body">
        <div><input placeholder="Enter text here" type="text" id="textToSpeech" /></div>
        <div><button onclick="speak()">Tap to Speak</button></div>
      </div>
    </div>
    <div class="web-api-card">
      <div class="web-api-card-head">Demo - Speech to Text</div>
      <div class="web-api-card-body">
        <div><textarea placeholder="Text will appear here when you start speaking." id="speechToText"></textarea></div>
        <div><button onclick="tapToSpeak()">Tap and Speak into Mic</button></div>
      </div>
    </div>
  </div>
  <script>
    // Elements with an id (error, textToSpeech, speechToText) are reachable as globals.
    try {
      var speech = new SpeechSynthesisUtterance()
      // Chrome exposes the recognition constructor under a webkit prefix.
      var Recognition = window.SpeechRecognition || window.webkitSpeechRecognition
      var recognition = new Recognition()
    } catch (e) {
      error.innerHTML = "Web Speech API not supported on this device."
      error.classList.remove("close")
    }

    function speak() {
      speech.text = textToSpeech.value
      speech.volume = 1
      speech.rate = 1
      speech.pitch = 1
      window.speechSynthesis.speak(speech)
    }

    function tapToSpeak() {
      recognition.onstart = function() { }
      recognition.onresult = function(event) {
        const curr = event.resultIndex
        const transcript = event.results[curr][0].transcript
        speechToText.value = transcript
      }
      recognition.onerror = function(ev) {
        console.error(ev)
      }
      recognition.start()
    }
  </script>
</body>
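The demo relies on a try/catch to discover whether the API is available. As a rough alternative, the check can be made explicit before any object is constructed. The sketch below assumes a helper named detectSpeechSupport, which is hypothetical and not part of the Web Speech API; it only inspects the global scope it is given.

```javascript
// Report which halves of the Web Speech API a given global scope exposes.
// `detectSpeechSupport` is a hypothetical helper name, not part of the API.
function detectSpeechSupport(scope) {
  // Some browsers only ship the recognition constructor with a webkit prefix.
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition;
  return {
    synthesis:
      "speechSynthesis" in scope &&
      typeof scope.SpeechSynthesisUtterance === "function",
    recognition: typeof Recognition === "function",
  };
}

// In a browser, pass `window`; outside a browser, fall back to an empty object.
const support = detectSpeechSupport(typeof window !== "undefined" ? window : {});
console.log(support);
```

With a result like this, a page could hide only the demo card whose feature is missing instead of showing a single error for both.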
The first demo, text to speech, demonstrates the use of this API with a simple input field to receive the input text and a button to execute the speech action.
The speak function is shown below:
function speak() {
  const speech = new SpeechSynthesisUtterance()
  speech.text = textToSpeech.value
  speech.volume = 1
  speech.rate = 1
  speech.pitch = 1
  window.speechSynthesis.speak(speech)
}
It instantiates a SpeechSynthesisUtterance object and sets its text to whatever we typed in the input box. Then, we call the speechSynthesis.speak function with the speech object, and the text in the input box is read out loud through the speaker.
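The demo hard-codes volume, rate, and pitch to 1. If these values came from user input, it could help to clamp them to the ranges the API accepts (volume 0 to 1, rate 0.1 to 10, pitch 0 to 2) before speaking. The following is a minimal sketch; utteranceSettings and speakText are hypothetical helper names, not part of the API.

```javascript
// Accepted ranges for the SpeechSynthesisUtterance properties.
const LIMITS = { volume: [0, 1], rate: [0.1, 10], pitch: [0, 2] };

// Keep a number inside the inclusive [min, max] range.
function clamp(value, [min, max]) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: build a plain settings object with safe values.
function utteranceSettings(text, { volume = 1, rate = 1, pitch = 1 } = {}) {
  return {
    text: String(text).trim(),
    volume: clamp(volume, LIMITS.volume),
    rate: clamp(rate, LIMITS.rate),
    pitch: clamp(pitch, LIMITS.pitch),
  };
}

// Hypothetical helper: speak the text in a browser; returns false for empty input.
function speakText(text, options) {
  const settings = utteranceSettings(text, options);
  if (!settings.text) return false; // nothing to say, skip the browser call
  const utterance = new SpeechSynthesisUtterance();
  Object.assign(utterance, settings); // copies text, volume, rate, pitch
  window.speechSynthesis.speak(utterance);
  return true;
}
```

In the demo page, the button handler could then call speakText(textToSpeech.value, { rate: 1.2 }) instead of setting each property by hand.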
The second demo, speech to text, is a voice recognition demo. We tap the Tap and Speak into Mic button and speak into the mic, and the words we say are transcribed into the text area.
When clicked, the Tap and Speak into Mic button calls the tapToSpeak function:
function tapToSpeak() {
  // Fall back to the webkit-prefixed constructor used by Chrome.
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition
  const recognition = new Recognition()
  recognition.onstart = function() { }
  recognition.onresult = function(event) {
    const curr = event.resultIndex
    const transcript = event.results[curr][0].transcript
    speechToText.value = transcript
  }
  recognition.onerror = function(ev) {
    console.error(ev)
  }
  recognition.start()
}
A SpeechRecognition object is instantiated, and then its event handlers are registered. onstart is called when voice recognition starts, onerror is called when an error occurs, and onresult is called whenever the recognizer captures a phrase.
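To make the order of these callbacks easier to observe, the handlers can be registered in one place and routed through a logging function. This is only a sketch: attachRecognitionHandlers and its log parameter are hypothetical, and the onend handler is an addition that the demo itself does not use.

```javascript
// Hypothetical helper: register handlers on a recognition object and
// report each event through `log`, any function that accepts a string.
function attachRecognitionHandlers(recognition, log) {
  recognition.onstart = () => log("listening");
  recognition.onresult = () => log("result received");
  // The error event carries a short code such as "no-speech" in `error`.
  recognition.onerror = (event) => log("error: " + event.error);
  recognition.onend = () => log("stopped");
  return recognition;
}

// Browser usage (assumes the API is available):
// const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
// attachRecognitionHandlers(new Recognition(), console.log).start();
```

Passing the recognition object in, rather than constructing it inside the helper, keeps the wiring easy to exercise with a stand-in object.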
In the onresult callback, we extract the transcript and set it as the value of the text area. So, when we speak into the mic, the words appear in the text area.
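The extraction step can be isolated into a small function, which also makes it possible to try out with a hand-built event object. The sketch below assumes a hypothetical helper named transcriptFrom; unlike the demo, it joins every result from event.resultIndex onward rather than reading only one.

```javascript
// Hypothetical helper: collect the top alternative of each new result.
// `event.results` is list-like; each result is a list of alternatives,
// and the first alternative is the recognizer's best guess.
function transcriptFrom(event) {
  const parts = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const best = event.results[i][0];
    if (best) parts.push(best.transcript.trim());
  }
  return parts.join(" ");
}

// Browser usage inside the handler:
// recognition.onresult = (event) => { speechToText.value = transcriptFrom(event) }
```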