What is the Web Speech API?

Web Speech API

The Web Speech API lets us add two distinct kinds of functionality to a web app: speech synthesis (text-to-speech), which reads text aloud, and speech recognition (speech-to-text), which turns spoken words into text.

With this API, we can issue voice commands to our web apps in the same way we do on Android through Google Speech or on Windows through Cortana.
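
Not every browser implements both halves of the API, so it helps to check for them before wiring up any UI. The sketch below is a minimal feature-detection helper. The name detectSpeechSupport is hypothetical, and it takes the global object as a parameter (defaulting to globalThis) so it can also be exercised outside a browser. It assumes recognition may be exposed either as SpeechRecognition or under Chrome's webkitSpeechRecognition prefix.

```javascript
// Hypothetical helper: reports which parts of the Web Speech API are available.
// `scope` defaults to globalThis (window in a browser) so a stub can be passed instead.
function detectSpeechSupport(scope = globalThis) {
  // Speech synthesis needs the speechSynthesis controller and the utterance constructor
  const synthesis =
    typeof scope.speechSynthesis === "object" &&
    scope.speechSynthesis !== null &&
    typeof scope.SpeechSynthesisUtterance === "function";

  // Speech recognition: prefer the unprefixed constructor, fall back to the webkit one
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition || null;

  return {
    synthesis,
    recognition: typeof Recognition === "function",
    Recognition, // constructor to call with `new`, or null when unsupported
  };
}

// Usage: outside a browser both flags are simply false
const support = detectSpeechSupport();
console.log(support.synthesis, support.recognition);
```

Returning the constructor itself means the caller never has to repeat the prefix check when it later calls new Recognition().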

Example

Let’s look at a simple example that implements both text-to-speech and speech-to-text with the Web Speech API:

<body>
  <header>
    <h2>Web APIs</h2>
  </header>
  <div class="web-api-cnt">
    <div id="error" class="close"></div>
    <div class="web-api-card">
      <div class="web-api-card-head">
        Demo - Text to Speech
      </div>
      <div class="web-api-card-body">
        <div>
          <input placeholder="Enter text here" type="text" id="textToSpeech" />
        </div>
        <div>
          <button onclick="speak()">Tap to Speak</button>
        </div>
      </div>
    </div>
    <div class="web-api-card">
      <div class="web-api-card-head">
        Demo - Speech to Text
      </div>
      <div class="web-api-card-body">
        <div>
          <textarea placeholder="Text will appear here when you start speaking." id="speechToText"></textarea>
        </div>
        <div>
          <button onclick="tapToSpeak()">Tap and Speak into Mic</button>
        </div>
      </div>
    </div>
  </div>
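  <script>
    // Chrome exposes speech recognition only under the webkit prefix
    var SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
    var speech, recognition
    try {
      speech = new SpeechSynthesisUtterance()
      recognition = new SpeechRecognition()
    } catch (e) {
      // One of the constructors is missing: reveal the hidden error banner
      const errorBox = document.getElementById("error")
      errorBox.textContent = "Web Speech API is not fully supported in this browser."
      errorBox.classList.remove("close")
    }
    function speak() {
      if (!speech) return // speech synthesis is unavailable
      speech.text = document.getElementById("textToSpeech").value
      speech.volume = 1 // 0 to 1
      speech.rate = 1   // 0.1 to 10; 1 is normal speed
      speech.pitch = 1  // 0 to 2; 1 is normal pitch
      window.speechSynthesis.speak(speech)
    }
    function tapToSpeak() {
      if (!recognition) return // speech recognition is unavailable
      recognition.onstart = function () { }
      recognition.onresult = function (event) {
        // resultIndex is the first result that changed in this event;
        // [0] selects that result's most likely alternative
        const curr = event.resultIndex
        const transcript = event.results[curr][0].transcript
        document.getElementById("speechToText").value = transcript
      }
      recognition.onerror = function (ev) {
        console.error(ev)
      }
      recognition.start()
    }
  </script>
</body>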

The first demo, text to speech, uses a simple input field to receive the text and a button that speaks it.

The speak function is shown below:

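function speak() {
  // A fresh utterance for each click
  const speech = new SpeechSynthesisUtterance()
  speech.text = document.getElementById("textToSpeech").value
  speech.volume = 1 // 0 to 1 (default 1)
  speech.rate = 1   // 0.1 to 10 (default 1)
  speech.pitch = 1  // 0 to 2 (default 1)
  // Queue the utterance; the browser reads it aloud
  window.speechSynthesis.speak(speech)
}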

It creates a new SpeechSynthesisUtterance object and sets its text property to whatever we typed in the input box. The volume, rate, and pitch properties control how the text is spoken; 1 is the default value of each. Then it passes the utterance to window.speechSynthesis.speak(), which adds it to the speech queue, and the browser reads the text aloud through our speakers.
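
The specification defines a fixed range for each of those numeric properties: volume from 0 to 1, rate from 0.1 to 10, and pitch from 0 to 2. Below is a minimal sketch of a speak helper that clamps its options into those ranges before queueing the utterance. The names RANGES, clampOption, and speakText are hypothetical, and the env parameter (defaulting to globalThis) is an assumption added so the function can be tried with a stub outside a browser.

```javascript
// Ranges for SpeechSynthesisUtterance properties as given in the Web Speech API spec
const RANGES = {
  volume: [0, 1],  // 0 = lowest, 1 = loudest (default 1)
  rate: [0.1, 10], // 1 = normal speed (default 1)
  pitch: [0, 2],   // 1 = normal pitch (default 1)
};

// Hypothetical helper: force a value into [min, max]; non-numbers fall back to the default 1
function clampOption(name, value) {
  const [min, max] = RANGES[name];
  if (typeof value !== "number" || Number.isNaN(value)) return 1;
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: build an utterance from text and options, then queue it.
// `env` defaults to globalThis (window in a browser); a stub can be passed instead.
function speakText(text, options = {}, env = globalThis) {
  const utterance = new env.SpeechSynthesisUtterance(text);
  for (const name of Object.keys(RANGES)) {
    utterance[name] = clampOption(name, options[name]);
  }
  // speak() appends to the queue; it plays once earlier utterances finish
  env.speechSynthesis.speak(utterance);
  return utterance;
}
```

In the demo, speak() could then delegate to something like speakText(document.getElementById("textToSpeech").value, { rate: 1.2 }). Clamping is a design choice of this sketch so that out-of-range input never reaches the browser; the demo itself does not need it.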

The second demo, speech to text, demonstrates voice recognition. We tap the Tap and Speak into Mic button and speak into the mic, and the words we say are transcribed into the text area. The browser typically asks for microphone permission the first time recognition starts.

The Tap and Speak into Mic button, when clicked, calls the tapToSpeak function:

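function tapToSpeak() {
  // Chrome exposes the constructor only as webkitSpeechRecognition
  const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition
  const recognition = new SpeechRecognition()
  recognition.onstart = function() { }
  recognition.onresult = function(event) {
    // resultIndex is the first result that changed; [0] is its most likely alternative
    const curr = event.resultIndex
    const transcript = event.results[curr][0].transcript
    document.getElementById("speechToText").value = transcript
  }
  recognition.onerror = function(ev) {
    console.error(ev)
  }
  recognition.start()
}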

The function creates a SpeechRecognition instance and then registers its event handlers.

onstart is called when voice recognition starts, and onerror is called when an error occurs. onresult is called whenever the recognizer returns a result, such as a phrase it has recognized.
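
Because recognition is event-driven, it can be convenient to wrap one listening session in a Promise that resolves with the first transcript and rejects on error. The sketch below assumes the same onresult, onerror, and start() members used above, plus the standard lang and interimResults properties and the onend event, which fires when the session stops, possibly without any result. The function name listenOnce and the "en-US" default are illustrative choices, not part of the demo.

```javascript
// Hypothetical helper: run one recognition session and resolve with its first transcript.
// `recognition` is an instance created with new SpeechRecognition() (or the webkit-prefixed constructor).
function listenOnce(recognition, lang = "en-US") {
  return new Promise((resolve, reject) => {
    recognition.lang = lang;            // BCP 47 tag for the language being spoken
    recognition.interimResults = false; // only deliver final results
    recognition.onresult = (event) => {
      resolve(event.results[event.resultIndex][0].transcript);
    };
    recognition.onerror = (event) => {
      // event.error is a code such as "no-speech" or "not-allowed"
      reject(new Error(`Speech recognition error: ${event.error}`));
    };
    recognition.onend = () => {
      // No effect if onresult or onerror already settled the promise
      reject(new Error("Recognition ended without a result"));
    };
    recognition.start();
  });
}
```

With this helper, the button handler could become listenOnce(recognition).then(text => { speechToText.value = text }).catch(console.error), keeping the success and failure paths in one place.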

In the onresult callback, we read the transcript of the newest result, event.results[event.resultIndex][0].transcript, and write it into the text area. So, when we speak into the mic, the words appear in the text area.
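
The demo reads only the single result at event.resultIndex, which is enough for one short phrase. If recognition.continuous or recognition.interimResults is set to true, one session delivers many results, and some may still be provisional (their isFinal flag is false). Below is a minimal sketch of a hypothetical splitTranscript helper that walks the whole results list and separates finalized text from interim text; it relies only on the list's length, index access, isFinal, and [0].transcript.

```javascript
// Hypothetical helper for continuous/interim recognition:
// walks every result in the session and separates final text from still-changing text.
function splitTranscript(event) {
  let finalText = "";
  let interimText = "";
  for (let i = 0; i < event.results.length; i++) {
    const result = event.results[i];
    const best = result[0].transcript; // alternative 0 is the most likely one
    if (result.isFinal) {
      finalText += best;
    } else {
      interimText += best;
    }
  }
  return { finalText, interimText };
}
```

Inside onresult, assigning finalText + interimText to the text area would then show everything recognized so far in the session, with the provisional tail updating as the recognizer refines it.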


