Intro to Utterances, Intents, and Slots

This lesson introduces the basic concepts of wake words, invocation names, utterances, intents, and slots.

Utterances, intents, and slots will be some of the most important concepts to know about as an Alexa developer. Let’s take a closer look at some of these concepts.

Wake Words

How do users interact with our Skill? As discussed earlier, the first thing users need to do to start talking to Alexa is call it by name. The default wake word is “Alexa.”

A wake word is a special word or phrase meant to activate a device when spoken.

Invocation Name

The next step is for users to open our Skill. Compare this with visiting a website: we would typically type the website address into our browser and navigate there. However, we don’t have the convenience of a web browser on a voice-powered device.

For users to open our Alexa Skill, they need to say the Skill’s invocation name. The invocation name is something that we set as developers. We will see this again when we start on the projects.

An invocation name is a name that the users will use to start interacting with our Alexa skill.

Utterances

There are different ways a user can ask our Alexa skill the same thing. Whatever the user says to or asks Alexa is an utterance.

It’s our job as developers to think of the different ways the user may request something and define them when we build our VUI models. The concept will be clearer when we look at an example.

Everything that the user says or asks Alexa is an utterance.

Intent

Just like an utterance, intent means exactly what it normally does in English.

Our Skill needs to perform an action or respond with an answer to the user’s utterance. The intent is the type of action that Alexa needs to perform upon hearing the user’s utterance. This, too, is something that we define as developers in our VUI model.

A fundamental concept to understand is that every utterance maps to an intent, and every intent maps to one or more utterances.

Intent refers to what action Alexa needs to take in response to an utterance.

Slots

Sometimes it’s impossible to enumerate every utterance that the user may use to request a service. For example, suppose we expect the user to tell Alexa a number. Enumerating all numbers from 0 to infinity as possible utterances in the VUI model is not a practical option! This is where slots come in.

Slots are variable inputs provided by the user in an utterance. We can declare a slot in our VUI model and give it a slot type of, say, AMAZON.NUMBER. Then, Alexa will automatically be able to detect that our Skill expects a number at that position in the utterance. This concept will become much clearer when we get started on the projects!

Slots are variables used in utterances.
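As a rough illustration of how a slot appears in a VUI model, the sketch below builds a small interaction model as a Python dictionary. The nesting (interactionModel, languageModel, intents, slots, samples) and the built-in AMAZON.NUMBER type follow the Alexa interaction model format, but the invocation name, intent name, slot name, and the undeclared_slots helper are made up for this example.

```python
import json
import re

# Hypothetical model: the invocation name, intent name, and slot name are
# invented for illustration. Only AMAZON.NUMBER is a real built-in slot type.
interaction_model = {
    "interactionModel": {
        "languageModel": {
            "invocationName": "number keeper",
            "intents": [
                {
                    "name": "TellNumberIntent",
                    # The slot is declared once, with a name and a type.
                    "slots": [{"name": "number", "type": "AMAZON.NUMBER"}],
                    # Sample utterances refer to the slot in curly braces.
                    "samples": [
                        "my number is {number}",
                        "remember the number {number}",
                    ],
                }
            ],
        }
    }
}


def undeclared_slots(intent):
    """Return slot names used in sample utterances but never declared."""
    declared = {slot["name"] for slot in intent.get("slots", [])}
    used = set()
    for sample in intent["samples"]:
        used.update(re.findall(r"\{(\w+)\}", sample))
    return used - declared


if __name__ == "__main__":
    print(json.dumps(interaction_model, indent=2))
```

One slot placeholder stands in for every number the user might say, so two sample utterances cover what would otherwise be an endless list.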

Understanding with an example

Let’s dive into these concepts with an example. Suppose we are building a Skill called “Restaurant Finder” that allows users to search for nearby restaurants.

To start using this Skill, users would first enable it from the Alexa Skills Store. Once enabled, the Skill is ready to use.

To open the Skill, users need to remember and say its invocation name. They have to say something like “Alexa, open Restaurant Finder.” Here’s a possible script between Alexa and the user:

User: “Alexa, open Restaurant Finder.”

Alexa: “Hello! I can help you find nearby restaurants. How can I help you?”

User: “Recommend a restaurant within four miles.”

Alexa: “There’s China Cafe, located within two miles. It has a very high rating. Would you be interested?”

In this conversation, we see the user using the invocation name to invoke our Skill. After hearing the introductory message about the Skill, they ask Alexa to recommend a restaurant within four miles, which acts as the utterance.

The user could have said this in a variety of ways:

  • “Find me a restaurant within four miles.”

  • “Search for a restaurant within four miles.”

  • “Find me someplace to eat within four miles.”

All of these utterances have the same intent, which is to get a recommendation. As Alexa developers, it is our job to think through different possibilities and map intents to as many utterances as possible to better train the VUI models. In a way, we provide Alexa with training data to identify the intent using any of these utterances.
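The many-to-one relationship between utterances and an intent can be sketched with a plain lookup table. This is only a toy model: Alexa uses a trained language model rather than exact string matching, and the intent name RecommendRestaurantIntent and the helper functions below are hypothetical.

```python
# Sample utterances from the lesson, all pointing at one hypothetical intent.
SAMPLES = {
    "RecommendRestaurantIntent": [
        "recommend a restaurant within four miles",
        "find me a restaurant within four miles",
        "find me someplace to eat within four miles",
    ],
}


def normalize(text):
    """Lowercase, drop punctuation, and collapse repeated whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def build_index(samples):
    """Invert the intent -> utterances mapping into utterance -> intent."""
    index = {}
    for intent, utterances in samples.items():
        for utterance in utterances:
            index[normalize(utterance)] = intent
    return index


INDEX = build_index(SAMPLES)


def resolve_intent(utterance):
    """Return the intent name for a known utterance, or None if unknown."""
    return INDEX.get(normalize(utterance))


if __name__ == "__main__":
    print(resolve_intent("Find me a restaurant within four miles."))
    print(resolve_intent("Play some jazz"))
```

An exact lookup like this fails on any phrasing that is not listed, which is why providing many varied sample utterances matters when training the real model.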

Notice another important piece of information: the user specifically asks for restaurants within “four” miles. Next time, however, this “four” could be anything. The user could ask for restaurants within “two” miles, or “ten” if they are willing to drive to get some good food.

It wouldn’t be possible for us to enumerate every single distance as a separate utterance. Wouldn’t it be helpful if something like a variable (like we have in programming) was available? Remember, this is where slots come in! In this case, “four” is the value of a slot. Alexa extracts this value and makes it available to us for use in our business logic.
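To make the last point concrete, here is a minimal sketch of business logic reading a slot value. It assumes the incoming request is a dictionary shaped like an Alexa IntentRequest, where slot values sit under request.intent.slots and a number slot arrives as a string of digits. The intent name, the slot name distance, and both helper functions are hypothetical.

```python
def get_slot_value(request_envelope, slot_name):
    """Return the raw slot value from an IntentRequest-style dict, or None."""
    intent = request_envelope.get("request", {}).get("intent", {})
    slot = intent.get("slots", {}).get(slot_name, {})
    return slot.get("value")


def handle_recommend(request_envelope):
    """Build a reply that uses the distance the user asked for."""
    raw = get_slot_value(request_envelope, "distance")
    # The value may be missing or unrecognized, so check before converting.
    if raw is None or not raw.isdigit():
        return "How far are you willing to travel?"
    miles = int(raw)
    return f"Searching for restaurants within {miles} miles."


# Hand-written sample request for "Recommend a restaurant within four miles."
sample_request = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "RecommendRestaurantIntent",
            "slots": {"distance": {"name": "distance", "value": "4"}},
        },
    }
}

if __name__ == "__main__":
    print(handle_recommend(sample_request))
```

The handler never sees the words “four” or “ten” as separate utterances; it only receives the extracted value and decides what to do with it.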

Let us summarize this with the help of a diagram:
