Jason Priebe asked

Best Practices: LaunchRequests vs IntentRequests?

I'm unsure of exactly how to design the voice UI when it comes to LaunchRequests vs IntentRequests. Amazon's skill reviewer told me that my UI should be more conversational when it is opened via a LaunchRequest. The skill should ask the user for the specific piece of info he wants, and not expect the user to speak the whole phrase he might use in an IntentRequest.

Here is an example IntentRequest:

User: Alexa, ask SKILLNAME what is the average review for Product X
Alexa: The average review for Product X is 4.

And a LaunchRequest:

User: Alexa, open SKILLNAME
Alexa: Welcome to SKILLNAME. I can look up reviews for Amazon products. What product are you interested in?
User: Product X
Alexa: The average review for Product X is 4.

So I have the following questions:

- I guess I have to define an intent that will work when the user says "what is the average review for Product X" (for the IntentRequest use case), as well as an intent that will work when the user says just "Product X" (for the LaunchRequest use case). Then I have to use some rather tedious back-end logic with session variables to make sure that the user doesn't just say "Alexa, ask SKILLNAME Product X" to fire off an IntentRequest, and conversely, that he doesn't say "what is the average review for Product X" when Alexa prompts "What product are you interested in?" (a sketch of this bookkeeping appears at the end of this post). Am I missing the point?

- How best to handle yes/no interactions? My skill reviewer suggested that I add a yes/no question to the conversation. The only way I can think to handle that is to create a YesIntent and a NoIntent, and then, using session variables again, determine whether the skill is expecting a yes/no answer; if the user speaks "yes" at an inappropriate time, the skill responds with an error. This seems janky at best, and it scales poorly for a complex UI that might have 4 or 5 different yes/no questions scattered throughout the user interaction.

- How to handle help messages? Since the user interacts with the skill in two distinct modes, what should my help message say? My skill reviewer suggested adding a prompt to the end of the help message:

User: Alexa, open SKILLNAME
Alexa: Welcome to SKILLNAME. I can look up reviews for Amazon products. What product are you interested in?
User: help
Alexa: You can say things like "Alexa, ask SKILLNAME what is the average review for Product X". What product are you interested in?

So now my help has explained how to start the skill with an IntentRequest, but it is then prompting the user for a simple product-name response. That seems contradictory. I just don't know how to reconcile the help text with the two modes. I could say something horribly awkward like:

User: help
Alexa: You can launch SKILLNAME and make a query by saying something like "Alexa, ask SKILLNAME what is the average review for Product X". Or you can launch SKILLNAME by saying "Alexa, open SKILLNAME", at which point SKILLNAME will prompt you to specify a product name. What product are you interested in?

But that seems ridiculous.

Thanks for indulging me and reading through this long and complex set of questions. I would love to hear how other developers have dealt with this, or if I could get some guidance from the Alexa team, that would be great, too!
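For concreteness, here is roughly what that session-variable bookkeeping might look like in a Lambda-style Python handler. This is only a sketch: the intent name ProductReviewIntent, the slot name Product, and the awaiting_product flag are hypothetical, though the request and response dictionaries follow the documented Alexa JSON shapes.

def handle_request(event):
    request = event["request"]
    attrs = event.get("session", {}).get("attributes") or {}

    if request["type"] == "LaunchRequest":
        # Conversational mode: prompt, and remember that the next thing
        # we expect is a bare product name.
        attrs["awaiting_product"] = True
        return build_response(
            "Welcome to SKILLNAME. I can look up reviews for Amazon "
            "products. What product are you interested in?",
            attrs, end_session=False)

    if request["type"] == "IntentRequest":
        intent = request["intent"]
        if intent["name"] == "ProductReviewIntent":
            slot = intent.get("slots", {}).get("Product", {})
            product = slot.get("value")
            if product:
                attrs["awaiting_product"] = False
                return build_response(
                    "The average review for %s is 4." % product,
                    attrs, end_session=True)
            # Slot came back empty: fall back to the conversational
            # prompt instead of erroring out.
            attrs["awaiting_product"] = True
            return build_response("What product are you interested in?",
                                  attrs, end_session=False)

    # Anything else (unrecognized intents, and so on).
    return build_response("Sorry, I didn't get that.", attrs,
                          end_session=True)

def build_response(text, attrs, end_session):
    return {
        "version": "1.0",
        "sessionAttributes": attrs,
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }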
alexa skills kit
Jason Priebe answered
Before anybody gets too upset, I apologize for posting this in Getting Started instead of Voice User Interface Design.
jjaquinta answered
I've got a lot of tips on UI design in my book: http://www.amazon.com/How-Program-Amazon-Echo-Development-ebook/dp/B011J6AP26 But to your specific questions...

>Amazon's skill reviewer told me that my UI should be more conversational when it is opened via a LaunchRequest.

They like to see both. A new user isn't going to know, right off the bat, how to use your skill. In UI design that's called "discovery". In a GUI, there are many conventions for aiding discovery: color, hover help, right click, context help, etc. You don't have that with Alexa. You just have conversation.

>I guess I have to define an intent that will work when the user says "what is the average review for Product X" (for the IntentRequest use case) as well as an intent that will work when the user says "Product X" (for the LaunchRequest use case).

You should be able to do this with one intent (PRODVIEW) with a single slot (prod). Then have multiple utterances:

PRODVIEW what is the average review for {blah|prod}
PRODVIEW review {blah|prod}
PRODVIEW {blah|prod}

This should match the use cases you describe with just a single intent.

>How best to handle yes/no interactions?

You've got the right approach. You need a state machine. Anything more than a trivial skill will end up needing a state machine. Just think of the most common audio interface you deal with: an automated telephone prompt menu.

>if the user speaks "yes" at an inappropriate time, the skill responds with an error.

Since Alexa doesn't allow us to enable/disable intents, this is what you have to do. (Go +1 my feature request for that!) But that can also be handled by your state machine. "I don't understand what you mean in this context." is the response I use.

>it scales poorly for a complex UI that might have 4 or 5 different yes/no questions scattered throughout the user interaction.

I disagree. State machines are used to parse all the programming languages we use, and those are far more complex than any conversation we might have to deal with. I managed to implement Starlanes (which the Alexa Skills Team said was the most complex skill they had seen to date) with only about five states. When you get down to it, you usually end up needing fewer states than you think. (A sketch of this approach follows at the end of this post.)

>How to handle help messages? Since the user interacts with the skill in two distinct modes, what should my help message say?

Well, if you know what mode the user invoked the skill in, you can respond with the help for that mode. Otherwise, randomly pick one of the two modes and describe that. Just make sure that at the end of your help text you tell them how to get help for the other mode. What I did in Starlanes was to have a help intent with lots of slots for each thing you could get help on (about 16 topics). Then, in the reprompt, I would suggest two related topics and how the user could get help on those.

>Thanks for indulging me and reading through this long and complex set of questions.

No problem. I would encourage you to read our book. It goes into a lot more detail than I can cover here. And it's only $1.

>if I could get some guidance from the Alexa team, that would be great, too!

You can always reply to the e-mail that sent the feedback. That goes to the reviewers, and you can have a back-and-forth conversation with them. You can also request a conference call with them if you prefer that means of communication. My experience with them has been pretty poor, though. They are reviewing against a checklist and aren't really subject-matter experts. They know more about the intrinsic Echo skills, and there have been several circumstances where they asked for things that you just couldn't do with the ASK. Your mileage may vary.
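To make the state-machine point concrete, here is a minimal sketch in Python. The state names, the transition table, and the session-attribute key are all hypothetical; the idea is just that YesIntent and NoIntent get interpreted according to the stored state, so a fifth or sixth yes/no question is one more table entry, not more branching code.

# Hypothetical states and transitions; each entry maps a (state, answer)
# pair to the speech to say and the next state to store in the session.
TRANSITIONS = {
    ("CONFIRM_LOOKUP", "yes"): ("Looking that up now.", "AWAITING_PRODUCT"),
    ("CONFIRM_LOOKUP", "no"): ("Okay. What else can I do?", "TOP_LEVEL"),
    ("CONFIRM_ANOTHER", "yes"): ("What product are you interested in?",
                                 "AWAITING_PRODUCT"),
    ("CONFIRM_ANOTHER", "no"): ("Goodbye!", None),
}

def handle_yes_no(intent_name, attrs):
    """Interpret YesIntent / NoIntent according to the current state."""
    answer = "yes" if intent_name == "YesIntent" else "no"
    state = attrs.get("state")
    if (state, answer) not in TRANSITIONS:
        # A yes/no at an inappropriate time: one generic response covers
        # every state, instead of one error branch per question.
        return "I don't understand what you mean in this context."
    speech, next_state = TRANSITIONS[(state, answer)]
    attrs["state"] = next_state
    return speech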
Jason Priebe answered
Thanks for the well-thought-out reply. I've gone ahead and purchased your book -- I look forward to sitting down to read it this week. Two follow-ups:

- If I set up my intents like this:

PRODVIEW what is the average review for {blah|prod}
PRODVIEW review {blah|prod}
PRODVIEW {blah|prod}

then the user can just say "Alexa, ask SKILLNAME product", and it will fire the PRODVIEW intent. But the wording is unnatural and grammatically incorrect, so I wouldn't want to encourage the user to invoke the skill that way. So I suppose I just need to filter this use case out via my state machine, right?

- I'm still a little confused about the help text. When the user invokes the app with no intent and then says "help", my app gives him some sample utterances he can use to invoke the app *with* an intent, and then prompts him with "What product are you interested in?", expecting the user to just answer with the simple "{blah|prod}" utterance. It just seems odd for me to give the user full-intent samples like "Alexa, ask SKILLNAME what is the average review for {blah|prod}" but then ask him to do something totally different: just supply the name of a product.

Maybe what I'm getting at is this: is it my skill's responsibility to explain to the user the two different modes of interacting with a skill? Or can I assume that the user already knows how to invoke skills with no intent and also how to invoke them with a full intent? Trying to spell that out via a voice interface would quickly become tedious for the listener, I think. But it does seem that if a user invokes the skill with no intent and then asks for help, I should try to teach him how to invoke the skill *with* an intent (because it's so much more efficient to use it that way).
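One concrete way to act on the earlier suggestion of responding with the help for the mode the skill was invoked in: record how the session started, then branch in the help handler. A minimal sketch, where the opened_via_launch attribute name is hypothetical:

# Set in the LaunchRequest handler:
#     attrs["opened_via_launch"] = True
# Then the help handler can tailor its text to the mode.

def handle_help(attrs):
    if attrs.get("opened_via_launch"):
        # Conversational mode: teach the short answer and reprompt.
        return ("Just tell me a product name, for example, Product X. "
                "What product are you interested in?")
    # One-shot mode: teach the full invocation phrase.
    return ('You can say things like, "Alexa, ask SKILLNAME what is the '
            'average review for Product X."')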
jjaquinta answered
>the user can just say "Alexa, ask SKILLNAME product", and it will fire the PRODVIEW intent. But the wording is unnatural and grammatically incorrect, so I wouldn't want to encourage the user to invoke the skill that way.

Remember, the user has no idea what your interface is. One of the big headaches of audio design is discoverability. (There's a whole chapter in my book about it!) In this case, though, it works to your advantage. If you don't tell them they can invoke it that way, they are unlikely to stumble across it by accident.

>I just need to filter this use case out via my state machine, right?

First of all, you don't need to. Second of all, you couldn't if you wanted to! All your state machine knows is that PRODVIEW was invoked with "blah" in the slot named "prod". It doesn't know which utterance invoked it. Alexa does the heavy lifting of parsing the input speech. You just need to react to what's been invoked. Think of your intent as a function call, with the slots as arguments. The utterances define how to call the function based on what the user says. But you don't have to worry about how that happens. You just have to respond to the function calls.

>It just seems odd for me to give the user full-intent samples like "Alexa, ask SKILLNAME what is the average review for {blah|prod}" but then ask him to do something totally different: just supply the name of a product.

Well, if you think the primary interface to your skill is going to be one-off queries, then what you can do is give the help, then drop the session. Now they have to re-invoke it from the start with the full "Alexa, ask..." phrase (a sketch of this follows below). It's a question of which style suits your skill best. Personally, I tend to go for the more interactive skills, but I think Amazon's main design intent was one-off invocations. (The review team keeps trying to get me to make my skills work more in one-off invocation mode, even when it doesn't suit them!)
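A sketch of the give-the-help-then-drop-the-session idea: the help response sets shouldEndSession to true, so there is no follow-up prompt to contradict the one-shot sample, and the user's next interaction is naturally a full invocation. The handler name is hypothetical; the response shape follows the standard Alexa JSON format.

def handle_help_one_shot():
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "PlainText",
                "text": ('You can say things like, "Alexa, ask SKILLNAME '
                         'what is the average review for Product X."'),
            },
            # Ending the session here is the whole trick: the next thing
            # the user says must be a full "Alexa, ask SKILLNAME..." phrase.
            "shouldEndSession": True,
        },
    }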