Announcement: The Alexa Skills Community Is Moving To Stack Overflow

For improved usability and experience, Alexa skills-related forum support will be transitioned to Stack Overflow. Effective January 10, 2024, the Amazon Developer Forums will no longer be available. For continued Alexa skills support you can reach out to us on Stack Overflow or via Contact Us.

Steve A asked

One-shot phrasing accuracy

I've noticed that on some of my skills, the recognition of "one-shot" phrasing ("Alexa, ask [app] how do you say [x]") is significantly worse than "two-shot" phrasing ("Alexa, open [app]. How do you say [x]?"). Has anyone else noticed this? Is there something I should be doing with my interaction model to aid recognition when using these launch phrases? I've followed the instructions in this thread (https://forums.developer.amazon.com/forums/thread.jspa?messageID=17340). Still, I find that with one-shot phrasing the recognition is poor, and the value of [x] is often truncated. (When [x] consists of several words, Alexa hears only one, often a seemingly arbitrary one.)
Tags: alexa skills kit, voice-user interface

jjaquinta answered
Yes, I can confirm it is atrocious. A central part of Starlanes is dropping off and picking up drones. The invocations are of the form:

    Drop [number] [type]
    Pickup [number] [type]

I have plenty of examples covering basic numbers and all types. Even when I say it EXACTLY the same as in my utterance file, and it GETS THE WORDS RIGHT, it STILL parses things into the wrong slot. >_< In one case I would say "Drop twenty five attack drones" and it would send me "DROP, number=20, type=five". That got better when I included sample number utterances with two words.

To get around it, I now have complicated state logic. My DROP intent gets what it can from the utterance. If that is only partial, it dialogs with the user:

    U: Drop twenty attack drones
    A: OK, I got that you want to drop twenty of something. What was it you wanted to drop?
    U: attack drones
    A: Check. Drop twenty attack drones

But that's an idealized conversation. In reality it can take up to twelve exchanges for Alexa to get it right. Yet sometimes it gets it right in one. This is a major dent in my game play.
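A partial-fill dialog like the one above might be sketched roughly as follows, in Python, working directly on the raw JSON request and response envelopes of a custom skill. This is not Starlanes' actual code: the slot names `Number` and `Type`, the `pendingNumber`/`pendingType` session attributes, and the tens-plus-units merge for the "number=20, type=five" split are all hypothetical. It also assumes the intent has sample utterances that let a bare follow-up such as "attack drones" arrive at the same handler.

```python
# Hypothetical slot names ("Number", "Type") and session keys; the merge
# heuristic below is an illustration, not Starlanes' actual logic.
ONES = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
        "six": 6, "seven": 7, "eight": 8, "nine": 9}


def slot_value(intent, name):
    """Return a slot's spoken value, or None if it is absent or empty."""
    return ((intent.get("slots") or {}).get(name) or {}).get("value") or None


def build_response(text, attributes):
    """Wrap plain-text speech in a custom-skill response, keeping the session open."""
    return {
        "version": "1.0",
        "sessionAttributes": attributes,
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": False,
        },
    }


def handle_drop(event):
    """Fill Number/Type from this turn, falling back to values saved earlier."""
    attrs = dict((event.get("session") or {}).get("attributes") or {})
    intent = event["request"]["intent"]
    number = slot_value(intent, "Number")
    kind = slot_value(intent, "Type")

    # Repair the "number=20, type=five" split: a tens value followed by a
    # lone units word in the type slot is merged back into one number.
    if (number and number.isdigit() and 20 <= int(number) <= 90
            and int(number) % 10 == 0 and kind and kind.lower() in ONES):
        number = str(int(number) + ONES[kind.lower()])
        kind = None

    # Anything still missing may have been captured on an earlier turn.
    number = number or attrs.get("pendingNumber")
    kind = kind or attrs.get("pendingType")

    if number and kind:
        attrs.pop("pendingNumber", None)
        attrs.pop("pendingType", None)
        return build_response(f"Check. Drop {number} {kind}.", attrs)
    if number:
        attrs["pendingNumber"] = number
        return build_response(
            f"OK, I got that you want to drop {number} of something. "
            "What was it you wanted to drop?", attrs)
    if kind:
        attrs["pendingType"] = kind
        return build_response(f"How many {kind} do you want to drop?", attrs)
    return build_response("What would you like to drop?", attrs)
```

Keeping the half-filled values in session attributes means each turn only has to recognize one missing piece, which is the same idea as the idealized conversation above; the merge heuristic is a guess at one failure mode and would misfire if "five" were ever a real drone type.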

Nick Gardner answered
Hi,

In order to investigate this more thoroughly, do you have any specific examples of utterances for your app which exhibit this unexpected behavior?

Thanks,
Nick

jjaquinta answered
I don't know, Nick, whether you are asking me or Steve. I supplied many details to the cert team, who neither acknowledged receipt nor commented on them. I've cross-posted the specific problems with custom slots to a thread on the debugging forum here. SOME sort of feedback would be... well... professional of you.

Steve A answered
Nick,

It shows up frequently, over a wide range of cases. In fact, someone else mentioned it here the other day: https://forums.developer.amazon.com/forums/thread.jspa?threadID=9346&tstart=0

Currently, I've noticed that one-shot invocations seem to favor words from the sample utterances much more heavily than two-shot invocations do. So, if I have the following sample utterances in my animal logging skill (...and yes, that does sound like a terrible idea for an actual skill...):

    AddAnimal add {dog|Entry}
    AddAnimal add {tabby cat|Entry}
    AddAnimal add {california black bear|Entry}

then "Alexa, tell animal logger to add dolphin to my log" would have her add "dog". (In fact, almost anything I say comes back as "dog".) But if I did it the two-shot way, it would be correct, giving me "dolphin."

Sorry I can't be of more help. Like I said, this is just a general, long-run experience I've had. When and how it manifests seems to change, but overall I think it's undeniable that one-shot accuracy is worse than two-shot in all the skills I've written. I would mention, though, that I haven't found this to be the case in the native skills. So maybe there's some trick with the sample utterances I haven't figured out.
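If the bias toward sample words is real, one untested idea is to give the slot many sample values of different lengths, inside every carrier phrase users actually say (the spoken request above ends in "to my log", which none of the three sample lines contain). The Python sketch below only generates lines in the `{value|SlotName}` form shown above; the extra animal names, the "to my log" template, and the helper name `literal_utterances` are hypothetical, and nothing in this thread confirms that more samples fix one-shot recognition.

```python
"""Expand carrier phrases and sample values into utterance lines.

The intent and slot names match the post; the extra animals and the
"to my log" carrier phrase are hypothetical additions for illustration.
"""


def literal_utterances(intent, templates, slot, samples):
    """Return one "Intent phrase" line per (template, sample) pair.

    Each template marks the slot position with "{slot}"; the sample value
    is substituted in the {value|SlotName} form used in the post.
    """
    lines = []
    for template in templates:
        for sample in samples:
            slot_ref = "{%s|%s}" % (sample, slot)
            lines.append(f"{intent} {template.format(slot=slot_ref)}")
    return lines


if __name__ == "__main__":
    templates = ["add {slot}", "add {slot} to my log"]
    # Mix one-, two- and three-word values so no single length dominates.
    samples = ["dog", "horse", "tabby cat", "sea otter",
               "california black bear", "great white shark"]
    for line in literal_utterances("AddAnimal", templates, "Entry", samples):
        print(line)
```

Generating the lines rather than typing them keeps every carrier phrase paired with every sample value, so adding one new phrasing does not silently leave it with a single, dominant example.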

Nick Gardner answered
Jjaquinta,

I was asking Steven, but your reply was also helpful in getting more information. Apologies for not making that clearer. If you're talking about this thread (https://forums.developer.amazon.com/forums/thread.jspa?threadID=9409&tstart=0), we're looking into that as well. Simply put, the more specific data we have to look at, the easier and faster these problems will be solved. Right now I don't have any specific feedback to give, simply because we have not been able to replicate the problem without specific utterances for a specific skill. Now that I have some, and have passed them on to the relevant teams for a deeper look, hopefully we will be able to get an answer soon.

Steven,

Thanks for the specific utterances; those will hopefully help us track down the issue. I'll post back here once we've been able to look at this in more detail and can give you more information.

I appreciate everyone's patience in working through some of these issues. Speech and speech recognition is a complicated area, and what often seems like a simple fix or change can take a significant amount of work and time. We really do appreciate all the feedback we get on these forums; it is instrumental in making the Alexa Skills Kit a great developer experience.

Thanks,
Nick

Steve A answered
Nick,

Here's a live example of what I mean. On the new skill "Translator for Alexa", saying:

"Alexa, how do you say 'love' in French?"

returns the translation of "lilac" (which I happen to know is a sample utterance). But

"Alexa, open Translator..... How do you say 'love' in French?"

returns the proper translation. Note that it doesn't happen with the following:

"Alexa, ask Translator to say 'love' in French."

That returns the correct response in both one- and two-shot form. So maybe it's something about the form of the invocation...?

Steve
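One defensive measure a skill could take, sketched here in Python as an assumption rather than a known fix: treat a one-shot request (an `IntentRequest` that arrives with `session.new` set to true, instead of following a `LaunchRequest`) whose slot value exactly matches one of the skill's own sample literals as suspect, and confirm it with the user before acting. The slot name `Phrase`, the intent name `TranslateIntent`, and the contents of `SAMPLE_LITERALS` are hypothetical; only "lilac" comes from the post.

```python
# Hypothetical sample values for the translator's phrase slot; only "lilac"
# is mentioned in the post.
SAMPLE_LITERALS = {"lilac", "thank you", "good morning"}


def is_one_shot(event):
    """True when an intent arrives as the first request of a new session.

    A one-shot invocation ("Alexa, ask X ...") opens the session with an
    IntentRequest; a two-shot flow opens it with a LaunchRequest instead.
    """
    session = event.get("session") or {}
    request = event.get("request") or {}
    return bool(session.get("new")) and request.get("type") == "IntentRequest"


def needs_confirmation(event, slot_name, sample_literals):
    """Flag empty values, and one-shot values that echo a sample literal."""
    intent = (event.get("request") or {}).get("intent") or {}
    slot = (intent.get("slots") or {}).get(slot_name) or {}
    value = (slot.get("value") or "").strip().lower()
    if not value:
        return True  # nothing usable was captured; ask the user again
    return is_one_shot(event) and value in sample_literals


def make_event(value, new_session):
    """Build a minimal IntentRequest envelope for local experiments."""
    return {
        "session": {"new": new_session},
        "request": {
            "type": "IntentRequest",
            "intent": {
                "name": "TranslateIntent",  # hypothetical intent name
                "slots": {"Phrase": {"name": "Phrase", "value": value}},
            },
        },
    }
```

The cost of this check is an extra "Did you say lilac?" for users who really did ask for a sample word in one shot; whether that is acceptable depends on how often the misrecognition described above actually occurs.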

jjaquinta answered
>Now that I have some...

Nick. Yeah. Right. Don't blow smoke up my *censored*. I passed those on to the Cert team some time ago. Word for word. Never even got an acknowledgement of receipt. Are you saying that you and the Cert team don't communicate? Given their propensity for asking skill developers to fix Amazon bugs, it's crazy if we have to go through the forum to get bugs opened up against the base system.