chrishoffer asked

Is it possible to get the full utterance from user without using literals?

I was hoping to continue using slots with a lot of sample utterances, since that is recommended for the best accuracy, but I was also hoping to set this up so that I can build an audit table of the questions asked. Is it possible to also receive the full utterance so I can save it for future auditing?

If that is not possible, is it possible to set this up so that I can at least receive the full utterance when it does not match any of my slots?

For example, if I have a sample utterance of "What does {SLOT} mean?" and the user does not say a slot value that I have defined, I would still like Alexa to send that over so I can audit it. Or, if they say an utterance that I don't have defined, I would also like to audit that. I'd obviously be willing to use literals for that case if possible.

alexa skills kit

You would have to build the auditing on your side, in your own code. A custom slot biases recognition toward the values you have predefined, but it's not an enumeration. So if a word isn't in your custom slot but the utterance matches the sample format, Alexa will still send you the word it thinks the user said, in the JSON under that slot (this works pretty well in my experience). You can then check on your side whether it's in your predefined list and log it if not. Configuring the words you expect the person to say does improve accuracy, but it works pretty well nonetheless.
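A minimal sketch of that check in Python, assuming the usual IntentRequest JSON layout where a slot's text sits at `request.intent.slots.<name>.value`. The slot name `Term`, the `KNOWN_TERMS` set, and the `audit_slot` helper are hypothetical names for illustration, and the logging call stands in for whatever audit table you write to.

```python
import logging

logger = logging.getLogger("audit")

# Hypothetical: the values configured for the custom slot type.
KNOWN_TERMS = {"latency", "throughput", "bandwidth"}


def audit_slot(event, slot_name="Term"):
    """Return (value, is_known) for one slot of an Alexa IntentRequest.

    Logs the value when it is not in the predefined list so it can be
    reviewed later. Returns (None, False) if the slot is absent or empty.
    """
    request = event.get("request", {})
    intent = request.get("intent", {})
    slots = intent.get("slots") or {}
    value = (slots.get(slot_name) or {}).get("value")
    if not value:
        return None, False

    is_known = value.strip().lower() in KNOWN_TERMS
    if not is_known:
        # Replace with a write to your own audit store.
        logger.info(
            "unrecognized slot value: intent=%s slot=%s value=%r",
            intent.get("name"), slot_name, value,
        )
    return value, is_known
```

This only sees what Alexa placed in the slot, not the full utterance, so it covers the "matched the sample format with an unexpected word" case and nothing more.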


I'm just getting into developing a custom skill for the Echo, with a bit of experience on the Google Assistant side using API.AI. It's unfortunate that Amazon doesn't provide the full utterance (Google/API.AI does), as there are instances where I want to repeat and/or use what the user said to continue the conversation.

Take something simple: "I want to order a pizza". "Pizza" would be a slot value of type "FOOD", alongside other things like "burger", "coffee", etc. For purposes of backend data retrieval, and for the sake of argument, "pizza" is sufficient for what I'm doing. The same goes if the user said "I want to order a sausage pizza". The "sausage" part isn't important for the data retrieval, but confirming to the user "here are the locations where you can get a sausage pizza" would be much nicer than simply saying "here are the locations where you can get a pizza".

With Google/API.AI, I get both - I get the slot match ("pizza") as well as the full utterance ("I want to order a sausage pizza") and then my backend service can determine what to do and send back the appropriate response.

Defining the entire list of possible pizza combinations (as well as coffee, burger, other food types) seems overwhelming and unnecessary.

Has anyone found a way (using CUSTOM slot types) around this limitation?


FYI, we are no longer deprecating LITERAL, so you can get whatever the user says to you and handle this in your code along with any custom slot types that you set. See here.
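As a rough sketch of handling both in your code, assume an intent with a custom slot named `Food` and a literal-typed slot named `Phrase`. Both slot names and the `build_confirmation` helper are hypothetical, not taken from any real skill. The handler prefers the free-form text when confirming and falls back to the custom slot value:

```python
def build_confirmation(event):
    """Build a plain-text Alexa response that echoes what the user asked for."""
    slots = event.get("request", {}).get("intent", {}).get("slots") or {}
    food = (slots.get("Food") or {}).get("value")      # e.g. "pizza"
    phrase = (slots.get("Phrase") or {}).get("value")  # e.g. "sausage pizza"

    # Prefer the free-form phrase; fall back to the custom slot match.
    spoken = phrase or food
    if spoken:
        text = "Here are the locations where you can get a {}.".format(spoken)
    else:
        text = "Sorry, I didn't catch what you wanted to order."

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }
```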

jjaquinta answered

Amazon does not provide false matches. So there really isn't any way to do what you ask. I think it is somewhat of a privacy concern. If the user just happens to be talking, they don't want skills to be able to "listen in".

Ultimately, the stricter your input set, the more accurate your skill will be. This kind of runs counter to the conversational model that we all aspire to. But you have to balance the two.


D. Young answered

Please upvote this enhancement request if you would like to see Amazon provide the full user utterance -
