headcode asked

Slot is null despite phrase being in slot list

I'm adding Alexa support for my OurGroceries shopping list mobile app. These kinds of requests work great:

Alexa, ask OurGroceries to add milk to Safeway.
Alexa, ask OurGroceries to add milk to groceries.

The last slot (LIST_NAME) is the name of the shopping list. The most common shopping list name of my users is simply "shopping list". When I try that, the slot I get is null. (In Java, the Slot object is not null, but its getValue() method returns null.)

My list of sample list names in the slot contains 120 entries, the first of which is "shopping list". I first started with a much shorter list of 8 names, and at the time "shopping list" worked. Now with a more representative list, it no longer works.

What's odd is that in the Alexa app, in the card, it clearly recognized the "shopping list" term. But by the time that gets to my app, the slot has been cleared.

This is a deal-breaker, since that's the most common shopping list name. I could go back to my original list of 8, but since I don't understand why it's breaking now, I don't know if it'll break in the future. Any idea how to solve this?

Thanks!
Lawrence
alexa skills kit, voice-user interface

jjaquinta answered
Voice recognition is hard. It's very much not an exact science. In an effort to give the highest quality, Amazon have created an intricate and complex system, one that they are constantly tweaking and trying to improve. The downside of complex systems is that they fail in complex ways.

In a simple system, it would make no obvious sense that adding categories decreases the accuracy of the original categories. In a complex system, I can see that as one possible result. Amazon are going to have a whole bunch of internal test cases. Each time they tweak things, they make sure that they haven't perturbed those cases. However, it is a problem set that is hard to cover definitively. Everyone talks differently. Every skill has a different interaction model. Optimizing for their own test cases is going to have near-random effects on the other cases.

At least in this case it is your own actions that have led to a change in behavior. Many of us have been caught out by Amazon changes that broke our skills unnoticed.

Unfortunately, none of this leads to a recommended course of action. If you want to keep your current design, just try different quantities of things in your list, test repeatedly, and see what works. If you are open to changing your design, you might go with fixed list names, e.g. "list one", "list two", etc., or "shopping list", "grocery list", "wish list", etc.

headcode answered
Okay, I get that it's hard, but if it recognizes "shopping list", and "shopping list" is in the slot list, shouldn't that just work? Or is the voice model so complex that it might actually lose some of the entries?

jjaquinta answered
> but if it recognizes "shopping list"...

If you are determining this based on what came up in the companion app: no one really knows where they get the text they display in the companion app. Evidence suggests it is not the same pipeline that ends up going to the skill.

Otherwise, have you ever written a spell checker? The way they work is they take the list of valid words and turn them into a tree. Say we just have bob, boy, and bad. We would have a tree like this:

[code]
....b
.../.\
..o...a
./.\..|
y...b.d
[/code]

(I don't know if that works. Preview is broken.)

As you get characters, you traverse the tree. If you are at a terminal node when you reach the end of a word, you pass that word as spelled correctly. Otherwise it is spelled wrong.

What I believe Alexa is doing is the same sort of thing. It goes from your voice waveform to a series of phonemes. These phonemes are like letters in the tree. It traverses the structure and when it gets to a series of phonemes that match a word, it considers that word recognized.

So, in theory, yes, once it can recognize a word, increasing the data set (and, thus, the branches in the tree) shouldn't matter, and it should keep recognizing that word. Reality is rather more complex.

First, it doesn't actually turn a sound wave into a phoneme. Rather, it turns it into several phoneme candidates, each with a different confidence level. It isn't sure which is exactly right, so it tries them all out on the tree and sees which ones lead to words and which do not.

Second, the gap between words in a written sentence is pretty straightforward. The gap between spoken words is less so. Alexa has to make guesses as to when one word ends and another begins. This, again, creates more branches in interpretation, each with its own confidence level.

Third, the tree is not composed solely of what you give in your interaction model. If you have a large model, it can't store all of that, so it merges similar branches. Even if your model is not large, the biases in previous sets are keyed to general experience of the English language. This influences how things get interpreted and can lead to words not in your set.

Put this together and you can see how the larger the data set, the more ambiguity is added to the recognition process. More data, in general, trains it to better recognize [i]your type[/i] of data, but not necessarily any one specific line of your data.

(Caveat: I have no special knowledge of how Alexa [i]actually[/i] does its work. My comments are always from the point of view of "how I would do it", given a modicum of industry knowledge and experience. What they [i]actually[/i] do is probably even more complex, and proprietary, so I doubt Amazon would ever comment on particulars. I'm just giving you a "how I would do it" to give some context and understanding to what otherwise appears random or unintelligible!)
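
For anyone who hasn't written one, the spell-checker tree described above can be sketched in a few lines of Java. This only illustrates the analogy (one node per character, a flag on nodes where a valid word ends); it says nothing about how Alexa is actually implemented. The class and method names (WordTree, add, contains, example) are made up for this sketch, and unlike the drawing, the root here is an empty node with "b" as its only child.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal spell-checker tree (a trie). Illustrative only; all names are hypothetical. */
final class WordTree {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if a valid word ends at this node
    }

    private final Node root = new Node();

    /** Adds a word by walking or creating one node per character. */
    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.terminal = true;
    }

    /** Returns true only if the traversal ends on a terminal node. */
    boolean contains(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return false; // no branch for this character
            }
        }
        return node.terminal;
    }

    /** Builds the three-word example from the post: bob, boy, bad. */
    static WordTree example() {
        WordTree tree = new WordTree();
        tree.add("bob");
        tree.add("boy");
        tree.add("bad");
        return tree;
    }
}
```

With this tree, "bo" is rejected even though its path exists, because the traversal stops on a node that is not terminal. Adding more words adds more branches, but an exact lookup like this one keeps working; the ambiguity described above only appears once the input is a set of uncertain candidates rather than exact characters.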

Levon@Amazon answered
Hi headcode,

Welcome to the dev forums and thanks for posting! Without looking at the full set of sample utterances and slot values it is hard to tell why this is failing, but my guess is that it might be the only two-word slot value you have, which weights the slotting towards one-word values only. You might have more success by a) adding "shopping" to the slot values, b) adding more two-word values to the custom slots (e.g. "grocery list", etc.), or c) adding sample utterances that have "list" outside of the slot value, e.g. "add {item} to {ListName} list".

Please give it a try or provide more details. Thanks!

headcode answered
(I've solved it, see below, but I include my original post anyway.)

Thank you for replying! Here are some stats: I have 119 shopping list names. Of those, 78 are a single word, 36 have two words, and 5 have three words.

a) You suggested adding "shopping" to the list. It's already there.
b) You suggested adding two-word values. There are already 36 of them.
c) You suggested sample utterances that have "list" outside the slot value. I already have that.

Your suggestion (c) made me realize, though, that I also have sample utterances with "shopping list" outside the name, so that I can handle "add milk to costco shopping list". But this might be causing the list name to be null! I tried updating my utterances to remove those that add "shopping list" explicitly, and this caused my test "add milk to shopping list" to return "shopping". Perfect! I can deal with that myself manually, by trying both the slot value and the slot value plus "list".

The problem now is that about half the time the item gets added to the Echo shopping list, even if I say "ask OurGroceries" as part of the request. Voice is hard!

Thank you Levon!
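
For readers who land here with the same problem, the manual fallback described above (try what the slot returned, then that same text plus " list") might look roughly like this in Java. This is a sketch under assumptions, not the actual OurGroceries code: ListNameResolver and resolve are hypothetical names, the case-insensitive comparison is an assumption, and the slot value is passed in as a plain String (whatever getValue() returned) so that the example does not depend on the Alexa SDK.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper that maps a LIST_NAME slot value to one of the user's lists. */
final class ListNameResolver {
    private ListNameResolver() {}

    /**
     * Tries the slot value as spoken first, then the slot value plus " list".
     * Returns empty if the slot value is null or blank, or if nothing matches.
     */
    static Optional<String> resolve(String slotValue, List<String> userLists) {
        if (slotValue == null || slotValue.trim().isEmpty()) {
            return Optional.empty(); // the slot exists but carried no value
        }
        String spoken = normalize(slotValue);
        // Exact matches win over the " list" fallback because they are tried first.
        for (String candidate : new String[] {spoken, spoken + " list"}) {
            for (String name : userLists) {
                if (normalize(name).equals(candidate)) {
                    return Optional.of(name);
                }
            }
        }
        return Optional.empty();
    }

    // Assumption: list names are compared case-insensitively, ignoring outer whitespace.
    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }
}
```

With lists named "Safeway" and "Shopping List", a slot value of "shopping" resolves to "Shopping List", while a null slot value resolves to nothing and can be handled by asking the user again.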

Levon@Amazon answered
Awesome!