I'm guessing your speech-to-intent matching is not an exact science. If it works like other similar applications in the industry, you are probably internally testing against numerous proposals, assigning a confidence to each, and picking the best. Why not expose that to the application? Instead of just returning what it considers the best match, why not return an array of matches, each marked with a confidence rating? It may be that the top response makes no sense to the app, but a lower-confidence response turns out to be correct. I've worked with other speech-to-text services that work this way. It seems like it would be a benefit here as well.
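To make the request concrete, here is a rough sketch in Python of how an app might use such a list. Everything in it is made up for illustration: the field names (`intent`, `confidence`), the intent names, the scores, and the helper `pick_intent` are assumptions, not part of any actual Alexa API.

```python
from typing import Optional

# Hypothetical n-best result from the service. The shape, field names, and
# scores are invented for this example; no real response looks like this.
candidates = [
    {"intent": "PlayMusicIntent", "confidence": 0.61},
    {"intent": "GetScoreIntent", "confidence": 0.34},
    {"intent": "StopIntent", "confidence": 0.05},
]


def pick_intent(candidates, valid_intents) -> Optional[str]:
    """Return the highest-confidence intent that the app can handle right now."""
    ranked = sorted(candidates, key=lambda c: c["confidence"], reverse=True)
    for candidate in ranked:
        if candidate["intent"] in valid_intents:
            return candidate["intent"]
    return None  # nothing in the list makes sense to the app


# Assumed app state: only these intents are meaningful at this point.
valid_now = {"GetScoreIntent", "StopIntent"}
print(pick_intent(candidates, valid_now))  # prints "GetScoreIntent"
```

The top match is skipped because the app knows it makes no sense in the current state, and the second candidate is used instead. Note that the scores are only used for ranking, not compared against a fixed threshold.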
Thanks for the feedback. For the sake of simplicity, we indeed only expose the best match if we think the confidence is good enough. Confidence scores are actually hard to use because they are not on a simple, absolute scale. Determining what is or is not good enough requires a lot of data. Your feature request has been communicated to the development team. This is definitely something we will consider as a future improvement.
It fills me with despair to realize this conversation happened 7 months ago and there has since been no follow-up from Amazon regarding implementing this idea. Ross's response seems borderline comical. Yes, confidence levels are complicated. Those of us who have dealt with Natural Language Processing are already aware of that. It's still a good idea to send us as much data as you can. We are developers, not consumers. I understand the need to keep things simple for consumers. I do not understand the need to keep things simple for developers. Actually, I should rephrase that. In those cases where you can simplify things for developers without crippling them, simplification is awesome. I'd rather program in Java than in C because I don't want to manually manage memory. Java is simpler than C because it takes care of a problem that I don't want to deal with. However, if I had to work on an embedded system that only had 8k of memory, I would use C. If I actually needed the extra power, I would reach for it. The JVM's automatic garbage collection works well enough for me 99% of the time. I cannot say the same for Amazon's voice recognition. In the same way that I would switch to C if I really needed it, I wish Amazon would take the same attitude regarding how much data it exposes to us. We (Alexa developers) do need the extra power, because the Amazon voice recognition system does not work well enough on its own.