The request we get from 100% of users is to remove the "tell <skill name>" prefix from the syntax. Our app is used very often, and for almost all of those users it will be the primary app. The syntax users want is simply "Alexa, <request>". Realizing that it could get messy to put every app into the pole position, I think the most reasonable path is to allow designating one particular app as primary -- or perhaps more than one, if there were some conflict resolution. With only one primary app, conflict resolution would be trivially easy as part of app review. Users see that the "Connected Home" section operates without a "tell ..." prefix, and that becomes a "why didn't you do it that way!" cry, even though that wasn't possible with the skills API. Connected Home only covers a few devices, so it isn't realistic for people doing broader home control. An option to put a particular skill into the primary syntax position would provide that.
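To make the proposal concrete, here is a rough sketch of how a user-designated primary skill might be routed. None of this reflects the real Alexa Skills Kit API; the parser, skill names, and function names are invented purely for illustration:

```python
# Hypothetical sketch of "primary skill" routing. Skill names and the
# utterance grammar here are made up; this is not the real Alexa API.
import re

def route_utterance(utterance, installed_skills, primary_skill=None):
    """Return (skill, request) for a raw utterance.

    Explicit form: "tell <skill> to <request>"
    Implicit form: "<request>" -> falls through to the primary skill.
    """
    match = re.match(r"tell (.+?) to (.+)", utterance, re.IGNORECASE)
    if match and match.group(1).lower() in installed_skills:
        return match.group(1).lower(), match.group(2)
    if primary_skill is not None:
        # No "tell ..." prefix: hand the whole utterance to the primary skill.
        return primary_skill, utterance
    return None, utterance  # unresolved; would fall back to built-ins

skills = {"home helper", "star lanes"}
print(route_utterance("tell Home Helper to dim the lights", skills))
# -> ('home helper', 'dim the lights')
print(route_utterance("dim the lights", skills, primary_skill="home helper"))
# -> ('home helper', 'dim the lights')
```

With exactly one primary skill per account, the "conflict resolution" mentioned above reduces to the `primary_skill` check: there is never more than one candidate for an unprefixed utterance.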
This is by far the cockiest feature request I've ever seen (for any product, ever). Let's break it down:

- You assert that 100% of your users are interacting with your feedback mechanism. I find this statistically improbable.
- You assert that the opinions of your users are absolutely uniform and well aligned. The idea that every person who uses your product agrees about ANYTHING is preposterous, unless you have an incredibly small user base (in which case, why would anyone develop a feature just for you?).
- You assert that for almost all of your users, specifying the app name is problematic, yet if you look at reviews in the skill store, one of the biggest recurring complaints is a skill doing something without the user explicitly requesting it.
- You assert (and this is the biggest leap of all) that almost all of your users only care about your app. Which is to say that this product space -- which has blown up over the last year into a media darling, selling out its associated hardware and garnering no shortage of buzz -- is completely irrelevant to your users, either because your users are anomalous relative to the rest of the population, or because you have invented something so earth-shattering as to negate the need for any other software entirely.

I'm not buying it. Ridiculous. You need to check your hyperbole, bro.

Lucky for you, however, you didn't "invent" the idea in this feature request either. It's a common request, one that the Alexa top brass (Frederic Deremat, David Isbitski, etc.) are regularly asked about in public forums, and they regularly give the same response -- something like "We recognize the current model is clunky; we too would like to see something more natural, but it's a non-trivial problem to solve. We're taking steps in the near term toward that end, though, so watch for new features." And so on, and so forth.
I think -- just to read the post a bit more charitably -- the OP means that 100% of the feedback he gets asks for a more natural invocation. I'm also guessing he's not saying that everyone who uses the skill only cares about that skill. Again guessing, his claim is meant to be read with a more limited scope: everyone using his skill for some purpose (telling a joke, setting a thermostat, etc.) is using only his skill to do that.

I think it would be nice if Amazon leveled the playing field between built-in functionality and 3rd party skills by allowing users to designate a 3rd party skill to respond to a "built-in" invocation request. For example, "Tell me a joke" always gives Amazon jokes. But given that there are a ton of joke skills, why shouldn't the user be able to assign one of the many "yo mamma" skills to respond to "Tell me a joke"? Users do hate the "Alexa, tell x to y" syntax. The idea of designating a primary skill would a) ameliorate that problem for the user's favorite skills and b) help level the playing field for 3rd party devs.
"I think -- just to read the post a bit more charitably -- the OP means that 100% of the feedback he gets asks for a more natural invocation."

You're right, but what they don't seem to realize is that Alexa can't read their minds. The only way Alexa can know which skill the user wants is for the user to ask for it by name, especially since there's so much overlap in the utterances used across various skills. This is not an illogical or unreasonable requirement for a voice-activated system, but consumers don't have much experience with voice-activated systems, so it's not something many of them seem to "get" right away.

I think it would be a mistake to let users designate a single default skill that doesn't require invocation by name, because, people being people (and I include myself in this), it's only a matter of time before they:

1. Forget the special designation [i]only[/i] removes the invocation-by-name requirement for the one specially designated skill and jump to the wrong conclusion that the Alexa software as a whole is broken, or
2. Forget which skill they designated as the default and wrongly assume this special default designation doesn't work, or
3. Demand to know why invocation by name is [i]still[/i] required for all other skills.

In any of these three scenarios, Amazon loses points and face with consumers for reasons that are not its fault. It's typical for people to be irritated by new tech that requires new habits, but seriously, how hard is it to understand and eventually accept that invocation by name is necessary when there's more than one skill available?
You raise a number of good points, and I do think a lot of the difficulty is, as you say, people getting used to a new way of interacting with technology. The more I think about it, the more I see the implementation wouldn't be trivial. But I was thinking that in the Alexa app they might eventually have categories -- weather, humor, radio, home automation, etc. -- and you could pick which of the listed apps responds to default commands like "tell me a joke..." It's a model we already have on phones and computers (default apps to open certain documents, or to get us the weather). It does seem like it *could* work with a voice interface.

Steve
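The per-category default idea above can be sketched in a few lines. To be clear, this is a hypothetical illustration: the categories, skill names, and the naive utterance-to-category lookup are all invented; a real system would classify utterances with far more sophistication.

```python
# Hypothetical per-category defaults, as might be set in the companion app.
# All names are invented for illustration.
DEFAULTS = {
    "humor": "yo mamma jokes",
    "weather": "big sky weather",
    "home automation": "home helper",
}

# Stand-in for real utterance classification: a toy lookup table.
CATEGORY_OF_UTTERANCE = {
    "tell me a joke": "humor",
    "what's the weather": "weather",
    "dim the lights": "home automation",
}

def resolve(utterance):
    """Map an unprefixed utterance to the user's default skill for its
    category, falling back to the built-in handler otherwise."""
    category = CATEGORY_OF_UTTERANCE.get(utterance)
    if category is None:
        return "built-in"
    return DEFAULTS.get(category, "built-in")

print(resolve("tell me a joke"))      # -> 'yo mamma jokes'
print(resolve("sing me a song"))      # -> 'built-in'
```

The design mirrors desktop default-app tables: one winner per category, chosen ahead of time, so there is no runtime ambiguity to resolve.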
Although I think it is ridiculous to do this now, I also see that as a point-in-time statement. Currently, in our own skills, we have to construct an audio interface that is unambiguous and matches a set of inputs to a set of functionality without context. This is hard enough to do as it is. StarLanes is at the edge of what is consumable with the present technology; it has been difficult to add features, because additions introduce ambiguity. This is what makes it ridiculous to expect all 3rd party skills to interact at a "root" level: it takes an already difficult problem and magnifies it a hundredfold, with developers continually adding skills and not coordinating with each other.

But over time I expect Amazon will begin to bring in context. Probably not the strict state-machine approach of VoiceXML, but something more of a hybrid. The end effect will be to reduce the recognition problem set: if only certain parts of your voice model are "active" at one time, utterances can be resolved against it more cleanly. Extend this concept further and add the ability to distinguish entrance contexts. Once you can narrow things down that far, you can start to look at how to algorithmically combine sets of entrance contexts, identify points of ambiguity, and decide how to reject or resolve them.

The Echo product team is probably already working on this. If you think about it, they already have a growing set of always-active native skills. They need to solve the problem for themselves, then work out how to make the solution generically applicable. It's not a simple problem, but I think it will be solved in time. Just not soon.
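The "active context" idea above can be sketched as follows. This is a minimal illustration under invented assumptions: each skill exposes a small voice model (a set of sample utterances), only the models of active contexts are matched, and collisions across active models are flagged as ambiguous rather than silently mis-routed. Nothing here reflects how Alexa actually works internally.

```python
# Minimal sketch of context-restricted recognition. Skill names and
# voice models are invented; real models are far richer than literal strings.
from collections import defaultdict

VOICE_MODELS = {
    "star lanes": {"buy shares", "end turn", "read the map"},
    "stock trader": {"buy shares", "sell shares"},
    "home helper": {"dim the lights"},
}

def resolve(utterance, active_contexts):
    """Match an utterance only against skills in the active contexts.
    Returns the unique matching skill, or 'ambiguous' / 'no match'."""
    hits = [s for s in active_contexts
            if utterance in VOICE_MODELS.get(s, set())]
    if len(hits) == 1:
        return hits[0]
    return "ambiguous" if hits else "no match"

def ambiguity_points(active_contexts):
    """Utterances claimed by more than one active skill -- the collisions
    a combiner would have to reject or resolve."""
    owners = defaultdict(list)
    for skill in active_contexts:
        for utterance in VOICE_MODELS.get(skill, set()):
            owners[utterance].append(skill)
    return {u: s for u, s in owners.items() if len(s) > 1}

print(resolve("buy shares", ["star lanes"]))                 # -> 'star lanes'
print(resolve("buy shares", ["star lanes", "stock trader"])) # -> 'ambiguous'
```

Shrinking `active_contexts` is exactly the "reduce the recognition problem set" effect described above: "buy shares" is unambiguous while only StarLanes is active, and becomes a detectable collision only when a second skill sharing that utterance enters the context.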