So, we've had our first skill pretty much code complete for about a week, with the only remaining item being cleanup of our sample utterances. We thought this would be a simple task. It has not been.

Specifically, our biggest problem is that our intents all work great once we've launched our skill and are maintaining an active session, but if we try the same utterances with the wake word and skill invocation, the rate at which Alexa properly parses our input drops from about 90% to about 25%. A good portion of those failures don't even return an error response or card - they just produce the sad "off" beep.

At this point, we've tried hundreds of small tweaks to our sample utterances. We've tried paring our set of intents and utterances way down. We've tried switching fields between custom slot types and literals (sketch of what I mean at the bottom of this post). Nothing makes it better. The really frustrating part is that we're not colliding with OTHER Alexa functionality. Alexa isn't choosing one of its built-in capabilities over ours - it's choosing to do nothing at all rather than invoke our skill.

Further exacerbating this, there's absolutely no way to do automated testing of our inputs, as described in other threads. So each time we modify one small thing, hoping it will clear things up, we have to manually walk through a ton of test cases and save the results into a spreadsheet, which is incredibly time-consuming and hostile to iterative development.

I know we aren't the first to run into this, so how have other teams handled it? We're just about at the point of putting a note in our description saying "always start engaging with us via 'launch' or 'open' requests, and go from there." That feels like it will cheapen our skill, but we're out of ideas.
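For concreteness, here's the kind of switch we keep making between a custom slot type and a literal - a rough sketch with made-up intent and slot names, not our real interaction model:

[code]
Intent schema, with the slot bound to a custom type:
{
  "intents": [{
    "intent": "SetColorIntent",
    "slots": [{ "name": "Color", "type": "LIST_OF_COLORS" }]
  }]
}

Sample utterance using the custom slot type:
SetColorIntent my favorite color is {Color}

The same field as AMAZON.LITERAL, with example phrases inlined instead:
SetColorIntent my favorite color is {dark red|Color}
SetColorIntent my favorite color is {navy blue|Color}
[/code]

Neither direction moves the needle on cold-start recognition for us.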
The majority of my skills are designed to run from launch, and only support direct one-shot invocation as an aside. But this is mainly because I'm striving for immersion: I want to get the user into my skill and keep them interested enough to stay there. They are conversational engines. The launch just starts the conversation, and it goes on from there (a rough sketch of what that looks like at the response level is at the bottom of this post).

I haven't had your specific experience, but then I've mostly avoided that scenario, and I'm not entirely surprised. When you are in your skill, Alexa has to parse the input against [i]only[/i] your skill's utterances. When it is in free-form mode, it has to parse input against [i]EVERYTHING[/i]. The quality of voice recognition increases dramatically the more restricted the grammar.

Have you considered changing your invocation phrase? Moving from one word to two might give Alexa more of a handle to uniquely identify your skill. Or try a phonetically distinct, seldom-used word. For example, I have a personal test "read it back to me" skill called "xylene playback". Since "xylene" is in most dictionaries but isn't something most people say every day, Alexa is pretty good at recognizing it.
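To be concrete about "the launch just starts the conversation": the only real trick is that your LaunchRequest handler greets the user, reprompts, and leaves the session open, so every subsequent utterance is parsed against your grammar alone. A minimal sketch of such a response, using the standard custom-skill JSON fields (all the spoken text here is placeholder wording):

[code]
{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Welcome back. Where should we pick up?"
    },
    "reprompt": {
      "outputSpeech": {
        "type": "PlainText",
        "text": "You can say continue, or start over."
      }
    },
    "shouldEndSession": false
  }
}
[/code]

As long as "shouldEndSession" stays false, the user never leaves your restricted grammar.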
Hmmm, good to know that the "launch first, other intents later" approach isn't that abnormal. I was working under the assumption that all three of the required examples would have to work both from a live session and from a resting state, and that we'd get dinged in certification otherwise.

As for the invocation, yes, we've tried about 10 different combinations, none of which made it work any better. We changed our phrase to the most commonly misheard value according to our Alexa error cards, but each time we did that it would start mishearing our invocation as something else - almost as if it were actively avoiding going into our skill. In the end, our single non-English-word invocation performed no better and no worse than single English words or multi-word phrases.