
Galactoise asked

Alexa making up new words

So, while testing our renamed invocation, we've seen some weird mappings, but tonight for the first time I saw Amazon make up brand-new words. The whole point of renaming was that the dev team said it could only map to real words, so why am I all of a sudden seeing "conkle" and "comelade" show up in the list of misfires? This whole thing is such a rage-inducing black box.
alexa skills kit, voice-user interface

Galactoise answered
Actually, after some googling, it seems that both of those words, "conkle" and "comelade", are names. I'm wondering if this is related to the other weirdness we've been seeing with word matching (https://forums.developer.amazon.com/forums/thread.jspa?threadID=10313&tstart=0). The timing is right, and in that case they appear to have vastly expanded the number of names they are matching, which is now causing a ton of false positives. I wonder if that is true not just of the AMAZON.US_FIRST_NAME slot type but also of the overall voice matching.

jjaquinta answered
Sounds about right. I found my dump from the Social Security Administration: about 20 megabytes of text files. After merging duplicates I get a 458K file of female names and a 260K file of male names. I guesstimate that to be about 50,000 names. The OED contains about 170,000 words. So if they just threw in the names, that's increasing their recognition vocabulary by about 30%. Which is HUGE. Really rather a foolish move. (If that's what they did.) Remind me to tell you the story of when they added the Urban Dictionary to Watson's data set...
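For anyone who wants to check the arithmetic, here is a rough sketch. It assumes the name files use the `name,sex,count` line layout of the public SSA baby-name files; `unique_names` and `vocabulary_growth` are hypothetical helpers, the sample records are made up for illustration, and the 50,000 and 170,000 figures are just the estimates from this post. With those estimates the ratio comes out at a bit under 30%.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def unique_names(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect distinct names per sex from 'name,sex,count' records."""
    names: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, sex, _count = parts
        names[sex].add(name.lower())  # lowercase so repeats across files merge
    return dict(names)


def vocabulary_growth(added: int, base: int) -> float:
    """Relative growth when `added` new words join a `base` vocabulary."""
    return added / base


# Made-up sample records; "Mary" appears twice, as it would across yearly files.
sample = ["Mary,F,100", "Anna,F,80", "Mary,F,95", "John,M,120"]
by_sex = unique_names(sample)
print({sex: len(found) for sex, found in by_sex.items()})

# Estimates from the post: ~50,000 names against ~170,000 dictionary words.
print(f"{vocabulary_growth(50_000, 170_000):.0%}")
```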

Matt Kruse answered
I have found that voice recognition can be annoyingly bad. It makes the Echo almost unusable for my 6-year-old much of the time because she doesn't speak very clearly. What I find the Echo lacking is context. It should have a better idea of what someone would and wouldn't ask, and try harder to map against things someone might actually be asking. Actually it probably already does that, but it doesn't feel like it does it very well.

Rand M answered
> What I find the Echo lacking is context. It should have a better idea of what someone would and wouldn't ask, and try harder to map against things someone might actually be asking.

I hadn't thought of this. It is an interesting concept, and one that will certainly be required as we move towards, and demand, more conversational interaction. I'm not sure if you were referring to the Echo OS layer or to skills, but both need it.

Some examples: someone might be more likely to ask for the weather or traffic in the morning (before leaving for work). Types of music could matter (morning music could be different from evening music). Purchases might be made for later in the day. Asking for pizza info (i.e. Yelp integration) would happen at lunch or later, etc. IFTTT triggers could also use time-of-day info or look at a user's history/patterns to help predict what was said.

I'll assume you were talking about skills; she seems pretty good at the OS layer... Within skills, that is interesting: we (devs) can manage the expectations and responses just fine, but rely on Amazon to feed us the right words. Technically speaking, it seems challenging for Amazon to do. Maybe schema/utterance metadata could be added to indicate the likelihood of a user asking an utterance based on a previous answer or utterance, or there could be an alternative conversational representation entirely...

In any event, good food for thought. Thanks.
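To make the time-of-day idea concrete, here is a minimal sketch of rescoring recognizer hypotheses with a context prior. Everything in it is an assumption for illustration: the intent names, the prior values, and the `rescore` helper are made up, and nothing here reflects how Alexa actually ranks hypotheses.

```python
from typing import Dict, List, Tuple

# Hypothetical priors: how likely each intent is in a given part of the day.
TIME_PRIORS: Dict[str, Dict[str, float]] = {
    "morning": {"GetWeather": 0.5, "GetTraffic": 0.3, "FindPizza": 0.05},
    "evening": {"GetWeather": 0.15, "GetTraffic": 0.1, "FindPizza": 0.4},
}
DEFAULT_PRIOR = 0.1  # used for intents with no entry for that period


def part_of_day(hour: int) -> str:
    # Deliberately coarse: everything outside 05:00-11:59 counts as "evening".
    return "morning" if 5 <= hour < 12 else "evening"


def rescore(candidates: List[Tuple[str, float]], hour: int) -> List[Tuple[str, float]]:
    """Multiply each recognizer score by a time-of-day prior and renormalize."""
    priors = TIME_PRIORS[part_of_day(hour)]
    weighted = [
        (intent, score * priors.get(intent, DEFAULT_PRIOR))
        for intent, score in candidates
    ]
    total = sum(score for _, score in weighted)
    ranked = sorted(weighted, key=lambda pair: pair[1], reverse=True)
    return [(intent, score / total) for intent, score in ranked]


# Two acoustically close hypotheses; the raw scores slightly favor FindPizza.
hypotheses = [("FindPizza", 0.55), ("GetTraffic", 0.45)]
print(rescore(hypotheses, hour=8)[0][0])   # GetTraffic
print(rescore(hypotheses, hour=19)[0][0])  # FindPizza
```

The same shape would work for a per-user history prior: swap the time-of-day table for counts of what that user has asked before.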

Rand M answered
Fwiw, I'm a big fan of FSMs and would love to see them "built-in" along with, instead of, etc. the schema/utterance structure - that would seem to make it easier for Amazon to try and predict the next phrase of an interaction...
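As a rough illustration of what a built-in FSM could give the recognizer, here is a small sketch. The pizza-ordering states, the intent names, and the `DialogueFSM` class are all hypothetical; the point is only that the current state yields a short list of intents to expect next.

```python
from typing import Dict, List

# Hypothetical pizza-ordering dialogue: state -> {intent: next state}.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "start": {"OrderPizza": "choose_size"},
    "choose_size": {"ChooseSize": "choose_topping", "Cancel": "start"},
    "choose_topping": {"ChooseTopping": "confirm", "Cancel": "start"},
    "confirm": {"Yes": "start", "No": "choose_size"},
}


class DialogueFSM:
    def __init__(self, state: str = "start") -> None:
        self.state = state

    def expected_intents(self) -> List[str]:
        """Intents that are valid in the current state."""
        return sorted(TRANSITIONS[self.state])

    def advance(self, intent: str) -> str:
        """Follow the transition for `intent`, or stay put if it is not allowed here."""
        self.state = TRANSITIONS[self.state].get(intent, self.state)
        return self.state


fsm = DialogueFSM()
fsm.advance("OrderPizza")
print(fsm.state, fsm.expected_intents())  # choose_size ['Cancel', 'ChooseSize']
```

A recognizer that knew the skill was in `choose_size` could weight "small", "medium", and "large" far above anything else.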

Ross@Amazon answered
Can you please include an example of your interaction model?

Lawrence Krubner answered
Rand M --

> I'm a big fan of FSMs and would love to see them "built-in" along with, instead of, etc. the schema/utterance structure

I agree. Dynamic maps for FSM states are currently one of the selling points of converse.ai. I would love to see Amazon offer something like this: http://blog.converse.ai/introduction/conversation-state-conversation-maps/

> Conversation Maps are not just a set list of questions, they can contain junctions to direct conversation flow based on user entries if required, so complex tree like structures can easily be implemented. Neither are they static in their design, unique to Converse is the ability for them to be dynamically rewritten by the service provider, depending on the user’s input and any results obtained. The movie search in the video is a good example of this, as there is an completely empty map to begin with, and the entire set of questions, validators and required entities is dynamically generated based on the options chosen by the user as their query progresses.
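A minimal sketch of the idea, to show what "dynamically rewritten" could look like in practice. This is not converse.ai's API: `Question`, `ConversationMap`, and the movie-search entities are invented for illustration, and unlike the quoted example the map here starts with one seed question instead of being completely empty.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Question:
    entity: str  # name of the value this question fills
    prompt: str
    validator: Callable[[str], bool]


@dataclass
class ConversationMap:
    pending: List[Question] = field(default_factory=list)
    answers: Dict[str, str] = field(default_factory=dict)

    def answer(self, text: str) -> bool:
        """Validate `text` against the next question; rewrite the map on success."""
        if not self.pending:
            return False  # nothing left to ask
        question = self.pending[0]
        if not question.validator(text):
            return False  # keep asking the same question
        self.pending.pop(0)
        self.answers[question.entity] = text
        self._extend(question.entity, text)
        return True

    def _extend(self, entity: str, value: str) -> None:
        # Junction: which follow-up question exists depends on the answer.
        if entity == "search_by" and value == "actor":
            self.pending.append(Question("actor", "Which actor?", lambda s: bool(s.strip())))
        elif entity == "search_by" and value == "year":
            self.pending.append(Question("year", "Which year?", str.isdigit))


movie_search = ConversationMap()
movie_search.pending.append(
    Question("search_by", "Search by actor or year?", lambda s: s in {"actor", "year"})
)
movie_search.answer("year")
print([q.prompt for q in movie_search.pending])  # ['Which year?']
```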

jjaquinta answered
Sounds like VoiceXML (https://en.wikipedia.org/wiki/VoiceXML). I always wondered why Amazon did not choose VoiceXML as their interface. I'm sure I'm missing some drawback. But given that there is so much existing work already in VoiceXML, and it would be a no-brainer port to Alexa, it seems there are lots of positives. Shrug. I've toyed with the idea of writing a VoiceXML -> Alexa Skill generator. Then every help desk in the world could easily port themselves to Alexa...
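A very rough sketch of what the first step of such a generator might look like, under heavy assumptions: it maps each VoiceXML `<form>` to one intent and each `<field>` to one slot, ignores grammars, prompts, and control flow entirely, and emits JSON in the `interactionModel.languageModel` layout. The sample document, the `vxml_to_model` helper, and the placeholder slot type are all illustrative; the result is a skeleton and not a deployable model (it has no sample utterances, for one).

```python
import json
import xml.etree.ElementTree as ET
from typing import Any, Dict

# Tiny made-up VoiceXML document for illustration.
VXML = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="order_status">
    <field name="order_number">
      <prompt>What is your order number?</prompt>
    </field>
  </form>
</vxml>"""


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def vxml_to_model(source: str, invocation_name: str) -> Dict[str, Any]:
    """Turn each <form> into an intent and each of its <field>s into a slot."""
    intents = []
    for form in ET.fromstring(source).iter():
        if local(form.tag) != "form":
            continue
        slots = [
            # Placeholder type; a real port would derive it from the field's grammar.
            {"name": child.get("name"), "type": "AMAZON.SearchQuery"}
            for child in form
            if local(child.tag) == "field"
        ]
        # Sample utterances are left empty; they would have to be written by hand.
        intents.append({"name": form.get("id"), "slots": slots, "samples": []})
    return {
        "interactionModel": {
            "languageModel": {"invocationName": invocation_name, "intents": intents}
        }
    }


model = vxml_to_model(VXML, "help desk")
print(json.dumps(model, indent=2))
```

The hard part of a real port is everything this skips: VoiceXML grammars do not map one-to-one onto slot types, and the form-filling control flow would have to live in the skill's backend code.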