Announcement: The Alexa Skills Community Is Moving To Stack Overflow

For improved usability and experience, Alexa skills related forum support will be transitioned to Stack Overflow. Effective January 10, 2024, the Amazon Developer Forums will no longer be available. For continued Alexa skills support you can reach out to us on Stack Overflow or via Contact Us.

question

RAko890c asked

Improving slot passthrough

I'm working on a skill with a fairly limited number of phrases to trigger the intent, but I want to pass nearly anything through in the slot. So far, 9 times out of 10 it corrects what I say to one of the examples I provided in the utterances. The skill lets people start a new tracker and give the tracker a title. The title can be pretty much anything someone might want to do during their day.

A few sample utterances:

NewTrackerIntent create tracker named {meeting|TrackerName}
NewTrackerIntent make tracker {cook dinner|TrackerName}
NewTrackerIntent make tracker named {grocery shopping|TrackerName}
NewTrackerIntent make a tracker called {research new technology|TrackerName}
NewTrackerIntent make a new tracker named {writing|TrackerName}
NewTrackerIntent make a tracker and name it {editing|TrackerName}
NewTrackerIntent make new tracker {research performance issues in production|TrackerName}

My full utterance file has about 100 entries right now, but it still isn't doing any better than when it had 5. Nearly everything I say gets corrected, usually to "meeting".

So, is there an actual way to improve this? Has anyone else had luck with hitting 200 utterances and then having their skill magically accept freeform user input?
alexa skills kit, voice-user interface

Greg Laabs answered
This is just another example that highlights the need for a new slot type, called something like ANYTHING or PHRASE, that will match Alexa's best guess for exactly what you are saying, similar to how it works with the native lists and "simon says". There are tons of examples of third-party apps that should allow the user to dictate literally anything.

Fawn@Amazon answered
Hi RAko890c,

We have heard the requests for true literal type recognition, and we are certainly working on ways to make things easier for developers! Unfortunately, at this point, the best ways to improve recognition are to:

1) Add as many different variations of wording as you can think of for each intent. You seem to have a good start at this in the examples you posted.
2) For each variation you have, make sure you have good coverage of the slot values that fit into that utterance, including values with different word counts.

Example:

phrase one with {value|SlotA}
phrase one with {value one|SlotA}
phrase one with {many values|SlotA}
phrase one with {a lot of values|SlotA}
phrase two with {values|SlotA}
phrase two with {many values here|SlotA}
phrase two containing {several values|SlotA}
phrase two containing {all the values you need|SlotA}
etc.

There is no magical value that will help to provide you with improved recognition, but the more variety you have for each phrasing, the better the accuracy should be!

Best,
Fawn
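One way to apply the two steps above without typing every line by hand is to generate the file. The following is only a sketch in Python, not an Amazon tool: the carrier phrases and slot values are hypothetical placeholders (some borrowed from the question), and `build_utterances` is a made-up helper name. It pairs every phrasing with slot values of one to five words.

```python
from itertools import product

# Hypothetical carrier phrases; "{}" marks where the slot goes.
CARRIER_PHRASES = [
    "create tracker named {}",
    "make tracker {}",
    "make a tracker called {}",
]

# Hypothetical slot values, deliberately spanning one to five words.
SLOT_VALUES = [
    "meeting",
    "cook dinner",
    "research new technology",
    "call the insurance company",
    "research performance issues in production",
]


def build_utterances(intent, slot, phrases, values):
    """Return one sample-utterance line per (phrase, value) pair."""
    lines = []
    for phrase, value in product(phrases, values):
        # Produces the {example value|SlotName} form used in this thread.
        slot_text = f"{{{value}|{slot}}}"
        lines.append(f"{intent} " + phrase.format(slot_text))
    return lines


utterances = build_utterances(
    "NewTrackerIntent", "TrackerName", CARRIER_PHRASES, SLOT_VALUES
)

if __name__ == "__main__":
    print("\n".join(utterances))
```

This only automates coverage of phrasing and word counts; whether it improves recognition for a given skill still has to be checked by testing on a device.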

Bob L answered
I would hope/expect that a 'simple' regular expression engine would be easy to do. It seems that the Echo is clearly able to recognize words and then try to match them to an explicit row in our sampleUtterances file. Why can't it match using regular expressions? This simple regex would reduce the number of entries I have to make from 40+ down to 1. Using regular expression named groups, you can use the group name as the slot name.

MyScores (get|tell|) (me|) (the|) (scores|results) from (?<whenPlayedIntent>(the last|last nights|yesterdays|)) (?<teamNameIntent>(red sox|yankees)) game

This would match:

get me the scores from last nights red sox game ("last nights"=whenPlayedIntent, "red sox"=teamNameIntent)
get the results from yesterdays yankees game
scores from yankees game
tell me scores from the last red sox game
etc. (100+ possibilities)
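To make the named-group idea concrete, here is a rough Python sketch that matches plain text against a pattern of this shape and reads the slots from the group names. It is an illustration only: Alexa does not accept patterns like this, Python spells named groups `(?P<name>...)`, and the names `whenPlayed`, `teamName` and `extract_slots` are hypothetical.

```python
import re

# Optional words are written as optional non-capturing groups so that
# omitted words do not leave stray spaces behind.
PATTERN = re.compile(
    r"(?:(?:get|tell) )?(?:me )?(?:the )?(?:scores|results) from "
    r"(?:(?P<whenPlayed>the last|last nights|yesterdays) )?"
    r"(?P<teamName>red sox|yankees) game"
)


def extract_slots(text):
    """Return a dict of slot name -> value, or None if the text does not match."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    # Drop optional groups that did not take part in the match.
    return {name: value for name, value in match.groupdict().items() if value is not None}


if __name__ == "__main__":
    print(extract_slots("get me the scores from last nights red sox game"))
    print(extract_slots("scores from yankees game"))
```

Note that this works on text that has already been transcribed; it says nothing about how the speech recognizer arrives at that text.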

jjaquinta answered
Most STT systems are not that simple. They do not hear the sound and produce one definitive stream of text in isolation. For example, the BlueMix Speech To Text service returns a nested phrase containing various alternative readings of the sounds. (See "The interim_results parameter" here: https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/speech-to-text/using.shtml)

Amazon have not disclosed what they do, but I have a pretty good guess. Their recognition service probably breaks down the speech into branching streams of phonemes. These are then compared against your utterances. If a stream with a reasonable confidence matches a "known good" text example from the utterances, then that is used. Otherwise it falls back to trying to shoehorn likely words into likely patterns from the utterances. The more constrained (or defined) your vocabulary is, the higher the accuracy.

To use regular expressions, it would have to ditch all of the guided interpretation, do a low-confidence pure speech-to-text pass with no hints, and then do a match. Although regex gives a much better way to match text to text, the tradeoff probably isn't worth it.

Bob L answered
Would love to get Amazon's take, but modern speech recognizers like Google's and Apple's do in fact create a lexical phrase completely from your spoken voice. (Just watch Google Now voice input to see it build the phrase word by word until it gets your whole sentence correctly parsed.)

That being said, until such time as Amazon changes, I have created a Windows PowerShell script that takes the regex format I proposed and expands it into a set of lines for all possible combinations. If you use the regex named group syntax, those lines will include the Alexa slot naming convention. The example utterances at the top of this script block (2 lines) generate 1152 utterance variations.

I hope others may find this useful for generating deep utterance combinations until such time as they change the way this is done. Those of you familiar with Windows PowerShell scripting should have no trouble with this.

A few example lines of output from this script:

Scores tell me {yesterdays|ForWhen} {recap|Style} for the {national football league|Sport}
Scores get {yesterdays|ForWhen} {recap|Style} for the {national football league|Sport}
Scores what were {yesterdays|ForWhen} {recap|Style} for the {national football league|Sport}
Scores get me {last nights|ForWhen} {recap|Style} for the {national football league|Sport}
Scores tell me {last nights|ForWhen} {recap|Style} for the {national football league|Sport}

======== Begin Script Block ===========
$intents = @(
@{ Intent="Scores"; Utterances= @(
"(get me|tell me|get|what were) (? todays|yesterdays|last nights|) (?
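
The expansion idea described in this answer can also be sketched in Python. This is not the PowerShell script from the post, just a minimal illustration under stated assumptions: groups are not nested, a named group is written `(?<SlotName>a|b|c)`, and an empty alternative (a trailing `|`) means the group may be omitted. The function name `expand` and the example pattern (including the `scores` and `national hockey league` values) are hypothetical.

```python
import re
from itertools import product

# Matches either a parenthesised group, optionally named with (?<Slot>...),
# or a bare literal word. Nested groups are not supported in this sketch.
GROUP = re.compile(
    r"\((?:\?<(?P<slot>\w+)>)?(?P<alts>[^()]*)\)|(?P<literal>[^()\s]+)"
)


def expand(intent, pattern):
    """Expand a simplified pattern into every sample-utterance line it covers."""
    choices = []
    for match in GROUP.finditer(pattern):
        if match.group("literal") is not None:
            choices.append([match.group("literal")])
            continue
        slot = match.group("slot")
        options = []
        for alt in match.group("alts").split("|"):
            alt = alt.strip()
            if alt and slot:
                options.append(f"{{{alt}|{slot}}}")  # {value|SlotName}
            else:
                options.append(alt)  # plain words, or "" when the group is omitted
        choices.append(options)
    lines = []
    for combo in product(*choices):
        words = " ".join(part for part in combo if part)
        lines.append(f"{intent} {words}")
    return lines


EXAMPLE = (
    "(get me|tell me|get|what were) (?<ForWhen>todays|yesterdays|last nights|) "
    "(?<Style>recap|scores) for the "
    "(?<Sport>national football league|national hockey league)"
)
lines = expand("Scores", EXAMPLE)

if __name__ == "__main__":
    print(len(lines))
    print("\n".join(lines[:5]))
```

The number of generated lines is the product of the alternatives in each group (here 4 x 4 x 2 x 2), which is why a couple of pattern lines can grow into hundreds of utterances.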