Anant asked

Getting full user utterance?

I've been enjoying using the Echo and writing an Alexa app a lot! I have a system that is able to parse lots of common ways in which a user can ask questions, and it's pretty inconvenient to have to generate a sample list of utterances. I usually end up leaving my slots very open-ended and only include a single prefix, e.g.

    how {do I make a sandwich}
    when {does the sun rise today}
    where {is the nearest coffee shop}

Ideally, I would like the Alexa SDK to simply transmit the entirety of what the user said without putting it into specific slots. For instance, if my app's trigger word is "Foo", when the user says "Alexa, ask Foo to ", I would like the full transcription of . Is this sort of low-level access to the full user utterance in scope for the SDK? Is it actually possible to do this already? Thanks!
alexa skills kit, voice-user interface

jjaquinta answered
I think if you have a single instance with a single slot, and one utterance that just has that slot, you will get what you want.

Nick Gardner answered
Hi, jjaquinta is correct. If you have a single instance with a single slot, and one or a few utterances for just that slot, you will get passed the entire speech data as your intent. -Nick

Anant answered
Thank you, that worked great!

John answered
Hey Anant, I was wondering if you could share how you managed to get this to work? I figured this would be one of the most basic requirements and I'm floored that it's turning out this convoluted to just return whatever the user said.

Say I have the utterance:

    RecordIntent record {word|statement}

And the schema:

    ...
    {
      "intent": "RecordIntent",
      "slots": [
        { "name": "statement", "type": "LITERAL" }
      ]
    },
    ...

Anything except the phrase "Record Word" simply makes the little bell noise and will end the session (or reprompt).

Corporate Greed answered
I was successful with this intent schema and sample utterances. I had to concatenate the populated slot values to reconstruct the user's phrase. Cumbersome, but it seems to work -- at least for up to six-word phrases.

Intent schema:

    {
      "intents": [
        {
          "intent": "Test",
          "slots": [
            { "name": "itema", "type": "LITERAL" },
            { "name": "itemb", "type": "LITERAL" },
            { "name": "itemc", "type": "LITERAL" },
            { "name": "itemd", "type": "LITERAL" },
            { "name": "iteme", "type": "LITERAL" },
            { "name": "itemf", "type": "LITERAL" }
          ]
        }
      ]
    }

Sample utterances:

    Test add {a|itema}
    Test add {a|itema} {b|itemb}
    Test add {a|itema} {b|itemb} {c|itemc}
    Test add {a|itema} {b|itemb} {c|itemc} {d|itemd}
    Test add {a|itema} {b|itemb} {c|itemc} {d|itemd} {e|iteme}
    Test add {a|itema} {b|itemb} {c|itemc} {d|itemd} {e|iteme} {f|itemf}
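
The concatenation step might look roughly like the sketch below. It assumes the skill receives the usual JSON request envelope, where filled slots appear under `request.intent.slots` keyed by slot name and each carries a `value` field. The function name `reconstruct_phrase` and the example request are hypothetical, for illustration only.

```python
# Slot names come from the intent schema in this answer, in spoken order.
SLOT_ORDER = ["itema", "itemb", "itemc", "itemd", "iteme", "itemf"]


def reconstruct_phrase(envelope):
    """Join the populated slot values, in order, into a single phrase.

    Assumption: slots live at envelope["request"]["intent"]["slots"] and
    unfilled slots either are missing or have no "value" key.
    """
    slots = envelope.get("request", {}).get("intent", {}).get("slots") or {}
    words = []
    for name in SLOT_ORDER:
        value = (slots.get(name) or {}).get("value")
        if value:  # skip slots the user did not fill
            words.append(value.strip())
    return " ".join(words)


# Hypothetical request for "add milk eggs bread" (three of six slots filled).
example = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "Test",
            "slots": {
                "itema": {"name": "itema", "value": "milk"},
                "itemb": {"name": "itemb", "value": "eggs"},
                "itemc": {"name": "itemc", "value": "bread"},
                "itemd": {"name": "itemd"},
            },
        },
    }
}

print(reconstruct_phrase(example))  # prints "milk eggs bread"
```

Iterating over a fixed slot order, rather than over the dictionary, keeps the words in the sequence they were spoken regardless of how the JSON was serialized.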

Matt Farley answered
I've had pretty good success with free form using the intent/utterance below. The only struggle I have is on Hawaiian names like Kalani and Makaio:

---
{
  "intents": [
    {
      "intent": "DoCommand",
      "slots": [
        { "name": "command", "type": "LITERAL" }
      ]
    }
  ]
}
---
DoCommand {do something|command}
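
On the receiving side, pulling the free-form text out of the single slot could be sketched as follows. This assumes the standard JSON request envelope with an `IntentRequest` whose slots are keyed by name; `get_literal_slot` is a hypothetical helper, not part of any SDK.

```python
def get_literal_slot(envelope, intent_name="DoCommand", slot_name="command"):
    """Return the text captured by the LITERAL slot, or None if absent.

    Assumption: the envelope follows the layout
    request -> intent -> slots -> <slot name> -> value.
    """
    request = envelope.get("request", {})
    if request.get("type") != "IntentRequest":
        return None  # e.g. a LaunchRequest carries no intent
    intent = request.get("intent", {})
    if intent.get("name") != intent_name:
        return None
    slot = (intent.get("slots") or {}).get(slot_name) or {}
    return slot.get("value")


# Hypothetical request for illustration.
sample = {
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "DoCommand",
            "slots": {"command": {"name": "command", "value": "turn on the lights"}},
        },
    }
}

print(get_literal_slot(sample))  # prints "turn on the lights"
```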

Corporate Greed answered
After some experimentation, it seems to me that the secret is that the utterance has to have more than one word in it. For example, {something|command} only returns the last word spoken, while {do something|command} returns the entire phrase.

The Stig answered
John, how many sample utterances do you have? It might be helpful to add utterances that have multiple words; that way Echo knows it's possible to have more than one word as the intent.

Anant answered
I use the following intent schema:

    "intent": "Do",
    "slots": [
      { "name": "Text", "type": "LITERAL" }
    ]

with a set of utterances, both single word and multi word:

    Do {hi|Text}
    Do {hello world|Text}
    Do {my name is|Text}
    Do {how far away is amsterdam from san francisco|Text}
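
If the list of sample phrases grows, the utterance lines can be generated rather than typed by hand. The sketch below is only an illustration of the `Intent {phrase|Slot}` format used above; `sample_utterances` is a hypothetical helper, and how many phrases or which lengths work best is not something this thread establishes.

```python
def sample_utterances(intent, slot, phrases):
    """Format each phrase as an '<intent> {<phrase>|<slot>}' utterance line."""
    lines = []
    for phrase in phrases:
        normalized = " ".join(phrase.split())  # collapse stray whitespace
        if normalized:  # ignore empty entries
            lines.append(f"{intent} {{{normalized}|{slot}}}")
    return lines


# Mix single-word and multi-word phrases, as in the list above.
phrases = [
    "hi",
    "hello world",
    "my name is",
    "how far away is amsterdam from san francisco",
]

for line in sample_utterances("Do", "Text", phrases):
    print(line)
```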

J. Wright answered
Anant, this may be a really stupid question, but I'm a newbie here. Basic question: is it possible to use this toolkit to process existing audio and/or video recordings and produce a transcript?