Announcement: The Alexa Skills Community Is Moving To Stack Overflow

For improved usability and experience, Alexa skills related forum support will be transitioned to Stack Overflow. Effective January 10, 2024, the Amazon Developer Forums will no longer be available. For continued Alexa skills support you can reach out to us on Stack Overflow or via Contact Us.

R. Labanca asked

Slots and utterances too intertwined!

As I think about what I want to do, I can imagine quite a few ways to phrase utterances, and that produces a lot of combinations. For example: "I like blue", "my favorite color is red", "orange is my favorite color". If color is a slot and you have to give it tons of sample values, those samples are now tripled, even in this simple example.

It seems to me the slot definition should have an expressive syntax that lets those values be permuted in place, e.g. "red|orange|blue", or lists as I've seen suggested, along with a flag indicating whether a list is strict. The same goes for dates: if you have all these allowed phrasings, it's painful to think of making samples for all of them. Why can't there just be more parameters on the date slot type to customize it? I can see a learning engine for end users, but we're developers and could control this much more precisely if we had the tools. As I think I mentioned in another post, the internal engine must distill these examples down to something like this anyway, and that's what I'd like to specify directly.

I will need to allow city names, and I can't fathom how I could make a good example set for that! Just letting the slot spec allow a wildcard that needs no examples would help, I think. Or let us upload databases for things like that.
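To make the combinatorics concrete, here is a minimal sketch of the kind of pre-processor the "red|orange|blue" idea implies. The template syntax, the `Color` slot name and the `FavoriteColorIntent` intent name are hypothetical; the output uses the `{value|SlotName}` sample form that appears in the generator code later in this thread.

```java
import java.util.ArrayList;
import java.util.List;

class ColorUtterances {
    // Hypothetical phrasings; "{Color}" marks where the slot goes.
    static final String[] TEMPLATES = {
        "I like {Color}",
        "my favorite color is {Color}",
        "{Color} is my favorite color"
    };

    // Compact alternation list in the "red|orange|blue" style from the question.
    static final String COLORS = "red|orange|blue";

    // Crosses every template with every alternative, rewriting "{Slot}"
    // into the "{value|Slot}" sample-utterance form.
    static List<String> expand(String[] templates, String slot, String alternatives) {
        List<String> out = new ArrayList<>();
        for (String template : templates) {
            for (String value : alternatives.split("\\|")) {
                out.add(template.replace("{" + slot + "}", "{" + value + "|" + slot + "}"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> samples = expand(TEMPLATES, "Color", COLORS);
        for (String s : samples) {
            System.out.println("FavoriteColorIntent " + s);
        }
        System.out.println(samples.size() + " samples"); // 3 phrasings x 3 colors = 9
    }
}
```

Every new color adds one line per phrasing, which is exactly the multiplication the question complains about; the compact list stays one line long.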
alexa skills kit

jjaquinta answered
I've hit this problem too. I picked Colossal Cave as an example of wrapping the Echo around some interactive fiction. Its command set has the same "matrix problem" you describe.

The way I solved it was to define my terms and then write a program to generate the file. I distilled the commands down to verbs and objects, which can be combined in simple or compound sentences. For now I just define them statically like this:

```java
private static final String[] VERBS = { "get", "drop", "open", /* ... */ };
// Verb + preposition pairs, e.g. "unlock X with Y".
private static final String[][] COMPOUND = { { "unlock", "with" }, { "pour", "on" }, { "give", "to" }, /* ... */ };
private static final String[] OBJECTS = { "cage", "bird", "nugget", /* ... */ };
```

Then I just have some simple code to iterate through all combinations:

```java
// Simple sentences: "<verb> <object>", one intent per verb with a directObject slot.
private void generateVerbs() {
    for (String verb : VERBS) {
        for (String directObject : OBJECTS) {
            addUtterance(verb, verb + " {" + directObject + "|directObject}");
        }
        addIntent(verb, "directObject");
    }
}

// Compound sentences: "<verb> <object> <prep> <object>", one intent with two slots.
private void generateCompoundVerbs() {
    for (String[] compound : COMPOUND) {
        String verb = compound[0];
        String prep = compound[1];
        for (String directObject : OBJECTS) {
            for (String indirectObject : OBJECTS) {
                addUtterance(verb, verb + " {" + directObject + "|directObject} "
                        + prep + " {" + indirectObject + "|indirectObject}");
            }
        }
        addIntent(verb, "directObject", "indirectObject");
    }
}
```

It was a lot easier than coming up with the 4569 utterances by hand!

This approach could be made generic by coming up with a data format for it, and then a pre-processor to convert that to the raw files. Say:

get<.synonym><.synonym>drop<.synonym>... ... unlock with

Sure, it would be great to see this in the AppKit, but it looks like they have a lot of fish to fry right now. If this need can be met by a pre-processor for now, it's something that can be community-led. If we get a couple of people to reach consensus on a format, one of us could write a pre-processing tool suitable for patching into a build script.
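The data format suggested above did not survive intact, so the following is only a minimal sketch of such a pre-processor under an invented line-based format. The `verb`/`compound`/`object` keywords, the treatment of extra words on a `verb` line as synonyms mapped to the same intent, and the `IntentName sample` output lines are all assumptions, not an agreed standard. It reuses the `{value|SlotName}` sample style from the generator code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UtterancePreprocessor {
    // Hypothetical spec format (an assumption, not an agreed standard):
    //   verb <canonical> [synonym ...]   -> intent named after the canonical verb
    //   compound <verb> <preposition>    -> "<verb> X <preposition> Y"
    //   object <word> [word ...]         -> values for the object slots
    static List<String> expand(List<String> spec) {
        List<String[]> verbs = new ArrayList<>();
        List<String[]> compounds = new ArrayList<>();
        List<String> objects = new ArrayList<>();
        for (String line : spec) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) continue; // skip blank or one-word lines
            String[] rest = Arrays.copyOfRange(parts, 1, parts.length);
            if (parts[0].equals("verb")) {
                verbs.add(rest);
            } else if (parts[0].equals("compound") && rest.length == 2) {
                compounds.add(rest);
            } else if (parts[0].equals("object")) {
                objects.addAll(Arrays.asList(rest));
            } else {
                throw new IllegalArgumentException("unrecognized line: " + line);
            }
        }

        // Emit "IntentName sample" lines in the {value|SlotName} style.
        List<String> out = new ArrayList<>();
        for (String[] verb : verbs) {
            String intent = verb[0];
            for (String spoken : verb) { // canonical verb first, then its synonyms
                for (String obj : objects) {
                    out.add(intent + " " + spoken + " {" + obj + "|directObject}");
                }
            }
        }
        for (String[] c : compounds) {
            for (String direct : objects) {
                for (String indirect : objects) {
                    out.add(c[0] + " " + c[0] + " {" + direct + "|directObject} "
                            + c[1] + " {" + indirect + "|indirectObject}");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> spec = List.of(
                "verb get take",
                "verb drop",
                "compound unlock with",
                "object cage bird nugget");
        List<String> out = expand(spec);
        out.forEach(System.out::println);
        System.out.println(out.size() + " samples"); // 2*3 + 1*3 + 3*3 = 18
    }
}
```

Because all lines are collected before anything is generated, the order of `verb`, `compound` and `object` lines in the spec does not matter, which makes the file easy to maintain by hand or to concatenate from several sources in a build script.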

R. Labanca answered
Of course we're all thinking that way: you have to write a generator. But it's as if we go from a terse definition, through a generator, and then the Echo parses that back down to what's probably a similar terse definition. So I'm really hoping we get slot typing better exposed. Like you, I'll have to write a generator for now, but for large data sets I don't know whether I'll hit a limit.

R. Labanca answered
After experimenting a bit today, I can't imagine them not adding a generic wildcard placeholder and letting our apps check the value for validity. It just makes a lot of sense. Alternatively, it would make sense to let us define slot types from uploaded databases, or to provide standards-based sources such as ISO codes. I imagine the intelligence it uses now already leverages something like that to work out how to synthesize things from our samples. It just seems odd to me: the sample-utterance approach looks like a stopgap to keep things simple until they can expose a more direct way of specifying syntax. So I'm on the fence about whether to write a parser yet. I have to deal with cities, and there are many!
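If a wildcard slot existed, the app would have to do the validity check itself. A minimal sketch, assuming the raw slot text arrives as a `String` and that a list of valid cities is available to the app (for example loaded from an external data file); the class name, the normalization rules and the sample cities are illustrative only:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class CityValidator {
    // Normalized key -> canonical city name. Hypothetical in-memory list;
    // a real skill might load this from a gazetteer or ISO-style data file.
    private final Map<String, String> byKey = new HashMap<>();

    CityValidator(String... cities) {
        for (String city : cities) {
            byKey.put(normalize(city), city);
        }
    }

    // Trim, collapse internal whitespace and lower-case, so "new  YORK " matches "New York".
    static String normalize(String raw) {
        return raw.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    // Returns the canonical name, or empty if the free-text slot value is unknown.
    Optional<String> validate(String slotValue) {
        if (slotValue == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(byKey.get(normalize(slotValue)));
    }

    public static void main(String[] args) {
        CityValidator v = new CityValidator("Boston", "New York", "San Francisco");
        System.out.println(v.validate("new  york ")); // Optional[New York]
        System.out.println(v.validate("Atlantis"));   // Optional.empty
    }
}
```

An empty result is the app's cue to reprompt ("Sorry, which city?") rather than to fail, which is the division of labor the post argues for: the speech layer captures the text, the app decides whether it is valid.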

jjaquinta answered
I've been assuming the slots really are wildcard slots. In the Eliza demo, people are speaking pretty much free text to the Echo, and it clearly understands the words and repeats them back. I asked in another thread how they did that, i.e. what intents they used. They said they basically had N intents with N slots each: intent #1 with 1 slot, intent #2 with 2 slots, and so on.

Today I was designing the intents for my grocery list app, with things like:

ADD I need more {item}

Sure, I'll come up with a list of apples, oranges and so forth and generate utterances for them for the item slot. But I expect those are just "hints": if someone says something else in the same format, it will parse it and pass me the word in the slot, even if that word isn't explicitly listed in the utterances. That's what I did for my simulator. I had a little evaluation step so that if the word in a slot matched one of the sample words, it got a high confidence, while anything else that matched the pattern was still a possible match, but at lower confidence. I don't have an actual Echo to test this against, but it seems consistent with the demos they've shown.
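The simulator's two-level scoring can be sketched as follows. This is a hypothetical reconstruction, not the poster's code: a template with one `{slot}` placeholder is turned into a regular expression, a captured word that appears in the known sample values scores `HIGH`, and any other word that fits the pattern scores `LOW`. The class, enum and record names are made up for the example, and records need Java 16 or later.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SlotMatcher {
    enum Confidence { HIGH, LOW }

    record Match(String slotValue, Confidence confidence) {}

    private final Pattern pattern;
    private final Set<String> knownValues; // expected to be lower-case

    // template: a phrase with exactly one "{slot}" placeholder, e.g. "I need more {item}".
    SlotMatcher(String template, Set<String> knownValues) {
        int open = template.indexOf('{');
        int close = open < 0 ? -1 : template.indexOf('}', open);
        if (close < 0) {
            throw new IllegalArgumentException("template needs one {slot}: " + template);
        }
        // Literal text around the placeholder, with the slot captured as free text.
        String regex = Pattern.quote(template.substring(0, open))
                + "(.+?)"
                + Pattern.quote(template.substring(close + 1));
        this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        this.knownValues = knownValues;
    }

    // Empty if the utterance does not fit the pattern at all.
    Optional<Match> match(String utterance) {
        Matcher m = pattern.matcher(utterance.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        String value = m.group(1).toLowerCase(Locale.ROOT);
        Confidence c = knownValues.contains(value) ? Confidence.HIGH : Confidence.LOW;
        return Optional.of(new Match(value, c));
    }

    public static void main(String[] args) {
        SlotMatcher sm = new SlotMatcher("I need more {item}", Set.of("apples", "oranges"));
        for (String said : new String[] { "I need more apples", "I need more kumquats", "buy kumquats" }) {
            System.out.println(sm.match(said)
                    .map(r -> r.slotValue() + " -> " + r.confidence())
                    .orElse("no match"));
        }
        // prints: apples -> HIGH, kumquats -> LOW, no match
    }
}
```

Treating the sample values as hints rather than a closed list is what lets "kumquats" through at lower confidence; a skill that wants strict matching could simply reject `LOW` results.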