
Matt Kruse asked

Proposal: Standard schema/utterance meta-data structure

Many developers want to auto-generate both the intent schema and the sample utterances from a single source that contains both, expanding compact templates into hundreds or thousands of sample utterances. Several projects already do this, so I'm proposing a standard JSON-based meta-data structure for defining the schema and utterances. Any framework or tool that wants to offer this functionality could process it, and developers could switch between tools if they wanted to. Below is what I came up with, admittedly based on what I've already done in my alexa-app framework.

Any thoughts on this? Is anyone else interested in a common format?

[u][b]Alexa Skill Schema Example:[/b][/u]

[code]
{
  "dictionary": {
    "colors": ["red","green","blue"]
    ,"names": ["Bob","Mary","Joe"]
  }
  ,"intents": {
    "nameIntent": {
      "slots": { "name": "LITERAL" }
      ,"utterances": [
        "My name{'s| is} {names|name}"
        ,"I call myself {names|name}"
      ]
    }
    ,"ageIntent": {
      "slots": { "age": "NUMBER" }
      ,"utterances": [
        "I{'m| am} {1-110|age}{| years old}"
      ]
    }
    ,"colorIntent": {
      "slots": { "color": "LITERAL" }
      ,"utterances": [
        "my favorite color is {colors|color}"
      ]
    }
    ,"combinedIntent": {
      "slots": {
        "name": "LITERAL"
        ,"color": "LITERAL"
        ,"age": "NUMBER"
      }
      ,"utterances": [
        "my name is {names|name} and I{'m| am} {1-110|age}{| years old} and my favorite color is {colors|color}"
      ]
    }
  }
}
[/code]

[u][b]Explanation:[/b][/u]

* The "dictionary" maps a key to a set of possible values, which can be reused in multiple utterances.
* The intent schema required by Amazon can be generated from this structure.
* The list of sample utterances can be generated from this structure.
* Expansion happens within { } braces:
  - Multiple text variations can be defined at once: "I {like|love|prefer} candy"
  - If a slot name is at the end, it is retained in the utterance: "The color is {red|green|blue|COLOR}" => "The color is {red|COLOR}" (...)
  - If a dictionary key is used in an expansion, it is treated as if its values were hard-coded: "color {colors}" == "color {red|green|blue}"
  - Hard-coded values may be mixed with dictionary keys: "{lime|magenta|colors}"
  - Number ranges can be used: "I am {1-100} years old"
  - Number ranges can have a step size: "I'll buy {5-100 by 5} items"
  - Words can be made optional with an empty piped value: "I{ really|} like candy"
  - Nested expansions are not supported.
  - More auto-expanding features similar to number ranges may be added.
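The expansion rules listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the alexa-app implementation: the function names are hypothetical, the last item in a group is treated as a slot name only when it matches a slot declared for the intent, dictionary keys are resolved before an item is taken as literal text, and nested braces are simply not matched.

```python
import itertools
import re

# One {...} group; nested braces are not matched (the proposal does not support them).
BRACE = re.compile(r"\{([^{}]*)\}")
# Number range such as "1-110" or "5-100 by 5".
RANGE = re.compile(r"^(\d+)-(\d+)(?: by (\d+))?$")


def expand_option(option, dictionary):
    """Turn one piped item into a list of literal values."""
    m = RANGE.match(option)
    if m:
        start, end = int(m.group(1)), int(m.group(2))
        step = int(m.group(3) or 1)
        return [str(n) for n in range(start, end + 1, step)]
    if option in dictionary:  # dictionary keys are resolved first
        return list(dictionary[option])
    return [option]  # hard-coded text; "" makes the group optional


def expand_group(body, dictionary, slots):
    """Expand the contents of one {...} group into its possible renderings."""
    options = body.split("|")
    slot = None
    # Assumption: a trailing item names a slot only if the intent declares it.
    if len(options) > 1 and options[-1] in slots:
        slot = options.pop()
    values = []
    for option in options:
        values.extend(expand_option(option, dictionary))
    if slot is None:
        return values
    # Keep Amazon's {value|slot} form for slot values.
    return ["{%s|%s}" % (v, slot) for v in values]


def expand(template, dictionary, slots=()):
    """Return every utterance produced by one template."""
    pieces = BRACE.split(template)
    # After split, even indexes are literal text and odd indexes are group bodies.
    choices = [
        [piece] if i % 2 == 0 else expand_group(piece, dictionary, slots)
        for i, piece in enumerate(pieces)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]
```

Applied to the `ageIntent` template, `expand("I{'m| am} {1-110|age}{| years old}", dictionary, {"age"})` yields 440 utterances (2 x 110 x 2), which shows how quickly a single template grows.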

jjaquinta answered
I like the idea of it being a superset of the intent schema. Have you checked that you can paste this in and have it recognized as-is? That would be a win, since it would avoid needing a filter to create the intent schema.

The functionality is fine, but I think you need to use something other than {} for expansion. As it stands, the syntax is ambiguous. For example, take this line:

[code]
"I call myself {names|name}"
[/code]

You cannot tell explicitly whether names is a dictionary entry or a hard-coded example for the utterance. Similarly, you can't tell the difference between an ad-hoc list and an entry that points to a dictionary list.

Say you used [] for expansions and ^ to indicate a dictionary lookup. You could then even mix them:

[code]
I call myself {Arnold|name}
I call myself {[Arnold|Betty|Charles]|name}
I call myself {[^Names]|name}
I call myself {[^FirstNames|^LastNames|Hubert|Anastasia]|name}
[/code]

Now there is no ambiguity. It would also be nice to be able to include one dictionary in another dictionary.
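The point of this syntax is that every item can be classified without looking anything up. A hedged Python sketch of a parser for the group forms in the examples above; the grammar is inferred from those four lines (`{Value|slot}` is always literal, `{[...]|slot}` is a list whose `^`-prefixed items are dictionary lookups), and `parse_group` and `expand_group` are hypothetical names:

```python
import re

# Hypothetical grammar inferred from the examples:
#   {Value|slot}          a single sample value, always taken literally
#   {[A|B|^Dict]|slot}    a list of literals and ^dictionary references
LIST_FORM = re.compile(r"\[([^\[\]]*)\]\|(\w+)")
VALUE_FORM = re.compile(r"([^\[\]|]*)\|(\w+)")


def parse_group(body):
    """Classify each item in one {...} group as ("literal", text) or ("dict", key)."""
    m = LIST_FORM.fullmatch(body)
    if m:
        items, slot = m.group(1).split("|"), m.group(2)
        tokens = [("dict", i[1:]) if i.startswith("^") else ("literal", i) for i in items]
        return tokens, slot
    m = VALUE_FORM.fullmatch(body)
    if m:
        # Plain Amazon form: no lookup is ever attempted.
        return [("literal", m.group(1))], m.group(2)
    raise ValueError("unrecognized expansion: {%s}" % body)


def expand_group(body, dictionary):
    """Expand one group into Amazon-style {value|slot} alternatives."""
    tokens, slot = parse_group(body)
    values = []
    for kind, text in tokens:
        values.extend(dictionary[text] if kind == "dict" else [text])
    return ["{%s|%s}" % (v, slot) for v in values]
```

Under this reading a bare list such as `{Arnold|Betty|name}` is rejected, because only the bracketed form introduces alternatives, while the plain `{Value|slot}` form stays identical to Amazon's sample utterance syntax.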

Matt Kruse answered
Thanks for voicing your opinions :)

> I like the idea of it being a superset of the intent schema. Have you checked that you can paste this in and it will be recognized as-is?

No, it definitely will not. It would need to be converted by some app or build process. Ideally, Amazon would adopt the meta-data format and accept it natively.

> The functionality is fine, but I think you need to use something other than {} for expansion.

There is a long history of using {} for expansion, going back at least to Unix shells like bash. I also wanted to match the existing utterance format. Currently, an utterance with multiple options looks like this:

[code]
My name is {Matt|name}
My name is {Bob|name}
[/code]

It makes the most sense to me to keep that format but combine the lines:

[code]
My name is {Matt|Bob|name}
[/code]

So that's where I started.

> "I call myself {names|name}"
> You cannot tell explicitly if names is a dictionary entry or is a hard-coded example for the utterance.

True. But in practice I think it's pretty clear to the developer, and as long as dictionary entries are resolved first, the result is unambiguous.

> Similarly you can't tell the difference between an ad-hoc list, and an entry that points to a dictionary list.

Again, true, but you can also combine them with simpler syntax:

[code]
{Matt|Bob|names}
[/code]

> Say you used [] for expansions, and ^ to indicate a dictionary lookup.

I don't particularly like [] for expansions, but adding support for ^ to indicate a dictionary lookup might be a good idea. Something like this:

[code]
I call myself {^Names|name}
[/code]

> It would also be nice to be able to include one dictionary in another dictionary.

Oh no, dictionary inheritance to generate utterances from a meta-schema! Has this gotten overly complicated? ;)
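The `^` marker and the dictionary-inclusion idea fit together with little machinery. A minimal Python sketch, assuming (hypothetically) that a dictionary value starting with `^` includes another dictionary's values and that the last item is a slot name only when it is declared for the intent; number ranges are left out for brevity, and the function names are illustrative:

```python
def resolve(key, dictionary, seen=()):
    """Return a dictionary entry's values, following ^includes recursively."""
    if key in seen:
        raise ValueError("circular dictionary include: " + " -> ".join(seen + (key,)))
    values = []
    for item in dictionary[key]:
        if item.startswith("^"):
            # Hypothetical inclusion rule: "^Other" pulls in Other's values.
            values.extend(resolve(item[1:], dictionary, seen + (key,)))
        else:
            values.append(item)
    return values


def expand_group(body, dictionary, slots):
    """Expand one {...} group where ^Key marks a dictionary lookup."""
    options = body.split("|")
    slot = options.pop() if len(options) > 1 and options[-1] in slots else None
    values = []
    for option in options:
        if option.startswith("^"):
            values.extend(resolve(option[1:], dictionary))
        else:
            values.append(option)  # anything without ^ is always literal text
    return values if slot is None else ["{%s|%s}" % (v, slot) for v in values]
```

Because anything without `^` is literal under this rule, `{names|name}` always means the word "names", so the ambiguity raised earlier disappears without giving up the `{}` syntax. The `seen` chain only guards against cycles; an entry reachable through two different includes will still appear twice.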

jjaquinta answered
Well, I think that if it is not a 100% superset of the existing intent schema format, or if it allows ambiguity, then it is a non-starter. I don't think Amazon is ever going to adopt something that isn't directly compatible with what they have (especially when it could easily be made compatible), or something that is open to multiple interpretations. That's just not how you design an architecture. I'd happily adopt it, and update my samples and the chapter on generating utterances in my book, if it met those criteria. I think some of its elements are better than the XML structure I already have. But in my opinion, those two flaws are insurmountable. Good luck!

Matt Kruse answered
I don't think there is any superset format that Amazon would accept. They are very strict about their format. There is no way they would accept something like this, because it doesn't match their structure at all. Nor would they expand the utterances themselves. The point of a meta-data structure is to generate what Amazon needs in a more reasonable way.
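A sketch of that generation step in Python, producing an intent schema in the `{"intents": [{"intent": ..., "slots": [{"name": ..., "type": ...}]}]}` shape and the `IntentName utterance` lines that Amazon's skill configuration expects. Slot types are copied through unchanged, and the `expand` parameter is a hypothetical hook for whatever template expander a tool provides; the default `passthrough` leaves templates untouched.

```python
import json


def intent_schema(meta):
    """Build an intent schema in Amazon's {"intents": [...]} shape from the meta-data."""
    return {
        "intents": [
            {
                "intent": name,
                # Slot types ("LITERAL", "NUMBER", ...) are copied through unchanged.
                "slots": [{"name": slot, "type": slot_type}
                          for slot, slot_type in intent.get("slots", {}).items()],
            }
            for name, intent in meta["intents"].items()
        ]
    }


def passthrough(template, dictionary, slots):
    """Default hook: emit each template as-is (no expansion)."""
    return [template]


def sample_utterance_lines(meta, expand=passthrough):
    """Yield one "IntentName utterance" line per expanded sample utterance."""
    dictionary = meta.get("dictionary", {})
    for name, intent in meta["intents"].items():
        slots = set(intent.get("slots", {}))
        for template in intent.get("utterances", []):
            for utterance in expand(template, dictionary, slots):
                yield "%s %s" % (name, utterance)


if __name__ == "__main__":
    meta = {"intents": {"ageIntent": {"slots": {"age": "NUMBER"},
                                      "utterances": ["I am {30|age}"]}}}
    print(json.dumps(intent_schema(meta), indent=2))
    print("\n".join(sample_utterance_lines(meta)))
```

Keeping expansion behind a hook means the same schema and line-writing code could sit in front of any of the competing expansion syntaxes discussed in this thread.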