So, one of the feedback items I got from Amazon was that I should use AMAZON.US_FIRST_NAME for a slot that takes a name. The thing is, this slot also needs to recognize certain pronouns. I started going through a test pass last night - this is something that takes hours to do, because Amazon still provides no way to do automated testing of our utterances - and with this suggestion things were going really well. When I used a standard English name, it would pick up that name; when I used a pronoun like "me", it would almost always pick that up as well (with the occasional "mia" mixed in).

I paused halfway through and picked it up again this evening, only to find that it is now mapping to names like "mi" and "mei" every single time. I never once triggered either of these names last night, and now I am unable to avoid triggering them every time. During the intervening period, I have not made a single change to the service, lambda passthrough, utterances, intents, or any other skill configuration. It has been entirely untouched. This implies that there has been a change to AMAZON.US_FIRST_NAME, which is frustrating for two reasons:

1. It's now unreasonably returning names that are not common US first names, and which happen to conflict with common pronouns that would be used in place of names.
2. More disturbingly, [i]they appear to have changed this enum behind the scenes without notifying anyone[/i].

How are we expected to build and test these skills when they are providing a moving target? If they wanted to add more names, that's fine, but they should have done it by versioning their enum, not through this bait-and-switch approach. Between these types of shenanigans, the inability to do any sort of meaningful testing, and the seemingly arbitrary and capricious approach to their certification process, it's really starting to feel like this is not a platform worth building for...
Amazon are constantly tweaking the system. That's pretty much a given. (Optimizing that pipeline was actually the job they offered me last summer!) Certainly every time they add an innate feature, they are going to tweak things. For names, well, it's a new custom slot. (Did they even document it? I couldn't find it.) There's going to be some thrash.

So, did the cert people [i]require[/i] you to use it, or [i]suggest[/i] that you use it? If it's a suggestion, then don't. If it's a requirement, then don't, resubmit, and hope the next certifier doesn't require it.

A few years ago I did a major walkthrough and dump of public info on Facebook and tabulated the most common names, cross-indexed by language. (This was for the NameMe project: http://www.ocean-of-storms.com/nameme/) If you want the most common n-hundred names (first or last) in just about any language, drop me a line at email@example.com. You can use those as the basis of your own slot. If you are in control, then you control (most of) the stability.
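To make the "basis of your own slot" part concrete: a custom slot type is just a list of values, so all you need is a script that turns a frequency list into that value list. A minimal sketch, assuming a CSV of "name,count" rows - the filename, format, and limit are hypothetical, not anything Amazon specifies:

```python
# Sketch: turn a name-frequency list into values for a custom slot type.
# Assumes a CSV of "name,count" rows; path and limit are hypothetical.
import csv

def top_names(path, limit=500):
    """Return the `limit` most frequent names from a name,count CSV."""
    with open(path, newline="") as f:
        rows = [(name, int(count)) for name, count in csv.reader(f)]
    rows.sort(key=lambda r: r[1], reverse=True)
    return [name for name, _ in rows[:limit]]
```

Print the result one name per line and paste it into the custom slot's value list in the developer console.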
It is documented; after one of their employees commented about it on a thread a couple of weeks ago, I scoured the docs and found a mention of it here:
https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interaction-model-reference . It definitely wasn't one of the initial built-in slot types, though.

Anyhow, one of their required changes was about one-shot invocations, and this slot type was part of how they said to improve the probability of one-shots triggering. The thing is, they were actually right on that point. I've been implementing their suggestions piecemeal and testing relative to a baseline (which, as I mentioned, is exceedingly time-consuming), and this slot type legitimately enhanced the match rate. At least, it did last night. Now it's significantly degraded with these new changes.

I disagree that they should be free to change things like this - at least not in conjunction with the other problems mentioned. If they would let us send them audio clips to do automated testing, then they could change things until the cows come home and we could easily confirm that our code wasn't broken. Alternately, if they didn't have ridiculous requirements for certification, we could just tell our users to always launch first, and never have any problems with one-shots. But since they don't let us test, and they do have these requirements, it feels like a fair tradeoff that they not move the goalposts on us.
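Since there's no official way to automate any of this, my "baseline" is really just a hand-recorded table of utterance → resolved slot value from each manual pass. A small sketch of how the comparison between passes could work - the JSON layout here is entirely my own convention, not anything from the SDK:

```python
# Sketch: diff two hand-recorded test passes of utterance -> slot value.
# The file layout ({"utterance": "value", ...}) is my own convention.
import json

def diff_passes(baseline_path, current_path):
    """Return {utterance: (old, new)} for every result that changed."""
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)
    return {
        u: (baseline.get(u), current.get(u))
        for u in set(baseline) | set(current)
        if baseline.get(u) != current.get(u)
    }
```

Running that after each pass at least makes it obvious when the platform shifted underneath you, even if the data entry is still manual.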
Just an idle thought, but I wonder if there is any NLP training or machine learning going on via developer skills and testing - your experience implies some type of persistence taking place, and it would be interesting to know what is behind this. This could be a good way for Amazon/Echo to get a lot of name info for their name slot, but it would also be dangerous (and unconventional) in the event that buggy or malicious utterances were fed in as input.
That's an interesting thought, Rand. I bet there is some of that going on in general, although in this specific case they actually said that the list comes from census data. I did have a brief moment, though, where I thought that maybe the testing I had done the night before had triggered some sort of alarm they had on what is essentially the "cache miss rate" of their enum, which then caused someone to go change the list. In retrospect, though, I don't expect that my throughput would be significant enough to get anyone's attention.
> although in this specific case they actually said that the list comes from census data.

That explains the poor results. Unless they put some bias in there, it will equally recognize obscure names from today and common names from 1880. Given how similar names can be, it's a recipe for disaster. That's why I'd advocate the most-common-on-Facebook approach - or anything similar that biases the recognition towards names that are [i]currently[/i] in frequent use.
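To illustrate the kind of recency bias I mean: if you have per-year counts for each name, you can discount old years so that 1880-era names fall below the cutoff. A rough sketch - the data shape, half-life, and cutoff are all hypothetical knobs, not anything Amazon does:

```python
# Sketch: bias a per-year name-frequency table toward current usage.
# `counts` maps name -> {year: count}; decay rate and cutoff are
# hypothetical knobs, chosen only for illustration.
def recency_weighted(counts, latest_year, half_life=20.0):
    """Score each name, discounting old years exponentially."""
    return {
        name: sum(
            count * 0.5 ** ((latest_year - year) / half_life)
            for year, count in by_year.items()
        )
        for name, by_year in counts.items()
    }

def currently_common(counts, latest_year, keep=500):
    """Return the `keep` highest-scoring names, most current first."""
    scores = recency_weighted(counts, latest_year)
    return sorted(scores, key=scores.get, reverse=True)[:keep]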
The interesting thing, Jo, is that this changed overnight. On Monday it was never returning these low-frequency names at all. On Tuesday, it was returning them all the time. I'm guessing that all they did was (significantly) lower the frequency threshold, but it's still annoying that they did it. I'm going to try hardcoding the pronoun intents to see if that makes it better, but if that fails we'll likely be interested in the list of names you have.
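For anyone curious what I mean by hardcoding the pronouns: roughly, check the raw slot value against a small pronoun set before treating it as a name, which also gives a place to catch the "mi"/"mei" near-misses. A minimal sketch - the pronoun set and the alias table are my own guesses, not anything from the SDK:

```python
# Sketch: resolve a raw slot value as a pronoun before treating it
# as a name. The pronoun set and the ASR near-miss aliases are
# hypothetical, based on what I've seen in testing.
PRONOUNS = {"me", "myself", "i"}
PRONOUN_ALIASES = {"mi": "me", "mei": "me", "mia": "me"}

def resolve_name_slot(value):
    """Return ("pronoun", p) or ("name", value) for a raw slot value."""
    v = (value or "").strip().lower()
    if v in PRONOUNS:
        return ("pronoun", v)
    if v in PRONOUN_ALIASES:
        return ("pronoun", PRONOUN_ALIASES[v])
    return ("name", value)
```

The obvious downside is that "Mia" as an actual name gets swallowed, so the alias table only makes sense if your skill expects pronouns far more often than those names.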
Was really hoping for feedback from an Amazon employee on this post... maybe from Ross@Amazon who keeps posting about the first name slot type in a different thread... For the sake of my sanity, I need confirmation that the set of names was modified between Monday evening and Tuesday evening. If the giant list with super obscure names is going to be the list they use, then we obviously need to change our approach to not use that type.
The Census Bureau used to publish only the top 100 names from each year. Then, a few years ago, they started publishing all occurrences down to 5 instances for each year. (An ex-roommate runs namenerds.com, so I know a lot more about this than normal people!)

I suspect the first draft of the custom slot used the smaller database, and the magical change was when they uploaded the large database with ALL the names. It really is quite big. I have it kicking around somewhere. I did some software for my namenerd friend and ended up having to put it into a relational database in order to manage it.
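Loading it into a database is less work than it sounds. A sketch of the kind of thing I did, using SQLite - the file layout ("name,count" rows in one CSV per year) is hypothetical, your copy of the data may be shaped differently:

```python
# Sketch: load per-year name-count files into SQLite so the full
# list is queryable. The "name,count" CSV-per-year layout is
# hypothetical.
import csv
import sqlite3

def load_names(db_path, files):
    """files: iterable of (year, csv_path) pairs."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS names "
        "(name TEXT, year INTEGER, count INTEGER, "
        "PRIMARY KEY (name, year))"
    )
    for year, path in files:
        with open(path, newline="") as f:
            rows = [(name, year, int(count)) for name, count in csv.reader(f)]
        conn.executemany("INSERT OR REPLACE INTO names VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn
```

Once it's in there, a query like `SELECT name FROM names WHERE year >= 2000 GROUP BY name HAVING SUM(count) > 1000` gives you exactly the kind of recency-filtered list Jo was advocating.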
The SDK team is constantly working to improve the platform, but they have informed me that the US_FIRST_NAME slot hasn't changed since November and doesn't include pronouns. It does, however, include Mia, because it is a perfectly valid name. Galactoise, you said you were going to try hardcoding the pronoun intents to see if that would improve recognition. Did that end up working for you?