E. McDonald asked

204 responses

This is all based on experimenting, so I could be off base, but it seems to be reproducible. I've been getting 204 responses and I think I've figured out a common reason they are returned. I believe the sample rate of the input recording doesn't match what is specified in the metadata and/or the content type. The example is hard-coded for a 16 kHz recording, and if you just grab an arbitrary recording it was probably recorded at 44.1 kHz. My guess is that the rate difference distorts the input so much that, most of the time, the AVS system thinks it is just silence and decides you have uploaded a blank recording, hence the 204 No Content response.

I've also noticed that no matter what you specify in the metadata or content type, only 16 kHz recordings seem to work; every other rate seems to result in a 204.

So, two points. First, it would be nice if there were some sanity checking for this: if you specify that the recording is 16 kHz and it isn't, that should probably be an error. The check could simply test that the rate is within some threshold. Second, it seems like the input has to be 16 kHz no matter what is specified?
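To make the suspected mismatch concrete, here is a small sketch of what happens when samples recorded at one rate are interpreted at another. The function name `misread_effect` is made up for illustration and is not part of AVS or its sample code; it only does the arithmetic.

```python
def misread_effect(actual_rate_hz, declared_rate_hz, duration_s):
    """Describe audio recorded at actual_rate_hz but decoded as declared_rate_hz.

    Returns (apparent_duration_s, pitch_factor). Hypothetical helper, for
    illustration only.
    """
    # The number of samples in the file is fixed by the real recording rate.
    n_samples = actual_rate_hz * duration_s
    # Decoding those samples at the declared rate changes how long they last.
    apparent_duration_s = n_samples / declared_rate_hz
    # Every frequency in the recording is scaled by the same factor.
    pitch_factor = declared_rate_hz / actual_rate_hz
    return apparent_duration_s, pitch_factor


if __name__ == "__main__":
    duration, pitch = misread_effect(44100, 16000, 2.0)
    print(f"apparent duration: {duration:.4f} s, pitch factor: {pitch:.3f}")
```

A 2-second clip recorded at 44.1 kHz and read as 16 kHz would play back over roughly 5.5 seconds with every frequency lowered to about 36% of its original value, which would fit with speech no longer being recognized as speech.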
alexa voice service

1 Answer

swasey@amazon answered
Hello. Yes, the data is required to be 16-bit LPCM with a 16 kHz sample rate. With the version of the code that is currently up on the developer portal, you may experience a NullPointerException when a 204 is received. The dev team currently has a fix in flight for that, but I can't comment on when it'll be available to customers. I'll also create a bug about invalid audio returning a 204 when it should actually return a 400 BadRequestException as described in: Thanks for the feedback.
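Until the service rejects mismatched audio itself, a client-side check before uploading can catch the problem early. The following is a minimal sketch using Python's standard `wave` module, assuming the recording is stored as a WAV file; the function name `check_wav_format` and the constants are made up for illustration and are not part of the AVS sample code.

```python
import wave

# Requirements stated in the answer above: 16-bit LPCM at 16 kHz.
REQUIRED_RATE_HZ = 16000
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample


def check_wav_format(path):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper: it inspects only the WAV header, not the audio data.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH_BYTES:
            problems.append(
                f"sample width is {wav.getsampwidth() * 8} bits, expected 16"
            )
        if wav.getframerate() != REQUIRED_RATE_HZ:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, "
                f"expected {REQUIRED_RATE_HZ}"
            )
    return problems
```

Calling `check_wav_format("recording.wav")` on a 44.1 kHz file would return a single entry describing the sample rate, so the caller can resample or re-record instead of sending audio that is likely to come back as a 204.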