thehocoproject asked

AVS - getting partial audio on 'structured' responses [iOS Swift]

The iOS AVS app client does not seem to receive the full multipart audio for 'complex' questions, such as enumerating to-do or grocery list items.

Context:
- I am implementing an AVS-compatible device on the iPhone using the latest libraries on Xcode/Swift.
- I am not using the Login With Amazon (LWA) SDK because (1) I want most of the code in Swift, and (2) the downloaded SDK isn't even compiling. I am, however, using the LoginWithAmazon.framework, which has the AIMobile.h class.
- Authentication works for obtaining the authCode, access_token and refresh_token. I know this is the most complex and least direct way to do it, but the short way of getting an access_token directly does not work.
- Simple Alexa feedback works fine: weather, time, etc.

Where the issue appears:
- When I post a recording asking a question such as 'WHAT IS ON MY TO DO LIST?', having previously recorded three items, I get an audio response saying: "You have three items". But that's all; there is no enumeration of each item after that.
- In theory, the data returned should contain a multipart file with distinct headers for the "audio/mpeg" sections, including boundary flags marking the end of each part (to enable parsing). I think my parsing works well; the file I get, however, does not contain enough sections/parts.
- Below is an example of the response I get in the debugger:

HTTP Response Status code: 200
HTTP Response ALL Header Fields: [Connection: keep-alive, Content-Type: multipart/related; boundary=46769548-e06e-4bf7-8afc-b76852374ab6; start=metadata.1453599592745; type="application/json", Server: Server, x-amzn-RequestId: 5b346ec6-c23b-11e5-a1b8-13169954681a, Vary: Accept-Encoding,User-Agent, Date: Sun, 24 Jan 2016 01:39:52 GMT, Content-Encoding: gzip, Transfer-Encoding: Identity]

(sparing you the HEX code…)

Boundary: <34363736 39353438 2d653036 652d3462 66372d38 6166632d 62373638 35323337 34616236>
Inner boundary format (the way I identify the sections): "\(boundary)\r\n"
Inner boundary format (end of message): "\r\n\(boundary)--\r\n"
Inner Ranges: <5b5d>
Header range: (30,4) >> range positions looking for "\r\n\r\n"
Header range: (106,4)

Contents headers of multipart message:
Part 1: ["Content-Type": "application/json"]
Part 2: ["Content-ID": " ", "Content-Type": "audio/mpeg"] >>> This is the "You have three items in your TO DO list" audio response.

I should be getting more of those "audio/mpeg" parts. So there are a few options here:
- Either I am getting the full multipart but identifying the boundaries incorrectly, or
- I should post a request for more audio files, in the same way as if I were using the audio player?

Thank you in advance for your help!!
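
For anyone checking the boundary-parsing option, here is a minimal Swift sketch of splitting a multipart/related body. It assumes standard MIME framing (RFC 2046), where every part is introduced by "--" followed by the boundary value from the Content-Type header and the message closes with the same delimiter followed by "--". The names `MultipartPart`, `splitMultipart` and `parseHeaders` are hypothetical helpers, not part of any AVS or Apple API.

```swift
import Foundation

/// One part of a multipart body: its headers and raw payload bytes.
struct MultipartPart {
    let headers: [String: String]
    let body: Data
}

/// Parses "Name: value" header lines separated by CRLF; lines without a colon are skipped.
func parseHeaders(_ text: String) -> [String: String] {
    var headers: [String: String] = [:]
    for line in text.components(separatedBy: "\r\n") {
        guard let colon = line.firstIndex(of: ":") else { continue }
        let name = line[..<colon].trimmingCharacters(in: .whitespaces)
        let value = line[line.index(after: colon)...].trimmingCharacters(in: .whitespaces)
        headers[name] = value
    }
    return headers
}

/// Splits `data` on "--<boundary>" delimiters.
/// `boundary` is the bare value from the Content-Type header, without leading dashes.
/// A trailing part that is not followed by a delimiter (truncated body) is ignored.
func splitMultipart(_ data: Data, boundary: String) -> [MultipartPart] {
    let delimiter = Data("--\(boundary)".utf8)
    let crlf = Data("\r\n".utf8)
    let headerEnd = Data("\r\n\r\n".utf8)
    var parts: [MultipartPart] = []

    // Everything before the first delimiter is preamble and is dropped.
    guard var cursor = data.range(of: delimiter)?.upperBound else { return [] }

    // Each part runs from the current delimiter to the next one
    // (the closing "--<boundary>--" also matches as a delimiter).
    while let next = data.range(of: delimiter, in: cursor..<data.endIndex) {
        var end = next.lowerBound
        // The CRLF right before a delimiter belongs to the delimiter, not the part.
        if end - cursor >= 2, data[(end - 2)..<end].elementsEqual(crlf) { end -= 2 }
        // Headers and body are separated by an empty line.
        if let sep = data.range(of: headerEnd, in: cursor..<end) {
            let headerText = String(decoding: data[cursor..<sep.lowerBound], as: UTF8.self)
            let body = data.subdata(in: sep.upperBound..<end)
            parts.append(MultipartPart(headers: parseHeaders(headerText), body: body))
        }
        cursor = next.upperBound
    }
    return parts
}
```

If this yields the same two parts (one application/json, one audio/mpeg), the parsing is probably not the problem and the remaining audio has to be requested separately. Header names are stored as received, so a case-insensitive lookup may be needed in practice.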
alexa voice service

Blixt answered
I think that's a bug with AVS. I'm doing the requests completely differently (via HTTP requests from a backend) and am seeing the same behavior. I get multiple sounds for "what's on my calendar?" or "order " as individual multiparts, but "what's on my todo list?" only gives me one audio file stating the count of todo items.

Blixt answered
Actually, I just checked this, and it appears that the todo item request comes back as an AudioPlayer response rather than a SpeechSynthesizer response (which is what you get when asking for the weather). This means you may have to call getNextItem with the navigationToken to get the audio that follows. See this API document: https://developer.amazon.com/public/solutions/alexa/alexa-voice-service/rest/audioplayer-getnextitem-request

thehocoproject answered
Thank you Blixt, I'll give that a try :)

thehocoproject answered
Hey Blixt. I tried the AudioPlayer suggestion you made but AVS returns: "Unrecognized token 'navigationToken'". I tried different options incl. nil, false, access_token but none worked. I'll contact support with my request ID so they can check it out. I'll post the response here later.

Blixt answered
Hey, I also tried this and it seems to work and gives me another stream. I haven't fully integrated it but I do get audio data back when providing a navigation token. See below for the request format I used. It's important to note that unlike the recognize request, this one should not be multipart or form encoded, it should just be a POST with plain JSON in the body.

    POST /v1/avs/audioplayer/getNextItem
    Content-Type: application/json
    ... other headers ...

    {
      "messageHeader": {},
      "messageBody": {"navigationToken": " "}
    }
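
In Swift, the request described above could be built roughly as follows. This is only a sketch under assumptions: `makeGetNextItemRequest` is a hypothetical helper, the `baseURL` is whatever AVS host your recognize requests already use (the example URL is a placeholder), and the `Authorization: Bearer` header is assumed to be the same one sent with the recognize request. Only the path, the Content-Type and the JSON body shape come from this thread.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Builds a plain-JSON POST for getNextItem (not multipart, not form encoded).
/// `navigationToken` must be the value returned in the AudioPlayer play directive.
func makeGetNextItemRequest(baseURL: URL,
                            accessToken: String,
                            navigationToken: String) throws -> URLRequest {
    let url = baseURL.appendingPathComponent("v1/avs/audioplayer/getNextItem")
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // Assumption: same bearer token header as the recognize request.
    request.setValue("Bearer \(accessToken)", forHTTPHeaderField: "Authorization")

    // Serializing a dictionary guarantees the body is valid JSON.
    let payload: [String: Any] = [
        "messageHeader": [String: Any](),
        "messageBody": ["navigationToken": navigationToken]
    ]
    request.httpBody = try JSONSerialization.data(withJSONObject: payload)
    return request
}

// Usage with placeholder values; send it with URLSession as you would any other request.
let exampleRequest = try makeGetNextItemRequest(
    baseURL: URL(string: "https://example.invalid")!,
    accessToken: "ACCESS_TOKEN",
    navigationToken: "TOKEN_FROM_PLAY_DIRECTIVE")
```

Letting JSONSerialization produce the body avoids hand-built strings; an error like "Unrecognized token" may simply mean the server could not parse the body as JSON, though that is a guess.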

elstryan@Amazon answered
Hi thehocoproject, Blixt is correct: some of the features that you refer to in your post above will come back as AudioPlayer responses, which means that you'll need to call getNextItem with the navigationToken. Blixt has a good example in his post above of what POSTing this event should look like. thehocoproject: can you confirm that you are using a message format similar to Blixt's? Can you also confirm that you are using the navigation token from the play directive?

thehocoproject answered
Hi - I changed the messages to be exactly like those suggested by Blixt. However, it seems I need the streamID and/or the navigationToken, and I am not clear on where these can be found or generated. For example, when I ask 'What is on my to do list?', should I expect to get a navigationToken (which I assume differs from the accessToken and the refreshToken)? And separately, could you elaborate on what the streamID is? I can see it in the documentation, but it is unclear whether it should be created by me (like the authentication challenge) or returned by AMZN. Thanks!

Blixt answered
The navigationToken will be in the response to your initial request (it contains a JSON file and an audio file, the navigationToken is in the JSON file). Here's an example response (sorry if formatting is off, this forum doesn't do it too well). Note how it contains a navigationToken in the data.

    {
      messageHeader: {},
      messageBody: {
        directives: [
          {
            namespace: 'AudioPlayer',
            name: 'play',
            payload: {
              audioItem: {
                streams: [
                  {
                    streamUrl: 'cid:ListIntroductionEmptyPrompt0de0558b-xxxx-xxxx-xxxx-5c4c74e08d15',
                    offsetInMilliseconds: 0,
                    expiryTime: null,
                    progressReport: {
                      progressReportDelayInMilliseconds: 0,
                      progressReportIntervalInMilliseconds: 0
                    },
                    progressReportRequired: false,
                    streamId: 'amzn1.as-ct.v1.Domain:Application:Todo:Browse#ACRI#ssml#ACRI#ListIntroductionEmptyPrompt0de0558b-xxxx-xxxx-xxxx-5c4c74e08d15'
                  }
                ],
                audioItemId: 'amzn1.as-ct.v1.Domain:Application:Todo:Browse#ACRI#ssml#ACRI#ListIntroductionEmptyPrompt0de0558b-xxxx-xxxx-xxxx-5c4c74e08d15'
              },
              navigationToken: 'amzn1.as-ct.v1.Domain:Application:Todo:Browse#ACRI#ssml#ACRI#ListIntroductionEmptyPrompt0de0558b-xxxx-xxxx-xxxx-5c4c74e08d15',
              playBehavior: 'REPLACE_PREVIOUS'
            }
          }
        ]
      }
    }
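
On the Swift side, pulling the navigationToken (and the streamId values) out of that JSON part could look like the sketch below. It assumes the part on the wire is standard JSON with quoted keys and the same field names as the example response above. The type names (`AVSResponse`, `AudioStream`, and so on) and the `playDirectiveTokens` function are hypothetical, and only the fields needed here are modelled.

```swift
import Foundation

/// Minimal model of the JSON part; unknown keys are ignored by Decodable.
struct AVSResponse: Decodable {
    struct AudioStream: Decodable {
        let streamId: String
        let streamUrl: String
    }
    struct AudioItem: Decodable {
        let streams: [AudioStream]
    }
    struct Payload: Decodable {
        // Optional because other directives carry different payloads.
        let navigationToken: String?
        let audioItem: AudioItem?
    }
    struct Directive: Decodable {
        let namespace: String
        let name: String
        let payload: Payload?
    }
    struct Body: Decodable {
        let directives: [Directive]
    }
    let messageBody: Body
}

/// Returns the navigationToken and streamIds of the first AudioPlayer "play"
/// directive that carries a navigationToken, or nil if there is none.
func playDirectiveTokens(in json: Data) throws -> (navigationToken: String, streamIds: [String])? {
    let response = try JSONDecoder().decode(AVSResponse.self, from: json)
    for directive in response.messageBody.directives
    where directive.namespace == "AudioPlayer" && directive.name == "play" {
        guard let token = directive.payload?.navigationToken else { continue }
        let ids = directive.payload?.audioItem?.streams.map { $0.streamId } ?? []
        return (token, ids)
    }
    return nil
}
```

A nil result would suggest the response is not an AudioPlayer play directive (for example, the weather case), in which case there is no navigationToken to send to getNextItem.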

elstryan@Amazon answered
@thehocoproject: Both the streamId and navigationToken are returned by the Alexa Voice Service and do not need to be generated by the client implementation. The streamId should be added to the playback state JSON object (https://developer.amazon.com/public/solutions/alexa/alexa-voice-service/rest/audioplayer-events-requests#playback-state-json-object), which is included in many audio player events that the AVS client sends during audio playback (such as playbackFinished or playbackStarted). The navigation token, on the other hand, is added to the getNextItem request (https://developer.amazon.com/public/solutions/alexa/alexa-voice-service/rest/audioplayer-getnextitem-request). Blixt has a really good example response above, which shows the first play directive for a list. Please note how it has both the navigationToken and streamId.