alexasboy asked

Playing sound clip in Python

HELP! How do I play sound clips when using Python rather than Java? Appreciate the assistance!
alexa skills kit

Galactoise answered
It shouldn't matter what language you are using; you just have to make sure that your response speech is SSML and contains an audio tag pointing to your clip. That said, there are a ton of requirements about how the audio is encoded, so I'd recommend starting with a known-good audio sample for your testing.
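As a minimal sketch of the point above, the Python below builds an SSML string that wraps an audio tag in speak tags and places it in the outputSpeech part of a response. The helper name `audio_ssml` and the example URL are hypothetical; the clip itself still has to be hosted over HTTPS and encoded the way Alexa expects.

```python
from xml.sax.saxutils import escape, quoteattr


def audio_ssml(clip_url, before="", after=""):
    """Return an SSML document that plays clip_url between two optional phrases."""
    # quoteattr() adds the surrounding quotes and escapes characters such as & and <.
    audio_tag = "<audio src={} />".format(quoteattr(clip_url))
    # escape() protects &, < and > in the spoken text so the SSML stays well-formed.
    parts = [escape(before), audio_tag, escape(after)]
    return "<speak>" + " ".join(p for p in parts if p) + "</speak>"


# Hypothetical URL, for illustration only.
ssml = audio_ssml("https://example.com/audio/welcome.mp3", before="Welcome.")
output_speech = {"type": "SSML", "ssml": ssml}
print(ssml)
```

Escaping the URL and the spoken text is a design choice here: a stray `&` in either one would otherwise make the SSML invalid.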

jjaquinta answered
To Galactoise's point, TsaTsaTzu has uploaded a bunch of sound clips in the correct format for public use here: https://s3.amazonaws.com/tsatsatzu-alexa/index.htm

alexasboy answered
I really appreciate the quick feedback. As a somewhat new developer, I am a little confused as to how to format the response. I have included this piece of code:
"outputSpeech": { "type": "SSML", "ssml": "<speak>This output speech uses SSML.</speak>" }
What I do not understand is how to format the audio tag portion. The sample I see is below, but I am not sure how to use that piece of code without getting syntax errors.
Welcome to Car-Fu. You can order a ride, or request a fare estimate. Which will it be?
Apologies if this is a silly 101 question!
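One likely cause of syntax errors with the audio tag is that the double quotes around its `src` attribute end the surrounding Python string early. The sketch below, which assumes a hypothetical clip URL, delimits the Python string with single quotes so the inner double quotes need no escaping, and relies on `json.dumps` to escape them when the response is serialized.

```python
import json

# Hypothetical clip URL; replace with your own HTTPS-hosted MP3.
CLIP_URL = "https://example.com/audio/welcome.mp3"

# Single quotes delimit the Python string, so the double quotes
# around the src attribute need no backslashes.
ssml = '<speak>Welcome. <audio src="' + CLIP_URL + '" /> Which will it be?</speak>'

output_speech = {"type": "SSML", "ssml": ssml}

# json.dumps escapes the inner double quotes in the serialized response.
serialized = json.dumps({"outputSpeech": output_speech})
print(serialized)
```

If the response is returned from a Python handler as a dictionary, no manual escaping is needed at all; the backslashes only appear in the serialized JSON text.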

alexasboy answered
I figured it out. Thanks again for your assistance!

Paul Reiche III answered
Would you mind including the final formatted statement?

alexasboy answered
def build_speechlet_response(title, output, reprompt_text, should_end_session):
    return {
        'outputSpeech': {
            'type': 'SSML',
            # SSML must be wrapped in <speak>; the <audio> tag for the clip goes inside it.
            'ssml': '<speak> here goes the clip </speak>'
        },
        'card': {
            'type': 'Simple',
            'title': 'SessionSpeechlet - ' + title,
            'content': 'SessionSpeechlet - ' + output
        },
        'reprompt': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': reprompt_text
            }
        },
        'shouldEndSession': should_end_session
    }

Kwan answered
Hi jjaquinta, if you are using ffmpeg to do the encoding, would you tell me the command line parameters that you used? I am a bit surprised at the effort I have had to put in, and I am still not able to produce the format that SSML requires. I have tried many different MP3 codecs (including ffmpeg), encoding the audio files at 48 kbps, but none worked with Alexa, except your TsaTsaTzu files! If you are using another MP3 encoder, would you also share that information with me? Thanks for your help.

jjaquinta answered
I'm not at my computer right now. But, yes, I did use ffmpeg. Do a Google search for something like Amazon Alexa ssml ffmpeg. There is a page in the Amazon docs with the settings to use. Also in this forum. They have worked for me. If you don't find it by tonight I'll look up the exact settings I use and post them.

Kwan answered
Yes, I did find something on Stack Overflow: http://stackoverflow.com/questions/34583224/what-is-the-right-command-to-convert-an-mp3-file-to-the-required-codec-version
This is the command I used, where, for me, input.wav was a 48 kHz PCM wave file:
ffmpeg -i input.wav -b:a 48k -ar 16000 output.mp3

jjaquinta answered
I think this is the canonical command line:
[code]ffmpeg -i <input-file> -ac 2 -codec:a libmp3lame -b:a 48k -ar 16000 <output-file.mp3>[/code]
At least that is what I have in my code.
I had to process 1000+ sound files for Acoustic Chord, which just got submitted. Each one started as six files, one for each string on a guitar. Each had to be pitch adjusted depending on which fret the finger was on in the tablature, then timed, blended, and mixed together to get the sound of the chord being plucked and then strummed. All told, rather a lot of ffmpeg manipulation. :-) But the final pass of processing was using a command, like the above, to convert the final mix to an Alexa-compliant MP3 file. Hopefully you can hear the final result in a few days if it passes certification.
This was on Windows, but I think it will be the same on all platforms. You do need to install the LAME MP3 support, and how you do that differs on each platform.
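For batches of files like the ones described above, the same command can be driven from Python. This is only a sketch: the function names `alexa_mp3_args` and `convert` are hypothetical, the flags are the ones quoted in this thread, and it assumes ffmpeg with LAME support is installed and on the PATH (otherwise `subprocess.run` raises `FileNotFoundError`).

```python
import subprocess


def alexa_mp3_args(src, dst, sample_rate=16000, bitrate="48k"):
    """Build the ffmpeg argument list quoted in the thread for an Alexa clip."""
    return [
        "ffmpeg",
        "-i", src,                 # input file
        "-ac", "2",                # two audio channels
        "-codec:a", "libmp3lame",  # encode with the LAME MP3 encoder
        "-b:a", bitrate,           # audio bit rate
        "-ar", str(sample_rate),   # output sample rate in Hz
        dst,                       # output file
    ]


def convert(src, dst):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(alexa_mp3_args(src, dst), check=True)


# Show the command that would be run, without invoking ffmpeg.
print(" ".join(alexa_mp3_args("input.wav", "output.mp3")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.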