
Matt Farley asked

Workaround for Push Notifications!

Like everyone else, I've been waiting for true Push Notifications since the Echo was released. This weekend I was able to put together a *very* high-quality workaround thanks to the work of:

John Graves @ https://github.com/gravesjohnr/AlexaNotificationCurl

and

Miguel Mota @ https://miguelmota.com/blog/alexa-voice-service-with-curl/

I have built upon their examples with a major improvement that results in higher quality Text-to-Speech (response from the Alexa Voice Services).

YouTube Demo: https://www.youtube.com/watch?v=y09_YaduvEk

The YouTube demo consists of 4 example push notifications:

  1. When I receive an SMS text, it is read by Alexa
  2. When I simulate physically "coming home" by activating my cell phone wifi and it logs onto the network, Alexa greets me
  3. Pushing my kids' Amazon DASH buttons launches their favorite cartoons on the TV. (I'm not manually controlling the TV; it's all done via scripted/automated IR blasters on an Ubuntu HTPC.)
  4. An Automatic.com cellular dongle in our minivan lets us know my wife just pulled onto our street

In the demo you will actually see several notification systems tied together:

  1. Desktop Notifications sent by my Linux server to all TVs
  2. Visual light flash automations (sent by my server to our LIFX WiFi light bulbs)
  3. The audio Push Notification spoken by Alexa

How the Alexa Push Notifications work: (90% of the work was done by John and Miguel cited above)

  1. Register an application for Alexa Voice Service (follow Miguel's instructions linked above)
  2. Use John Graves' code as a starting point to interact with AVS via the command line
  3. Custom code (running on my home server) receives notifications from a number of sources: Android notifications are sent from our devices to the server using a home-grown app, and the WiFi connections and DASH button pushes are detected by my DHCP server, which initiates the notifications in response. Bottom line: you need some code to initiate a notification. It could be anything.
  4. Write the text of the notification to a file on the server, e.g. /tmp/AlexasAnnouncement.txt
  5. Send a canned, pre-recorded .wav to AVS; in my case it's a recording of me saying "tell <skill/app> push notification"
  6. My custom Alexa skill receives the words "push notification", then
  7. Reads and responds with the text from step 4
  8. The command in step 5 receives the .mp3 response from AVS (which is the text that was put in AlexasAnnouncement.txt)
  9. That mp3 is then played on the Echo over bluetooth

Shorthand:

  • "Some text to be spoken" > saved in .txt
  • Send pre-recorded .wav to AVS: "tell <skill/app> push notification" > Alexa Skill
  • Alexa Skill responds to 'push notification' by sending contents of .txt to AVS
  • AVS sends response as > .mp3 data stream of Alexa speaking the text with enunciation
  • .mp3 played > bluetooth
  • bluetooth > Echo speaker
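
As a rough sketch of how steps 4 through 9 might be wired together on the server (this is not Matt's actual code): the function names and the ALEXA_SCRIPT path below are hypothetical, and only the /tmp/AlexasAnnouncement.txt path comes from the post. The text goes through a temporary file so the skill never reads a half-written announcement.

```shell
#!/bin/sh
# ANNOUNCE_FILE matches the example path in the post.
# ALEXA_SCRIPT is a hypothetical path to the trimmed-down alexa.sh that
# sends the pre-recorded request to AVS and plays the mp3 that comes back.
ANNOUNCE_FILE="${ANNOUNCE_FILE:-/tmp/AlexasAnnouncement.txt}"
ALEXA_SCRIPT="${ALEXA_SCRIPT:-./alexa.sh}"

# Step 4: write the text to a temp file next to the target, then rename it
# into place so readers only ever see a complete file.
queue_announcement() {
    tmp_file="$(mktemp "${ANNOUNCE_FILE}.XXXXXX")" || return 1
    printf '%s\n' "$1" > "$tmp_file" || return 1
    # mktemp creates the file as owner-only; the skill may run as another user.
    chmod 644 "$tmp_file"
    mv "$tmp_file" "$ANNOUNCE_FILE"
}

# Steps 5-9: queue the text, then trigger the canned AVS request.
push_notification() {
    queue_announcement "$1" || return 1
    "$ALEXA_SCRIPT"
}
```

Any notification source would then only need to call something like: push_notification "Someone is at the front door"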

Everything above is completely scripted/automated. So when we receive notifications from our phones, desktops, DASH buttons, etc., or are notified that someone's at our front door (via security camera), the server in the house writes some text to a file that gets spoken out loud by our trusty Echo (in addition to flashing lights and popups on the PCs and TVs).

Question: Why not just send the text directly to AVS using pico2wave (TTS) and play the response, as John Graves does in his original code? I.e., why create an Alexa Skill that gets told about a notification and reads it from a text file (which adds several hops to the workflow compared with John's original)?

Answer: I originally started by using John's pico2wave approach to convert my notification to a computer-spoken .wav file for AVS to transcribe (e.g. "simon says <your notification text here>"). However, I found that AVS often misunderstood the pico2wave files. For example, if my notification was something like "Where are you?", the quality of the pico2wave audio could result in AVS hearing "We are who?" You also lose the punctuation and intonation that Alexa is capable of when you send her text from a skill (as opposed to having AVS run speech-to-text on a computer-generated .wav).

I've had this going for a few days now and am very surprised how well it works. And I can finally quit worrying about when Amazon will give us native push notifications.

Now... if they'll just give us hardware GUIDs in our skill interactions then I won't have to use a different Amazon account in each room of the house (so my custom skill knows which lights and TV to turn on and off without forcing the user to specify).

If you want to see more of my Alexa home automation, check out this demo of Jarvis. It's about a year old and the functionality has increased since the video was taken: https://www.youtube.com/watch?v=9f5XTynrbAA

4 comments

I suppose you could upload the /tmp/AlexasAnnouncement.txt to an S3 bucket and then have an Amazon hosted lambda read it and return it?

This would give someone (me!) the same result without a locally hosted 'Jarvis' correct?

1 Like 1 ·

Yes, that would definitely work.
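
A sketch of the upload side of that idea, assuming the AWS CLI is installed and configured with credentials. The bucket name is a placeholder, the function name is invented for this example, and the Lambda that would read the object back is not shown.

```shell
#!/bin/sh
# ANNOUNCE_FILE matches the example path in the post.
# S3_URI is a placeholder; substitute a bucket you own.
ANNOUNCE_FILE="${ANNOUNCE_FILE:-/tmp/AlexasAnnouncement.txt}"
S3_URI="${S3_URI:-s3://example-bucket/AlexasAnnouncement.txt}"

# Copy the current announcement text to S3 so a hosted skill can read it.
upload_announcement() {
    aws s3 cp "$ANNOUNCE_FILE" "$S3_URI"
}
```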

0 Likes 0 ·

Hi Matt, your changes to John Graves' code sound very interesting. Is there any way you can fork his repo and put your code up on GitHub?

0 Likes 0 ·

Great post Matt. Will try it today.

What did you use to display notifications from Linux on the TV?

Thanks,

Uros

0 Likes 0 ·
Matt Farley answered

I actually didn't change John's code much. Whereas the original generates a new multipart_body.txt for each request (which contains the text-2-speech), I use a static multipart_body.txt that contains a recording of me saying "push notifications".

The rest of the work is on my local server and locally hosted Alexa Skill (Jarvis), which receives the "push notifications" alert from John's code and then responds accordingly as described above.

If you wanted to replicate this you would need to create your own Alexa Skill on a server/platform/language of your choice, and have it respond to your prompt however you see fit (ultimately sending the audio.mp3 response from AVS to the physical Echo over bluetooth).

So in terms of John's code, I use the original tools to get/set the tokens, and then I cut down his alexa.sh to the last two commands:

# Auth token (replace with yours).
TOKEN=`cat token.dat`

# Boundary name, must be unique so it does not conflict with any data.
BOUNDARY="BOUNDARY1234"

# Compose cURL command and write to output file.
echo "Making request..."
curl -s -X POST \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: multipart/form-data; boundary=${BOUNDARY}" \
  --data-binary @multipart_body.txt \
  https://access-alexa-na.amazon.com/v1/avs/speechrecognizer/recognize \
  | perl -pe 'BEGIN{undef $/;} s/--.*Content-Type: audio\/mpeg.*(ID3.*)--.*--/$1/smg' \
  | tee response.mp3 | play -t mp3 -q -
1 comment

Nice! Great explanation! I'm thinking I'm going to run this whole thing on a Pi and 3D print a stand similar to this one to hold it so I can just integrate the whole thing locally with the Echo :)

1 Like 1 ·
Matt Farley answered

Here's a useful script I use to make sure the Pi stays connected via Bluetooth to the Echo. (Sometimes the connection drops, or when the Pi or Echo reboots, it isn't automatically reestablished.)

SPEAKER_MAC = Echo's Bluetooth MAC

SPEAKER_NAME = the name the bluetooth adapter has assigned to the Echo

(works with the latest version of Raspbian, Jessie)

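#!/bin/bash

# Bluetooth MAC of the Echo, and the PulseAudio sink name assigned to it.
SPEAKER_MAC=AA:BB:CC:DD:EE:FF
SPEAKER_NAME=bluez_sink.AA_BB_CC_DD_EE_FF

# Restart PulseAudio and reconnect until a Bluetooth sink shows up.
until pacmd list-sinks | grep bluez_sink > /dev/null
do
	# Stop any running PulseAudio instance.
	pulseaudio -k
	pkill pulseaudio
	sleep 1

	# Start a fresh daemon and give it time to come up.
	pulseaudio --start &
	sleep 2

	# Power on the adapter and connect to the Echo.
	echo -e "power on\nconnect $SPEAKER_MAC\n" | bluetoothctl
	sleep 4
done

# Route all audio to the Echo.
pacmd set-default-sink "$SPEAKER_NAME"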
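
In the script above, SPEAKER_NAME is just the MAC address with the colons replaced by underscores and a bluez_sink. prefix. A small helper could derive one from the other; the function name is made up for this sketch, and it assumes your PulseAudio version uses the same naming. Some versions may append a profile suffix, so check the result against the output of pacmd list-sinks.

```shell
#!/bin/sh
# Hypothetical helper: build the PulseAudio sink name from a Bluetooth MAC,
# following the pattern used in the script above.
sink_name_for_mac() {
    printf 'bluez_sink.%s\n' "$(printf '%s' "$1" | tr ':' '_')"
}
```

With that in place, the two settings could be reduced to one: SPEAKER_NAME="$(sink_name_for_mac "$SPEAKER_MAC")"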

2 comments

I'm also trying to do this with my Pi. How do you get the Echo to connect as a Bluetooth sink? It seems to always connect to my Pi as a source and I can't figure out how to change that. It just keeps putting audio out to my Pi, but I can't output my Pi's audio to the Echo Dot. Any suggestions?

0 Likes 0 ·

It used to work for me (it would connect as a sink) but lately it's connected as a source and I've been unable to fix that. I'd love for someone else to comment if they understand the arcane Bluetooth/ALSA/PulseAudio stuff!

0 Likes 0 ·
Gilles van den Hoven answered

OK, after carefully re-reading your post here, the post on the forum, and the code of John and Miguel, I think I have figured it out. Unfortunately English is not my native language so I had to read it twice. I have also never developed an Alexa skill and have to go by my limited knowledge and tutorials I've seen online.

The first step is that you recorded a WAV file with the content "tell <your-custom-developed-skill> push notification" in the correct format. You send this WAV file to AVS in step 5. Your custom Alexa skill (which is a Lambda function?) makes a REST call to your home server, which in turn responds to AWS Lambda with the contents of the txt file. (Hence Ian's remark that he could also host the txt file somewhere else.) AVS synthesises this into an MP3 which you play over Bluetooth and presto, the loop is closed.

Can you elaborate on the remark you made on the Amazon forum; ... then I won't have to use a different Amazon account in each room of the house ...

Would that mean that you have to register your custom skill for every account?


Matt Farley answered

Gilles - Very close. I don't have a lambda skill, my Alexa skill is actually hosted on my home server (in PHP). So the Alexa skill on my home server reads the txt file directly. Otherwise, you're correct.

The statement about "a different Amazon account in each room" is pertaining to the fact that Amazon gives us no way to uniquely identify which Echo device is being spoken to. So I have two options:

1) Whenever I give a command, say out loud which room I'm referring to: "Alexa, tell Jarvis to turn off the lights **in the bedroom**"

or

2) Put the Echo in my bedroom on a different account, and in the URL to the Alexa skill have a querystring setting that specifies the room. e.g. http://myipaddress/skill/jarvis/?room=bedroom

I've actually implemented both options. So if you don't mention the room out loud, the skill takes the room from the querystring in the URL Amazon uses to call it (set in the account settings).
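
The fallback between the two options could look roughly like this. Matt's skill is written in PHP and his code isn't shown, so this is only an illustration of the logic in shell; resolve_room and its two arguments are invented for the example.

```shell
#!/bin/sh
# Hypothetical sketch: prefer the room that was spoken aloud, otherwise
# fall back to the room= parameter of the skill URL's query string.
# Usage: resolve_room SPOKEN_ROOM QUERY_STRING
resolve_room() {
    spoken_room="$1"
    query_string="$2"

    # Option 1: the user named the room in the request.
    if [ -n "$spoken_room" ]; then
        printf '%s\n' "$spoken_room"
        return 0
    fi

    # Option 2: split the query string on '&' and look for room=...
    saved_ifs="$IFS"
    IFS='&'
    set -f
    for pair in $query_string; do
        case "$pair" in
            room=*)
                IFS="$saved_ifs"; set +f
                printf '%s\n' "${pair#room=}"
                return 0
                ;;
        esac
    done
    IFS="$saved_ifs"; set +f

    # Neither source named a room.
    return 1
}
```

For example, resolve_room "" "room=bedroom" yields bedroom, while a spoken room always takes priority over the URL.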

---

FYI, hot off the presses, there's a newer alternate option: Amazon Polly. An AWS service that provides an .mp3 of spoken language when you give it text. The quality of the voice isn't quite as good as Alexa, but you have a few dozen accents/languages to choose from, and you don't need to host your own skill. On the other hand, they start charging you (pennies) after the first year:

https://aws.amazon.com/polly/

https://aws.amazon.com/blogs/aws/polly-text-to-speech-in-47-voices-and-24-languages/

https://alestic.com/2016/11/amazon-polly-text-to-speech/

Once you get your .mp3 from Polly, you'd play it on the Echo via bluetooth (or any other speaker, I have them coming out of our livingroom HTPC and master bedroom HTPC)
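
A minimal sketch of the Polly route, assuming the AWS CLI is installed and configured with credentials, and that SoX's play (the same player used in the alexa.sh snippet earlier in this thread) is available. The function name and the default voice (Joanna) are just choices for this example.

```shell
#!/bin/sh
# Hypothetical helper: synthesize TEXT with Amazon Polly and play the mp3
# on the default audio sink (e.g. the Echo over Bluetooth).
# Usage: speak_with_polly TEXT [VOICE_ID]
speak_with_polly() {
    polly_text="$1"
    polly_voice="${2:-Joanna}"
    polly_out="$(mktemp)" || return 1

    # synthesize-speech writes the audio to the file named last and prints
    # request metadata on stdout, which is discarded here.
    if ! aws polly synthesize-speech \
            --output-format mp3 \
            --voice-id "$polly_voice" \
            --text "$polly_text" \
            "$polly_out" > /dev/null; then
        rm -f "$polly_out"
        return 1
    fi

    # Play the result, then clean up whether or not playback succeeded.
    if play -q -t mp3 "$polly_out"; then
        polly_status=0
    else
        polly_status=1
    fi
    rm -f "$polly_out"
    return "$polly_status"
}
```

A call such as speak_with_polly "Dinner is ready" would then replace the AVS round trip entirely.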

1 comment

Ohh nice, I am also not a fan of Lambda (because that also adds delays, no matter how low their latency is). Do you use a specific package for creating your PHP skill or did you create it from scratch?

And does Alexa 'force' you to use "Alexa, ask .." phrases? As I read on their custom skills page, I should also be able to do "Alexa, dinner is ready", which would be handed to my custom skill (e.g. to turn off the TV and turn on the kitchen dining lights), right?

0 Likes 0 ·
cdburns123 answered

Could you not use the Microsoft speech engine for text-to-speech and Bluetooth to the Echo Dots?


heffneil answered

Question: Does anyone have this running on Mac OS X? I was able to get everything installed on my Mac except pico2wave. Which brew package can I use to get the tool?

1 comment

You can use the built-in say command. Replace this line:

pico2wave -w /tmp/pipe.wav "${QUESTION}" | tee pico2wav.wav | sox - -c 1 -r 16000 -e signed -b 16 -t wav - >> multipart_body.txt

With these (say cannot be piped into another command, so an interim file is needed):

say "${QUESTION}" --output-file=/tmp/say.wav --data-format=LEF32@22050

cat /tmp/say.wav | tee pico2wav.wav | sox - -c 1 -r 16000 -e signed -b 16 -t wav - >> multipart_body.txt

and it should work fine.

0 Likes 0 ·
Topsail answered

This is the kind of thing my wife and I have been looking for! The only problem is that every post in here may as well be in a foreign language! Neither of us codes or has any idea what we are looking at. Is there a skill we can activate and type this into, or something we can simply activate and tell to do it? How did you get it to display on your TV? Thank you!

1 comment

Hi topsail.

This is a developer forum for coders who wish to develop Alexa skills. This thread in particular is at the cutting edge of development (making Alexa do things she's not designed to do).

I would respectfully suggest that, if you have no coding ability, this might not be the best place to start.

I'm not aware of any published skill that does this. I'm not even sure Amazon would certify such a skill (I could be wrong). If you find a solution, post back.

0 Likes 0 ·
peternann answered

Hang on - Am I reading this right that the approach is basically to 'trick' the Alexa AVS service (server side) into thinking Alexa has heard the wake-word and a request? And then the skill takes over?

Isn't this an almost worst-case scenario that a motivated hacker could make your Alexa START LISTENING at any time they wanted? (Assuming they had access to your Alexa login/device to activate the skill)

The only defence being that you might notice the blue lights.

Am I right that this is how this works? If so, I am a bit surprised the device security isn't better... If the physical Alexa device hasn't heard the wake-word I assumed she wouldn't respond to any sorts of Skill instructions.

I would not be surprised to see such loopholes closed down at any moment.

Other than that, this is a very neat solution! Nice work!

Alexa skills are crying out for a Permissions architecture in my opinion... Eg. Instead of deprecating the open vocabulary slot, they should just tell users and let them decide if they accept the (minor) risk, or not. Ditto for notifications. ... Please Amazon ...

1 comment

Hi Peter -- luckily, no, "it's not what you think" :) ... It is not tricking the Alexa into thinking it's heard a wake word. The Alexa is only being used as a 'dumb' bluetooth speaker.

0 Likes 0 ·
Josu answered

Is it possible to have Alexa ask you a question and expect a response this way, instead of just telling you something? I don't see how, except maybe by sending the response to the Amazon Echo instead of back to the bash console.


finalight answered

I still can't make this work, even when I'm using Ubuntu on VirtualBox:

ALSA lib confmisc.c:768:(parse_card) cannot find card '0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_card_driver returned error: No such file or directory
ALSA lib confmisc.c:392:(snd_func_concat) error evaluating strings
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_concat returned error: No such file or directory
ALSA lib confmisc.c:1251:(snd_func_refer) error evaluating name
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
