I noticed a long delay between the end of a response (or a reprompt) and the moment the Echo starts listening for input (when the light-blue LED switches to listening mode). This delay is unusual, and users have to wait quite a while before they can speak. In general, users want to talk to Alexa immediately after hearing a response. In this case, they have to wait about 5 seconds until Alexa starts listening.
I am using the Alexa skill template's speechlet output, or sometimes SSML audio output, along with text.
Is this a known limitation, or am I doing something wrong?
Example: '<speak><audio src="test.mp3" /> Do you want to listen to another Joke?</speak>'
1. Alexa plays the response.
2. Waiting... about 5 seconds (?)
3. Alexa starts listening for input.
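For reference, here is a minimal sketch of the kind of response described above, assuming the standard Alexa custom-skill response JSON. The helper name `build_response` and the reprompt text are assumptions made for illustration, not the actual skill code, and `test.mp3` is the placeholder from the example.

```python
import json


def build_response(ssml: str, reprompt_text: str) -> dict:
    """Build a response that keeps the session open (hypothetical helper)."""
    return {
        "version": "1.0",
        "response": {
            # SSML output: audio clip followed by the spoken question
            "outputSpeech": {"type": "SSML", "ssml": ssml},
            # Reprompt spoken if the user says nothing (assumed wording)
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt_text}
            },
            # False keeps the session open so Alexa listens for a reply
            "shouldEndSession": False,
        },
    }


ssml = '<speak><audio src="test.mp3" /> Do you want to listen to another Joke?</speak>'
response = build_response(ssml, "Do you want to listen to another Joke?")
print(json.dumps(response, indent=2))
```

With this shape, the delay in step 2 happens after the `outputSpeech` finishes and before the device opens the microphone.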