Chris asked ·

Get HTTP from External Website

I am writing a Custom Skill in Node.js. I want the user to speak a request and then my code should find the necessary information to send back from an external website. However, the website does not use an API; the content I need is within the HTML. I will then extract the content I need from the HTML code. Would https.get or https.require work for this task?
alexa skills kit, debugging

jjaquinta answered ·
This is called "web scraping" and is, in general, a poor way to do things. It usually breaks a website's terms of service, doesn't work with many modern, dynamic websites, and is fragile. If you are just jury-rigging something for your own interest, sure, go ahead. If you are trying to produce a professional skill that others use, I would advise you to rethink your approach. In any event, that's a programming question, not an ASK-specific question. You can search online for plenty of ways to do web scraping with Node.js.

Chris answered ·
Thanks for your response. Are APIs the only way to get JSON from external websites? How can I tell whether a website accepts JSON requests?

jjaquinta answered ·
A [b]website[/b] presents a visual interface (usually in HTML) for a human to consume via a browser. A [b]web service[/b] presents a programmatic interface (usually in JSON) for a computer to consume, usually via a REST call.

If a company wishes to make its data available for programmatic access, it will provide a web service for doing so. This will probably be documented on the site, along with the terms and conditions under which it is made available. If you want to produce something officially, or you don't want to risk a lawsuit, this is the avenue to take.

Web scraping is a technique to programmatically browse a website, extract information from the HTML, and return the data. In effect, it is a way to wrap a web service around a website.

You can also spy on the website. If it has a dynamic Ajax-like interface, it is probably making web service calls under the covers. You can try to derive the API for those and then call it. This, however, is unofficial and may raise the ire of the website in question. For example, various people have reverse engineered the web service that the Domino's pizza-ordering apps consume. There are even some libraries available for it. This, however, is not supported by Domino's and kind of pisses them off.

Amazon doesn't seem to check too deeply whether you have the rights to use the interfaces you use, so you could probably get away with something like that. But if push came to shove and a company decided to litigate, you would be the one in the vulnerable position, or would at least get a "cease and desist" letter.
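To make the scraping description above concrete, and to address the original `https.get` question, here is a minimal sketch using only Node's built-in `https` module. `https.get` is a real function; as far as I know there is no `https.require` (you load the module with `require("https")`). The helper names `fetchHtml` and `extractFirst`, the example URL, and the regular expression are all made up for illustration, and the sketch assumes the page is static HTML served with a 200 status (it does not follow redirects or run JavaScript).

```javascript
const https = require("https");

// Fetch the raw body of a page over HTTPS and resolve with it as a string.
// Hypothetical helper: rejects on any non-200 status and does not follow redirects.
function fetchHtml(url) {
  return new Promise((resolve, reject) => {
    https
      .get(url, (res) => {
        if (res.statusCode !== 200) {
          res.resume(); // discard the body so the socket is released
          reject(new Error(`Request failed with status ${res.statusCode}`));
          return;
        }
        res.setEncoding("utf8");
        let body = "";
        res.on("data", (chunk) => {
          body += chunk;
        });
        res.on("end", () => resolve(body));
      })
      .on("error", reject);
  });
}

// Return the trimmed text of the pattern's first capture group,
// or null when the pattern does not match.
// Hypothetical helper: a regex is fragile; a real HTML parser is more robust.
function extractFirst(html, pattern) {
  const match = pattern.exec(html);
  return match ? match[1].trim() : null;
}

// Hypothetical usage; the URL and pattern are placeholders:
// fetchHtml("https://example.com/")
//   .then((html) => extractFirst(html, /<h1[^>]*>([^<]*)<\/h1>/i))
//   .then((text) => console.log(text))
//   .catch((err) => console.error(err));
```

Inside a skill handler you would presumably await `fetchHtml` and put the extracted text into the spoken response. Because the pattern is tied to the page's current markup, this kind of code tends to break whenever the site changes its HTML, which is the fragility mentioned earlier in the thread.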

cdburns123 answered ·
I'd like to add that last week I created scripts that scrape specific sites. They obtain all my bills, due dates, and amounts and read them aloud when I say, "Alexa, open my Bills" (over Bluetooth through the Echo). Today, one site changed, meaning I now have to fix my script. So, as stated before, I do this for my own use only and would never scrape for anything other than myself.

dracophoenix answered ·

I don't really see an answer here. The question got redirected down a path that the original poster did not ask for.

How would one go about making an Alexa web scraper? @cdburns123 if you're willing to share your skill source code, that would be awesome!
