I am writing an Alexa Custom Skill in Node.js. I want the user to speak a request, and then my code should find the necessary information to send back from an external website. However, the website does not offer an API; the content I need is only in the HTML, so I will extract it from the page's HTML myself. Would https.get, from the built-in https module loaded with require('https'), work for this task?
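To answer the question directly: yes, https.get from Node's built-in https module can download a page's HTML. There is no https.require; the module itself is loaded with require('https'). Below is a minimal sketch, assuming the page is static HTML served over HTTPS with no redirects or login. Keep in mind that https.get only sees the HTML the server sends, not content rendered later by JavaScript in the browser. extractById is a hypothetical helper, and the element id it looks for is an assumption about the target page. A regex like this only copes with simple, stable markup (it breaks on nested elements of the same tag, for example); an HTML parser such as cheerio is the more robust choice.

```javascript
const https = require("https");

// Download a page over HTTPS and resolve with its body as a string.
// Redirects (3xx) are treated as failures in this sketch.
function fetchHtml(url) {
  return new Promise((resolve, reject) => {
    https
      .get(url, (res) => {
        if (res.statusCode < 200 || res.statusCode >= 300) {
          res.resume(); // discard the body so the socket is released
          reject(new Error(`Request failed with status ${res.statusCode}`));
          return;
        }
        res.setEncoding("utf8");
        let body = "";
        res.on("data", (chunk) => {
          body += chunk;
        });
        res.on("end", () => resolve(body));
      })
      .on("error", reject); // network errors (DNS, TLS, reset connection)
  });
}

// Hypothetical extractor: returns the trimmed inner text of the first element
// whose id attribute equals `id`, or null if no such element is found.
function extractById(html, id) {
  // Escape regex metacharacters so the id is matched literally.
  const safeId = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `<([a-z0-9]+)[^>]*\\sid=["']${safeId}["'][^>]*>([\\s\\S]*?)</\\1>`,
    "i"
  );
  const match = html.match(pattern);
  return match ? match[2].trim() : null;
}
```

Inside an async skill handler, usage would look something like const price = extractById(await fetchHtml(pageUrl), "price"), where pageUrl and "price" are placeholders for the real page and element id.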
This is called 'web scraping', and it is, in general, a poor way to do things. It usually breaks a website's terms of service, doesn't work with many modern, dynamic websites, and is fragile. If you are just jury-rigging something for your own interest, sure, go ahead. If you are trying to produce a professional skill that others will use, I would advise you to rethink your approach. In any event, that's a general programming question, not an ASK-specific one. A web search will turn up plenty of ways to do web scraping with Node.js.
A website presents a visual interface (usually HTML) for a human to consume via a browser. A web service presents a programmatic interface (usually JSON) for a computer to consume, typically via a REST call. If a company wishes to make its data available for people to access programmatically, it will provide a web service for doing so. This will probably be documented on the site, along with the terms and conditions under which it is made available. If you want to produce something official, or you don't want to risk a lawsuit, this is the avenue to take.

Web scraping is a technique for programmatically browsing a website, extracting information from the HTML, and returning the data. In effect, it wraps a web service around a website.

You can also spy on the website. If it has a dynamic, Ajax-like interface, it is probably making web service calls under the covers. You can try to work out the API behind those calls and then call it directly. This, however, is unofficial and may raise the ire of the site in question. For example, various people have reverse-engineered the web service that Domino's exposes to its various pizza-ordering apps, and there are even some libraries available for it. This, however, is not supported by Domino's and kind of pisses them off.

Amazon doesn't seem to check very closely whether you have the rights to use the interfaces you call, so you could probably get away with something like that. But if push came to shove and a company decided to litigate, you would be the one in the vulnerable position, or at the very least you would receive a cease-and-desist letter.
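To make the website versus web service distinction concrete: when a provider does document a JSON web service, the call looks much like the HTML download, but the body is parsed as data instead of being picked apart. The same code would apply to an endpoint found by watching a site's Ajax traffic, with the caveats about that approach described above. This is a minimal sketch, assuming a plain GET endpoint that needs no authentication; fetchJson and parseJsonBody are hypothetical names, and no real service or URL is implied.

```javascript
const https = require("https");

// Parse a response body as JSON, with an error message that says what failed
// (for example, when an HTML error page comes back instead of JSON).
function parseJsonBody(body) {
  try {
    return JSON.parse(body);
  } catch (err) {
    throw new Error(`Expected JSON but could not parse the response: ${err.message}`);
  }
}

// Call a JSON web service over HTTPS and resolve with the parsed object.
// The URL is whatever endpoint the provider documents.
function fetchJson(url) {
  return new Promise((resolve, reject) => {
    https
      .get(url, { headers: { Accept: "application/json" } }, (res) => {
        res.setEncoding("utf8");
        let body = "";
        res.on("data", (chunk) => {
          body += chunk;
        });
        res.on("end", () => {
          if (res.statusCode < 200 || res.statusCode >= 300) {
            reject(new Error(`Request failed with status ${res.statusCode}`));
            return;
          }
          try {
            resolve(parseJsonBody(body));
          } catch (err) {
            reject(err);
          }
        });
      })
      .on("error", reject);
  });
}
```

In a skill handler you would await fetchJson with the documented endpoint and build the spoken response from fields of the returned object. Because the provider defines those fields, this tends to survive site redesigns far better than HTML extraction.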
I'd like to add: last week I created scripts that scrape specific sites. They obtain all my bills, due dates, and amounts and read them aloud when I say, "Alexa, open my Bills" (using Bluetooth through the Echo). Today, one of the sites changed, which means I now have to fix my script. So, as stated before, I do this for my own use only and would never scrape for anyone other than myself.
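The site change described here is the fragility mentioned earlier in the thread. One way to live with it, sketched below, is to validate what was scraped and have the skill say plainly when a site could not be read, instead of reading out wrong figures. Everything specific here is hypothetical: the markup (elements with id="amount-due" and id="due-date"), the function names, and the LayoutChangedError class are illustrations, not how the poster's scripts actually work.

```javascript
// Raised when expected elements are missing, which usually means the site's
// HTML changed and the extraction rules need updating.
class LayoutChangedError extends Error {}

// Hypothetical markup assumed for a bill page:
//   <span id="amount-due">$84.20</span> ... <span id="due-date">2024-05-01</span>
function parseBill(html, siteName) {
  const amount = /id="amount-due"[^>]*>\s*\$([0-9,]+\.[0-9]{2})\s*</i.exec(html);
  const due = /id="due-date"[^>]*>\s*([0-9]{4}-[0-9]{2}-[0-9]{2})\s*</i.exec(html);
  if (!amount || !due) {
    throw new LayoutChangedError(
      `Could not find bill details for ${siteName}; the page layout may have changed.`
    );
  }
  return { siteName, amount: Number(amount[1].replace(/,/g, "")), dueDate: due[1] };
}

// Turn one parsed bill into a sentence for Alexa to speak.
function billToSpeech(bill) {
  return `Your ${bill.siteName} bill of $${bill.amount.toFixed(2)} is due on ${bill.dueDate}.`;
}

// Build the full response from already-downloaded pages. A page that no longer
// matches is reported by name; any other error is rethrown.
function billsToSpeech(pages) {
  const sentences = [];
  for (const { siteName, html } of pages) {
    try {
      sentences.push(billToSpeech(parseBill(html, siteName)));
    } catch (err) {
      if (!(err instanceof LayoutChangedError)) throw err;
      sentences.push(`I couldn't read your ${siteName} bill. The site may have changed.`);
    }
  }
  return sentences.join(" ");
}
```

With this structure, one redesigned site costs you a single sentence in the response rather than the whole skill, and the spoken message tells you which scraper needs fixing.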