As with API services, web pages have owners, who may or may not be open to the idea of having their data scraped. If an API is available, using it is always preferable to scraping, for the following reasons:
- First, an API is usually far simpler and more reliable to use. API owners typically commit to keeping its structure stable, or at least to announcing upcoming changes in advance. HTML web pages come with no such guarantee: a website can change at any time, and its owners won't tell you ahead of time, so expect plenty of sudden breaking changes!
- Second, using the API makes you a good citizen: serving raw data is substantially cheaper, computation-wise, than serving a full-blown HTML page, so the service owners will be thankful.
- Lastly, some data (for example, historical changes) will not be available via the web page.
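To make the first point concrete, here is a minimal sketch that reads the same two values from an API-style JSON payload and from an HTML page. The payload, the page markup, the class names (`city`, `temp`), and the helpers `from_api` and `from_html` are all hypothetical, invented for illustration only; a real site will look different, and a real scraper would likely use a dedicated parsing library rather than the standard library parser used here.

```python
import json
from html.parser import HTMLParser

# Hypothetical API response: the data arrives already structured.
API_RESPONSE = '{"city": "Berlin", "temperature_c": 21.5}'

# Hypothetical HTML page carrying the same two values.
HTML_PAGE = """
<html><body>
  <div class="weather">
    <span class="city">Berlin</span>
    <span class="temp">21.5</span>
  </div>
</body></html>
"""


class SpanCollector(HTMLParser):
    """Collect the text of every <span>, keyed by its class attribute."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._current = None  # class of the <span> we are inside, if any

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._current = dict(attrs).get("class")

    def handle_data(self, data):
        if self._current is not None:
            self.values[self._current] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def from_api(payload):
    """Parse the JSON payload; the structure is part of the API contract."""
    return json.loads(payload)


def from_html(page):
    """Extract the same values from markup; this depends on the page layout."""
    parser = SpanCollector()
    parser.feed(page)
    return {
        "city": parser.values["city"],
        "temperature_c": float(parser.values["temp"]),
    }


if __name__ == "__main__":
    print(from_api(API_RESPONSE))
    print(from_html(HTML_PAGE))
```

Both functions return the same dictionary, but `from_html` is tied to the markup: if the site owner renames the `temp` class, the lookup raises a `KeyError`, while the JSON version keeps working as long as the API contract holds.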
However, there are plenty of examples...