How to avoid being detected as a bot
I hesitated about adding this section after everything I mentioned about scraping ethics. I think I made my point clear when I said that when the owner says no, it means no. But if I'm writing a chapter about scraping, I think I need to show you these tools. It's then up to you what to do with the information you have learned so far.
Websites that don't want to be scraped, and are being actively scraped, will invest a good amount of time and money in trying to stop it. That effort becomes even more important when the scrapers damage not only the site's performance but also the business.
Developers in charge of dealing with bots won't rely only on the user agent because, as we saw, that can be easily manipulated. Nor can they rely only on counting the number of requests from an IP because, as we also saw, scrapers can slow down their scripts, simulating an interested user.
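To see why neither signal is enough on its own, here is a minimal Python sketch of a naive detector, written from the site's side. It is not taken from any real product: the NaiveBotDetector class, the list of user agent markers, and the thresholds are all assumptions I made up for illustration.

```python
from collections import defaultdict, deque

# Hypothetical values, chosen only for illustration.
BOT_UA_MARKERS = ("headlesschrome", "python-requests", "curl")
MAX_REQUESTS = 5        # requests allowed per IP inside the window
WINDOW_SECONDS = 10.0   # length of the sliding window


class NaiveBotDetector:
    """Flags a request using two weak signals: user agent and request rate."""

    def __init__(self):
        # Timestamps of recent requests, keyed by IP address.
        self._hits = defaultdict(deque)

    def looks_like_bot(self, ip, user_agent, now):
        # Signal 1: the user agent names a known automation tool.
        ua = user_agent.lower()
        if any(marker in ua for marker in BOT_UA_MARKERS):
            return True

        # Signal 2: too many requests from this IP inside the sliding window.
        hits = self._hits[ip]
        hits.append(now)
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        return len(hits) > MAX_REQUESTS


detector = NaiveBotDetector()
browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36"

# A script with an honest user agent is flagged on its first request.
caught_by_ua = detector.looks_like_bot("10.0.0.1", "python-requests/2.31", now=0.0)

# A fast script with a spoofed user agent is flagged by the rate check.
fast = [detector.looks_like_bot("10.0.0.2", browser_ua, now=i * 0.1) for i in range(10)]

# A slow script with a spoofed user agent passes both checks.
slow = [detector.looks_like_bot("10.0.0.3", browser_ua, now=i * 5.0) for i in range(10)]

print(caught_by_ua, any(fast), any(slow))
```

The third case is the one that matters: a request every five seconds with a browser-like user agent never trips either check, which is why sites that take this seriously look for more signals than these two.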
If the site can't stop...