In this chapter, you learned the basic etiquette of crawling the web respectfully: what a robots.txt file is and why it is important to obey it, how to identify yourself honestly with a User-Agent string, and how to limit your scraper's impact on a site through throttling and caching. With these skills, you are one step closer to building a fully functional web scraper.
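To tie these ideas together, here is a minimal sketch of a polite fetch routine combining the techniques from this chapter. It assumes the third-party requests library is available; the bot name, contact URL, and delay value are illustrative placeholders, not prescriptive values.

```python
import time
import urllib.robotparser
from urllib.parse import urlparse

import requests

# Identify yourself honestly; the name and contact URL below are placeholders.
USER_AGENT = "MyScraperBot/1.0 (+https://example.com/bot-info)"
CRAWL_DELAY = 2  # seconds between requests; adjust to the site's stated policy

_cache = {}          # simple in-memory cache so repeated URLs are not re-downloaded
_last_request = 0.0  # timestamp of the most recent request, used for throttling


def allowed_by_robots(url):
    """Check the site's robots.txt before fetching."""
    parts = urlparse(url)
    parser = urllib.robotparser.RobotFileParser()
    parser.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    parser.read()
    return parser.can_fetch(USER_AGENT, url)


def polite_get(url):
    """Fetch a URL while respecting robots.txt, throttling, and caching."""
    global _last_request
    if url in _cache:  # serve from cache when possible
        return _cache[url]
    if not allowed_by_robots(url):
        raise PermissionError(f"robots.txt disallows fetching {url}")
    elapsed = time.time() - _last_request
    if elapsed < CRAWL_DELAY:  # throttle: wait out the remaining delay
        time.sleep(CRAWL_DELAY - elapsed)
    response = requests.get(url, headers={"User-Agent": USER_AGENT})
    _last_request = time.time()
    _cache[url] = response.text
    return response.text
```

In a real scraper you would likely persist the cache to disk and honor any Crawl-delay directive a site declares in its robots.txt, but this sketch shows how the pieces fit together.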
In Chapter 4, Parsing HTML, we will look at how to extract information from HTML pages using various techniques.