Summary
In this chapter, we learned several ways to collect data by scraping web pages. We also extracted data from semi-structured formats such as JSON and XML, and explored different methods of retrieving data in real time from a website without authentication. In the next chapter, we will learn about topic modeling, an unsupervised natural language processing technique that groups documents according to the topics it detects in them.