Summary
Web scraping is a useful way to gather additional data for data science projects. Wikipedia, in particular, is a rich source of information and also provides an API. For example, Wikipedia data can be combined with social media and sports data to predict how successful athletes will be (https://www.kaggle.com/noahgift/social-power-nba).
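As a reminder of what using the Wikipedia API can look like, here is a minimal sketch that requests the plain-text introduction of a page through the MediaWiki `action=query` endpoint. The function names (`build_extract_url`, `parse_extract`, `fetch_intro`) and the User-Agent string are hypothetical choices for illustration, not code from this chapter, and the sketch assumes the `extracts` property is available on the target wiki.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# English Wikipedia's MediaWiki API endpoint.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title):
    """Build a query URL asking for the plain-text intro of one page."""
    params = {
        "action": "query",
        "prop": "extracts",   # assumes the extracts property is enabled
        "exintro": 1,         # only the section before the first heading
        "explaintext": 1,     # plain text instead of HTML
        "titles": title,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def parse_extract(payload):
    """Pull the extract text out of a decoded JSON response."""
    pages = payload["query"]["pages"]
    # Results are keyed by page ID; we asked for a single title.
    page = next(iter(pages.values()))
    return page.get("extract", "")


def fetch_intro(title, user_agent="data-science-example/0.1 (hypothetical)"):
    """Fetch and return the intro text for a page (requires network access)."""
    request = Request(build_extract_url(title), headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return parse_extract(json.load(response))
```

Calling `fetch_intro("LeBron James")` would return the opening paragraphs of that article as a string, which could then be joined with other datasets. Splitting URL construction and response parsing into separate functions keeps both testable without a network connection.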
We saw how we can use web scraping to collect data files from the web, and how we can use it to collect text and data from webpages. These methods are useful for collecting data that may not otherwise be accessible, but remember to consider the ethics and legality before undertaking a large web scraping project.
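One small, practical step before a large scraping job is to check a site's robots.txt file. The sketch below uses Python's standard-library `urllib.robotparser` for this; the helper name `is_allowed` and the sample rules are assumptions made for illustration. Passing a robots.txt check does not settle the ethical or legal questions on its own, since a site's terms of service may still restrict scraping.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for an example site.
sample_rules = """User-agent: *
Disallow: /private/
"""

print(is_allowed(sample_rules, "https://example.com/public/data.csv"))
print(is_allowed(sample_rules, "https://example.com/private/data.csv"))
```

For a live site, `RobotFileParser` can instead load the rules directly with `set_url()` followed by `read()`.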
We also saw how we can use APIs to collect data, such as with the Reddit API. Again, remember that websites and APIs each have their own terms of service (TOS) that we should follow.
This chapter concludes the Dealing with Data part of the book. We've gone from the basics of Python file handling and SQL all the way to collecting and analyzing raw...