Summary
News data is an important data source because it gives us a collective glimpse of the major themes in our day-to-day lives. We have seen how difficult it can be to collect news data and run text mining on it. We have covered the basic concepts of web scraping, which is required for most data collection from the public domain. We have learned about the various problems that arise with textual data and how to work around them. An important takeaway from this chapter is the need to maintain an unbiased point of view while analyzing text data. Otherwise, it is very easy for text data mining to degenerate into a bad case of selection bias. Text data analysis is a diverse, rapidly developing area of research that is tough to contain in one chapter. We encourage our readers to explore different text mining tools and find out what use cases they can build on the datasets that we collected; this will certainly make for an interesting exercise...