Working with datasets
You cannot do predictive analytics without a dataset. Although we are surrounded by data, finding datasets suited to predictive analytics is not always straightforward. In this section, we present some freely available resources. We then focus on the dataset we will work with for several chapters: the Titanic dataset, a classic introductory dataset for predictive analytics.
Finding open datasets
Many dataset repositories are available online, maintained by public institutions from the local to the global level, by non-profit organizations, and by data-focused start-ups. Here is a short list of open dataset resources that are well suited to predictive analytics. The list is by no means exhaustive:
Note
This thread on Quora points to many other interesting data sources: https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public . You can also ask for specific datasets on Reddit at https://www.reddit.com/r/datasets/.
- UCI Machine Learning Repository...