Summary
In this chapter, we broke a typical predictive analytics project into phases and explained that the first phase is where you define what data is to be used.
Typically, the more data there is, the better the performance (or results) of a predictive model, but at some point (as in the case of a big data source) there may be too much data to deal with effectively.
After reviewing the reasons why big data is so challenging, we showed how to gauge your data source to determine whether it qualifies as a big data source, and then offered various proven techniques for addressing the common challenges of using big data.