Summary
In this chapter, we explored what qualifies a dataset as large, the common characteristics of such datasets, the problems of repetition, and the reasons for the hyper-growth in data volumes; in short, the big data context.
The need to apply conventional Machine learning algorithms to large datasets has posed new challenges for Machine learning practitioners, since traditional Machine learning libraries do not quite support processing huge datasets. Parallelization using modern parallel computing frameworks, such as MapReduce, has gained popularity and adoption; this has resulted in the birth of new libraries built on top of these frameworks.
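The core idea behind MapReduce-style parallelization can be illustrated with a minimal, single-machine sketch (a hypothetical example, not the API of any specific framework): a map phase transforms each input record into partial results, and a reduce phase merges partial results with an associative operation, which is what allows the work to be distributed across machines.

```python
from collections import Counter
from functools import reduce

def map_phase(document):
    # Map: emit per-word counts for one input record.
    return Counter(document.split())

def reduce_phase(counts_a, counts_b):
    # Reduce: merge two partial count tables.
    # Associative and commutative, so it parallelizes safely.
    return counts_a + counts_b

# Illustrative input; in a real framework each document (or shard)
# would be mapped on a different worker.
documents = ["big data big models", "data beats models"]
totals = reduce(reduce_phase, map(map_phase, documents), Counter())
print(totals["data"])  # → 2
```

Because the reduce step only needs to combine pairs of partial results, a framework is free to schedule the map calls on many machines and merge their outputs in any order, which is precisely why libraries built over MapReduce can scale classic algorithms to massive data.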
Our focus was on methods that are suitable for massive data and have potential for parallel implementation. The landscape of Machine learning applications has changed dramatically in the last decade. Throwing more machines at the problem does not always prove to be a solution. There is a need to revisit traditional algorithms and models in the way...