Chapter 12. Scalable Frameworks
The advent of social networking, interactive media, and deep analysis has caused the amount of data processed daily to skyrocket. For data scientists, it is no longer just a matter of finding the most appropriate and accurate algorithm to mine data; it is also about leveraging multicore CPU architectures and distributed computing frameworks to solve problems in a timely fashion. After all, how valuable is a data mining application if the model does not scale?
Scala developers have many options for building classification and regression applications for very large datasets. This chapter covers Scala parallel collections, the actor model, the Akka framework, and Apache Spark in-memory clusters. It addresses the following topics:
- Introduction to Scala parallel collections
- Evaluating the performance of a parallel collection on a multicore CPU
- The actor model and reactive systems
- Clustered and reliable distributed computing...