Why choose Apache Spark?
In this section, we will discuss the applications of Apache Spark and its key features: speed, reusability, in-memory computation, and its role as a unified platform.
Speed
Apache Spark is one of the fastest data processing frameworks available today. For many workloads, it outperforms Hadoop MapReduce by a large margin. The main reasons are its in-memory computation capabilities and lazy evaluation, which lets Spark defer work until a result is actually requested and plan the whole chain of operations at once. We will learn more about this when we discuss Spark architecture in the next chapter.
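To make lazy evaluation concrete, here is a minimal plain-Python sketch of the idea. It does not use Spark, and it is not how Spark is implemented: `LazyDataset`, `square`, and `calls` are hypothetical names invented for this illustration. The sketch only shows the general pattern in which transformations such as `map` and `filter` are recorded as a plan, and nothing runs until an action such as `collect` asks for a result.

```python
# Toy illustration of lazy evaluation. This is NOT Spark's API;
# LazyDataset is a hypothetical class invented for this sketch.

class LazyDataset:
    def __init__(self, source, steps=()):
        self._source = source  # any iterable holding the input records
        self._steps = steps    # recorded transformations, not yet executed

    def map(self, fn):
        # Transformation: record the step and return a new plan.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: record the step and return a new plan.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now is the source read and the recorded steps applied,
        # in a single pass over the data.
        results = []
        for record in self._source:
            keep = True
            for kind, fn in self._steps:
                if kind == "map":
                    record = fn(record)
                elif not fn(record):
                    keep = False
                    break
            if keep:
                results.append(record)
        return results


calls = []  # records every time the mapping function actually runs


def square(x):
    calls.append(x)
    return x * x


# Building the plan does not touch the data.
plan = LazyDataset(range(1, 6)).map(square).filter(lambda v: v % 2 == 1)
calls_before_action = len(calls)

# The action triggers execution of the whole recorded chain.
result = plan.collect()
print(calls_before_action, result)
```

Because the whole chain is known before any work starts, an engine built this way is free to combine steps or skip unnecessary work, which is one of the reasons deferred execution helps performance.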
Reusability
Reusability is an important consideration for large organizations adopting modern platforms. Spark can join batch and streaming data seamlessly. Moreover, you can augment incoming datasets with historical data to serve your use cases better. This provides a long historical view of the data for running queries or building modern analytical systems.
In-memory computation
With in-memory computation, much of the overhead of reading from and writing to disk is eliminated. The data is cached, and...