Distinct advantages of Spark
Now that we understand the Spark components, let's examine the key advantages Spark offers for distributed, fault-tolerant processing over its peers. We will also touch upon situations where Spark might not be the best choice:
- High performance: This is the key feature responsible for the success of Spark: high-performance data processing over HDFS. As we saw in the previous section, Spark runs on top of HDFS and the YARN ecosystem, yet offers up to 10x faster performance, making it a better choice than MapReduce. Spark achieves this speedup by limiting latency-intensive disk I/O and leveraging its in-memory compute capability instead.
- Robust and dynamic: Apache Spark is robust in its out-of-the-box implementation and comes with over 80 operations. It is written in Scala and has interfacing APIs in Java, Python, and so on. The entire combination...