An RDD is compile-time type-safe. That means, in the case of Scala and Java, if an operation is performed on an RDD that is not applicable to the underlying data type, Spark raises a compile-time error. This helps avoid failures in production.
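For example, here is a minimal Scala sketch (the app name, master URL, and sample data are illustrative) showing the compiler rejecting an operation that does not match the RDD's element type:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("rdd-type-safety")   // illustrative app name
  .master("local[*]")
  .getOrCreate()

val nums = spark.sparkContext.parallelize(Seq(1, 2, 3))  // RDD[Int]

val doubled = nums.map(_ * 2)   // compiles: Int * Int is well-typed
// nums.map(_.toUpperCase)      // compile-time error: toUpperCase is not a member of Int
```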
There are some drawbacks to using RDDs, though:
- RDD code can sometimes be very opaque: developers may struggle to work out what exactly the code is trying to compute (see the first sketch after this list).
- RDDs cannot be optimized by Spark, because Spark cannot look inside the lambda functions to understand the computation. For example, when a filter() is chained after a wide transformation such as reduceByKey() or groupByKey(), Spark will not reorder the operations to run the filter before the shuffle, even when that would be cheaper (see the second sketch after this list).
- RDDs are slower in non-JVM languages such as Python and R. For these languages, a separate Python/R runtime is created alongside the JVM, and data must be serialized and moved between the two processes for each RDD operation, which adds significant overhead.
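To illustrate the opacity point, here is a sketch of typical RDD code, assuming a spark-shell session where `spark` is predefined and using made-up (name, age) sample data. The intent, computing the average age per name, is buried in positional tuple accessors:

```scala
// Assumed sample data for illustration
val dataRDD = spark.sparkContext.parallelize(
  Seq(("Brooke", 20), ("Denny", 31), ("Brooke", 25)))

val avgAges = dataRDD
  .map(x => (x._1, (x._2, 1)))                        // (name, (age, 1))
  .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))  // sum the ages and counts
  .map(x => (x._1, x._2._1.toDouble / x._2._2))       // (name, average age)
```

Nothing in this chain tells Spark, or a human reader, that the goal is an average; the intent is locked inside the lambdas.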
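And to illustrate the missing optimization, a sketch under the same assumed session with illustrative data: RDD lineage executes exactly as written, so moving the filter before the shuffle is the developer's job, not Spark's.

```scala
val pairs = spark.sparkContext.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))

// Executed as written: the full shuffle in reduceByKey() runs first,
// and filter() is applied only to its output.
val shuffleThenFilter = pairs
  .reduceByKey(_ + _)
  .filter { case (word, _) => word != "b" }

// The developer must reorder manually so that less data is shuffled.
val filterThenShuffle = pairs
  .filter { case (word, _) => word != "b" }
  .reduceByKey(_ + _)
```

Both versions produce the same result here (the filter only inspects the key), but only the second avoids shuffling the filtered-out records.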