Design patterns and techniques
In this section, we'll outline some design patterns and general techniques to use when writing your own analytics. These hints and tips represent an accumulation of experience working with Spark. They are offered as guidelines for writing effective Spark analytics, and they also serve as a reference for when you encounter the inevitable scalability problems and aren't sure what to do.
Spark APIs
Problem
With so many different sets of APIs and functions to choose from, it's difficult to know which ones are the most performant.
Solution
Apache Spark currently has over one thousand contributors, many of whom are highly experienced, world-class software professionals. It is a mature framework, having been developed for over six years. Over that time, the contributors have focused on refining and optimizing just about every part of the framework, from the DataFrame-friendly APIs, through the Netty-based shuffle machinery, to the Catalyst...