Apache Spark - the full stack
With this background behind us, let's take a quick look at the full Spark stack, shown in the following diagram. The stack used to be a lot simpler, which shows how the Spark ecosystem is continually evolving:
The Spark stack currently includes the following features:
It provides the Spark SQL feature. This feature lets you manipulate data with SQL while the work still runs as the underlying Spark computations. It also provides a vital interface by exposing Datasets to external systems through JDBC/ODBC, which is arguably the greatest value of Spark SQL.
It provides advanced analytics, which is still evolving; look out for features such as a parameter server and neural networks in later versions of Spark.
It provides the Dataset/DataFrame API, of course. It is one of the parts we focus on in this book, and we will see more of it in the following chapters.
The Catalyst optimizer is an interesting beast. It is the proverbial software layer that separates a declarative API/interface...