In this chapter, we explored Apache Spark, an open source distributed data processing platform, and installed a copy of it on our local computer. First, we learned about Spark's core API through hands-on examples that explored Spark's resilient distributed dataset (RDD). Next, we explored Spark's higher-level APIs using Datasets and DataFrames.
In the next chapter, we will look at traditional machine learning concepts.