Summary
In this chapter, we discussed the broader picture of where Spark fits in the big data and analytics ecosystem. First, we looked at the datasets that accompany this book as well as some interesting IDEs. We then discussed the role of data scientists and what they expect from a Spark stack, which led to our discussion of the Spark-based Data Lake architecture and then the Spark stack itself. We also looked at Parquet as an efficient storage format.