Summary
In this chapter, we looked at what big data is and how it can be analyzed. We introduced the three Vs that characterize big data: volume, variety, and velocity. We also surveyed the big data stack, including Hadoop, HDFS, and Apache Spark. While learning Spark, we worked through examples of the Spark RDD API and learned a few useful transformations and actions.
In the next chapter, we will get our first taste of running analytics on big data. We will start with Spark SQL, Spark's module for working with structured data, to perform simple yet powerful analysis. Later, we will build more complex analytic tasks as we learn market basket analysis.