This chapter walked you through installing Java, Scala, and Spark on a Linux machine running as an AWS EC2 instance. You can use the same setup to run the queries and examples in the other chapters of this book.
In the next chapter, you will learn about the basic unit of Spark, the RDD. RDD stands for Resilient Distributed Dataset, an immutable distributed collection to which we can apply transformations and actions.