Summary
In this chapter, you learned the basics of working with Apache Spark. First, you downloaded and installed Spark and configured PySpark to run in Jupyter notebooks. You also learned how to scale Spark horizontally by adding nodes. Spark uses DataFrames similar to those used in pandas. The last section taught you the basics of manipulating data in Spark.
In the next chapter, you will use Spark with Apache MiNiFi to move data at the edge or on Internet-of-Things devices.