Technical requirements
In this chapter, you will be using Databricks Community Edition to run your code, which can be found at https://community.cloud.databricks.com:
- Sign-up instructions can be found at https://databricks.com/try-databricks.
- The code and data used in this chapter can be downloaded from https://github.com/PacktPublishing/Essential-PySpark-for-Scalable-Data-Analytics/tree/main/Chapter04; one way to pull them into your Databricks environment is sketched after this list.
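
If you would like to fetch the chapter's code and data directly onto a Databricks cluster, the following is a minimal sketch of one way to do so from a notebook cell. It assumes the cluster has outbound internet access and that a git client is available on the driver node; the /tmp/essential-pyspark destination path is an arbitrary choice for illustration, not part of the book's materials:

```python
# Minimal sketch: clone the book's companion repository onto the driver
# node of a Databricks cluster from a notebook cell. Assumes outbound
# internet access and a git client on the driver; the destination path
# /tmp/essential-pyspark is an arbitrary, illustrative choice.
import os
import subprocess

repo_url = (
    "https://github.com/PacktPublishing/"
    "Essential-PySpark-for-Scalable-Data-Analytics.git"
)
subprocess.run(
    ["git", "clone", "--depth", "1", repo_url, "/tmp/essential-pyspark"],
    check=True,
)

# List the Chapter 4 contents to confirm the clone succeeded.
print(os.listdir("/tmp/essential-pyspark/Chapter04"))
```

If you prefer, Databricks notebooks also support the %sh cell magic, which lets you run the same git clone as a plain shell command instead.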
Before we dive into implementing real-time stream processing data pipelines with Apache Spark, we first need to understand the general architecture of a real-time analytics pipeline and its various components, which are described in the following section.