Spark architecture
In the previous chapters, we established that Apache Spark is an open-source, distributed computing framework designed for big data processing and analytics. Its architecture is built to handle a variety of workloads efficiently, offering speed, scalability, and fault tolerance. Understanding this architecture is essential for grasping how Spark processes large volumes of data.
The components of the Spark architecture work together to process data efficiently. The major components are as follows:
- Spark driver
- SparkContext
- Cluster manager
- Worker node
- Spark executor
- Task
Before we examine each of these components, it's important to understand their execution hierarchy, that is, how the components interact when a Spark program starts.
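To make this hierarchy concrete, here is a minimal PySpark sketch; the application name and partition count are illustrative, and `local[*]` runs everything on one machine in place of a real cluster manager URL:

```python
from pyspark.sql import SparkSession

# Building a SparkSession launches the driver process; the SparkContext
# it wraps is the entry point that registers with the cluster manager.
spark = (
    SparkSession.builder
    .appName("ArchitectureDemo")  # illustrative application name
    .master("local[*]")           # local mode; a cluster manager URL in production
    .getOrCreate()
)

sc = spark.sparkContext  # the SparkContext owned by the driver

# A trivial job: the driver breaks this computation into tasks, which
# executors running on worker nodes execute in parallel.
rdd = sc.parallelize(range(1_000_000), numSlices=8)
print(rdd.sum())  # 499999500000

spark.stop()  # releases the executors and shuts down the driver's context
```

Even in this toy example, every component from the preceding list appears: the driver owns the SparkContext, the `master` setting identifies the cluster manager, and the parallelized computation is split into tasks executed by executors on worker nodes.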