Clusters, nodes and daemons
This chapter uses the terms cluster, node, and daemon throughout, so it is worth agreeing on what each one means in this context before going further.
- Cluster: A cluster is a group of computers (nodes) that work together and can often be treated as a single system.
- Node: A node is an individual computer in the cluster.
- Daemon: In multitasking computer operating systems, a daemon is a computer program that runs as a background process, rather than being under control of an interactive user.
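To make the relationship between the three terms concrete, here is a minimal sketch in plain Python (not Spark code). The class names, the hostnames, and the daemon names `master` and `worker` are all hypothetical, chosen only for illustration: a cluster contains nodes, and each node hosts the daemons that run on it in the background.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Daemon:
    """A background process on a node, not driven by an interactive user."""
    name: str


@dataclass
class Node:
    """One computer in the cluster, hosting zero or more daemons."""
    hostname: str
    daemons: List[Daemon] = field(default_factory=list)


@dataclass
class Cluster:
    """A group of nodes that can be treated as a single system."""
    name: str
    nodes: List[Node] = field(default_factory=list)

    def daemons_by_node(self) -> Dict[str, List[str]]:
        """Map each node's hostname to the names of the daemons it runs."""
        return {n.hostname: [d.name for d in n.daemons] for n in self.nodes}


# Hypothetical three-node cluster; hostnames and daemon names are made up.
cluster = Cluster(
    name="demo",
    nodes=[
        Node("node-1", [Daemon("master")]),
        Node("node-2", [Daemon("worker")]),
        Node("node-3", [Daemon("worker")]),
    ],
)

layout = cluster.daemons_by_node()
print(layout)
```

The point of the sketch is only the containment relationship: daemons live on nodes, and nodes make up the cluster.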
With the terminology out of the way, what exactly does Spark need in order to run on a cluster? How is it managed? Is it a master/slave architecture? These are key questions, and answering them is necessary to fully understand how Spark works on the set of machines that make up a cluster. Let's refer to the classical Spark architecture diagram (reference spark.apache.org/docs/latest) to understand this in a...