In the previous chapters, we ran Spark examples in the Spark shell in local mode or executed Spark programs with a local master. Spark, however, is a distributed processing engine: it is designed to run on a cluster for high performance, scalability, and fault tolerance.
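To make the distinction concrete, the two modes differ only in the master URL passed at launch. A minimal sketch (the host, port, class, and jar names in the cluster example are illustrative placeholders, not from the text):

```shell
# Local mode: driver and executors run inside a single JVM,
# using as many worker threads as there are cores ("local[*]").
spark-shell --master "local[*]"

# Cluster mode: the application is submitted to a cluster manager,
# here a Spark standalone master (hypothetical host/port/class/jar).
spark-submit \
  --master spark://master-host:7077 \
  --class com.example.MyApp \
  my-app.jar
```

The rest of this chapter examines what happens behind the `--master` option when a cluster manager is involved.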
In this chapter, we will discuss Spark's application architecture in distributed mode and explain how its various components interact. We will then look at the cluster managers that can be used to run Spark jobs on a cluster, and discuss key performance parameters for running jobs in cluster mode. After reading this chapter, you will be able to execute Spark jobs effectively in distributed mode.