Yet Another Resource Negotiator (YARN)
Hadoop YARN is one of the most popular resource managers in the big data world. Apache Spark provides seamless integration with YARN. Apache Spark applications can be deployed to YARN using the same spark-submit command.
Apache Spark requires the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable to be set and to point to the Hadoop configuration directory, which contains core-site.xml, yarn-site.xml, and so on. These configuration files are required to connect to the YARN cluster.
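The environment check and the spark-submit invocation described above can be sketched in Python. This is a minimal illustration, not part of Spark itself: the helper names find_hadoop_conf_dir and build_spark_submit_args are hypothetical, and the application path and configuration directory in the usage note are placeholders. It only builds the argument list for spark-submit with the --master yarn and --deploy-mode options; it does not launch anything.

```python
import os
from pathlib import Path


def find_hadoop_conf_dir(env=os.environ):
    """Return the directory named by HADOOP_CONF_DIR or YARN_CONF_DIR, or None.

    Hypothetical helper: it checks the two variables in that order and
    returns the first one that is set to a non-empty value.
    """
    for var in ("HADOOP_CONF_DIR", "YARN_CONF_DIR"):
        value = env.get(var)
        if value:
            return Path(value)
    return None


def build_spark_submit_args(app_path, deploy_mode="cluster", env=os.environ):
    """Build a spark-submit argument list that targets YARN.

    Raises RuntimeError when neither configuration variable is set,
    because Spark would then be unable to locate the YARN cluster.
    """
    if find_hadoop_conf_dir(env) is None:
        raise RuntimeError(
            "Set HADOOP_CONF_DIR or YARN_CONF_DIR before submitting to YARN"
        )
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        app_path,
    ]
```

For example, build_spark_submit_args("my_app.py", env={"HADOOP_CONF_DIR": "/etc/hadoop/conf"}) returns a list that could be passed to subprocess.run once the cluster is running. Both the path and the file name here are assumed values.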
To run Spark applications on YARN, the YARN cluster must be started first. The official Hadoop documentation describes how to start a YARN cluster: https://hadoop.apache.org/docs
In general, YARN consists of a resource manager (RM) and multiple node managers (NMs), where the resource manager is the master node and the node managers are slave nodes. The NMs send a detailed report to the RM at a defined interval that tells the RM how many resources (such as CPU slots and RAM...