Troubleshooting a failed Spark job
Troubleshooting a failed Spark job in a cloud environment involves two kinds of problems: environmental issues and job issues. Let's look at both in detail.
Debugging environmental issues
Here are some of the steps involved in checking for environmental issues:
- Check the health of the Azure services in the region where your Spark clusters are running on the Azure status page: https://status.azure.com/en-us/status.
- Next, check whether the Spark cluster itself is healthy. For HDInsight clusters, you can do this from the Ambari home page. We saw how to check the status in Ambari in Chapter 13, Monitoring Data Storage and Data Processing, in the Monitoring overall cluster performance section. Here is the Ambari home page again for your reference:
- Check to see whether any service is down or whether any of the resources...
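The same service check can also be scripted rather than read off the Ambari home page. The following is a minimal sketch, assuming an HDInsight cluster whose Ambari REST API is reachable at https://CLUSTERNAME.azurehdinsight.net with the cluster login credentials. The function names fetch_service_states and services_not_started are hypothetical helpers introduced for illustration, and the response shape (an items list with ServiceInfo entries) is what the Ambari services endpoint is assumed to return.

```python
import base64
import json
import urllib.request


def fetch_service_states(cluster_name, user, password):
    """Fetch the state of every service from the cluster's Ambari REST API.

    Assumption: the cluster exposes Ambari at <cluster>.azurehdinsight.net
    and accepts HTTP basic authentication with the cluster login.
    """
    url = (
        f"https://{cluster_name}.azurehdinsight.net"
        f"/api/v1/clusters/{cluster_name}/services?fields=ServiceInfo/state"
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def services_not_started(payload):
    """Return (service_name, state) pairs for services not in STARTED state.

    Assumption: payload looks like
    {"items": [{"ServiceInfo": {"service_name": ..., "state": ...}}, ...]}.
    """
    flagged = []
    for item in payload.get("items", []):
        info = item.get("ServiceInfo", {})
        state = info.get("state", "UNKNOWN")
        if state != "STARTED":
            flagged.append((info.get("service_name", "?"), state))
    return flagged
```

A typical call would be services_not_started(fetch_service_states("mycluster", "admin", password)), where mycluster and admin are placeholder values. Treat the result as a list of services to inspect rather than a definitive failure report, since services without long-running daemons may report a state other than STARTED even when the cluster is healthy.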