Introduction
In this chapter, we will look at best practices and troubleshooting techniques for various components of Hadoop. The same techniques can be applied to troubleshoot other services and applications.
Because Hadoop is a distributed system that operates at large scale, troubleshooting it can be cumbersome. In production, most organizations use log management and parsing tools such as Splunk, combined with Ganglia, Nagios, or similar tools for monitoring and alerting.
In this chapter, we will build basic troubleshooting skills and learn how to quickly search for keywords that point to common errors in a Hadoop cluster. Readers are encouraged to read this chapter after Chapter 8, Performance Tuning, to better relate to and understand the recipes in this chapter.