Summary
This chapter covered how to diagnose node failures, determine their root cause, and apply corrective action. The key takeaways are:
Many errors, from shard failures to slow query performance, are caused by OutOfMemoryError exceptions
Running out of disk space on one node can cause other nodes to run out of disk space as well when shards are reallocated
Running Elasticsearch alongside other services that require a lot of memory can result in the operating system killing Elasticsearch to free up memory
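As a reminder of the disk space point above, here is a minimal sketch of how nodes that are close to full might be flagged before shard reallocation pushes the problem onto their neighbors. The function name `nodes_low_on_disk`, the sample figures, and the 85% threshold are all assumptions chosen for illustration; in practice the per-node numbers could come from the output of the `_cat/allocation` API.

```python
# Hypothetical helper: the name, threshold, and sample data are illustrative
# assumptions, not part of Elasticsearch itself.
def nodes_low_on_disk(disk_stats, threshold_pct=85.0):
    """Return (node, used_pct) pairs at or above threshold_pct, fullest first."""
    flagged = []
    for node, (used_bytes, total_bytes) in disk_stats.items():
        if total_bytes <= 0:
            continue  # skip nodes with no usable disk figures
        used_pct = 100.0 * used_bytes / total_bytes
        if used_pct >= threshold_pct:
            flagged.append((node, round(used_pct, 1)))
    # Sort so the node most at risk appears first
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up (used, total) figures in arbitrary units for three nodes
sample = {
    "node-1": (450, 500),  # 90% used
    "node-2": (200, 500),  # 40% used
    "node-3": (430, 500),  # 86% used
}
print(nodes_low_on_disk(sample))
```

With the sample data, `node-1` and `node-3` are reported and `node-2` is not. If either flagged node fills up and its shards move elsewhere, the remaining nodes absorb that data, which is how one full disk can lead to several.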
The next chapter covers Elasticsearch 5.0, the next major release of the platform, and gives an overview of the new monitoring tools that will accompany that release.