In an Elasticsearch ecosystem, monitoring nodes and clusters is immensely useful for managing and improving their performance and state. Several issues can arise at the cluster level, such as the following:
- Node overhead: some nodes can have too many shards allocated and become a bottleneck for the entire cluster
- Node shutdown can occur for many reasons, such as full disks, hardware failures, and power problems
- Shard relocation problems or corruption, in which some shards cannot be initialized and brought online
- Very large shards can also be an issue; index performance can decrease when large Lucene segments are merged
- Empty indices and shards waste memory and resources; however, because every shard has a lot of active threads, if there is a huge number of unused indices...
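
Several of the issues listed above can be spotted from shard-level data. The following is a minimal sketch, assuming the rows have the shape returned by `GET _cat/shards?format=json&bytes=b` (string values, with `null` fields for unassigned shards). The function name `summarize_shards`, the thresholds, and the sample rows are hypothetical and only serve as an illustration; suitable limits depend on your cluster.

```python
from collections import Counter

# Hypothetical thresholds; tune them for your own cluster.
MAX_SHARDS_PER_NODE = 2
MAX_SHARD_BYTES = 50 * 1024**3  # 50 GiB


def summarize_shards(rows,
                     max_shards_per_node=MAX_SHARDS_PER_NODE,
                     max_shard_bytes=MAX_SHARD_BYTES):
    """Flag overloaded nodes and non-started, oversized, or empty shards.

    `rows` is assumed to be a list of dicts shaped like the JSON output
    of the _cat/shards API (keys: index, shard, prirep, state, docs,
    store, node).
    """
    # Count shards per node; unassigned shards have no node.
    per_node = Counter(r["node"] for r in rows if r.get("node"))
    report = {
        "overloaded_nodes": sorted(
            n for n, count in per_node.items() if count > max_shards_per_node
        ),
        "not_started": [],
        "oversized": [],
        "empty": [],
    }
    for r in rows:
        name = f'{r["index"]}/{r["shard"]}/{r["prirep"]}'
        if r["state"] != "STARTED":
            # UNASSIGNED, INITIALIZING, or RELOCATING shards.
            report["not_started"].append(name)
            continue
        if int(r["store"] or 0) > max_shard_bytes:
            report["oversized"].append(name)
        if int(r["docs"] or 0) == 0:
            report["empty"].append(name)
    return report


# Illustrative sample data, not taken from a real cluster.
SAMPLE_ROWS = [
    {"index": "logs", "shard": "0", "prirep": "p", "state": "STARTED",
     "docs": "1200", "store": "64424509440", "node": "node-1"},
    {"index": "logs", "shard": "1", "prirep": "p", "state": "STARTED",
     "docs": "900", "store": "1000000000", "node": "node-1"},
    {"index": "metrics", "shard": "0", "prirep": "p", "state": "STARTED",
     "docs": "0", "store": "230", "node": "node-1"},
    {"index": "metrics", "shard": "1", "prirep": "p", "state": "UNASSIGNED",
     "docs": None, "store": None, "node": None},
    {"index": "old", "shard": "0", "prirep": "p", "state": "STARTED",
     "docs": "10", "store": "5000", "node": "node-2"},
]

report = summarize_shards(SAMPLE_ROWS)
for issue, items in report.items():
    print(issue, items)
```

In practice, the rows would come from the cluster itself rather than from a hard-coded list, and the report could feed an alerting system instead of being printed.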