Troubleshooting
We have covered cluster configuration, repair and scaling, and, finally, monitoring. The purpose of all this is to help you keep production environments up and running smoothly. You may choose the right ingredients to set up a cluster that fits your needs, but over time you are still likely to run into node failures, high CPU usage, high memory usage, disk space issues, network failures, and performance problems. The monitoring tool that you have configured will surface most of these. You will then need to take action appropriate to the problem you are facing.
You will usually track down these issues with the tools discussed in earlier chapters. You may also want to extend your investigation toolkit to include standard Linux utilities: netstat and tcpdump for network debugging; vmstat, free, top, and dstat for memory statistics; perf, top, dstat, and uptime for CPU statistics; and iostat, iotop, and df for disk usage.
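For a quick scripted snapshot of two of the signals above, the following is a minimal sketch in Python using only the standard library: os.getloadavg() returns the same 1-, 5-, and 15-minute load averages that uptime prints, and shutil.disk_usage() returns the total, used, and free bytes that df summarizes. The function name quick_health_check and its output keys are hypothetical, chosen for illustration; this is not part of any monitoring tool's API, it is not a replacement for the monitoring you have configured, and os.getloadavg() is only available on Unix-like systems.

```python
import os
import shutil


def quick_health_check(path="/"):
    """Hypothetical helper: a one-shot CPU-load and disk-usage snapshot."""
    # Load averages over 1, 5, and 15 minutes, as reported by `uptime` (Unix only).
    load1, load5, load15 = os.getloadavg()

    # Total, used, and free bytes for the filesystem that holds `path`.
    usage = shutil.disk_usage(path)
    # Approximates df's Use% column; df divides by used + available,
    # so the two figures can differ slightly when blocks are reserved.
    used_pct = 100.0 * usage.used / usage.total

    return {
        "load_1m": load1,
        "load_5m": load5,
        "load_15m": load15,
        "cores": os.cpu_count(),  # may be None if the count cannot be determined
        "disk_used_pct": round(used_pct, 1),
        "disk_free_gb": round(usage.free / 1024**3, 1),
    }


if __name__ == "__main__":
    for key, value in quick_health_check().items():
        print(f"{key}: {value}")
```

As a rule of thumb, read the load averages relative to the core count: a load that stays above the number of cores suggests processes are queuing for CPU, which is the point at which top, perf, or dstat is worth opening for a closer look.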
How do you...