Investigating the filesystem being full
Earlier, we noticed that the filesystem was 100 percent full. Unfortunately, the version of sysstat we have installed doesn't capture disk space usage. It would be useful to identify when the filesystem filled up relative to when our run queue started to increase:
Jul 5 01:48:01 localhost auditd[560]: Audit daemon is low on disk space for logging
Jul 5 01:48:01 localhost auditd[560]: Audit daemon is suspending logging due to low disk space.
From the log messages we saw earlier, we could see the auditd process identified the low disk space at 01:48. This is extremely close to the time our run queue spike was seen.
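When a log file holds many entries, the same timestamp lookup can be scripted. The following is a minimal sketch, not part of the original investigation: the function name first_low_disk_message and the regular expression are hypothetical, and it assumes syslog-style lines that begin with a "Mon D HH:MM:SS" timestamp like the auditd messages above.

```python
import re

# Leading syslog timestamp, e.g. "Jul 5 01:48:01" (assumed line format).
SYSLOG_STAMP = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s")


def first_low_disk_message(lines, needle="low on disk space"):
    """Return the timestamp of the first line containing `needle`, or None."""
    for line in lines:
        if needle in line:
            match = SYSLOG_STAMP.match(line)
            if match:
                return match.group(1)
    return None


# The two auditd messages quoted in the text.
sample = [
    "Jul 5 01:48:01 localhost auditd[560]: Audit daemon is low on disk space for logging",
    "Jul 5 01:48:01 localhost auditd[560]: Audit daemon is suspending logging due to low disk space.",
]

print(first_low_disk_message(sample))
```

The timestamp it returns can then be compared with the time of the run queue spike. In practice, the lines would be read from the system log file rather than from a hard-coded list.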
This is building towards a hypothesis that the root cause of the problem was a filesystem filling up, which caused a process to either launch many CPU-intensive tasks or block the CPU from running other tasks.
While this is a sound theory, we have to prove it to be true. One way we can get closer to proving this is to identify what is utilizing...