A sample Root Cause Analysis
Now that we have all of the information we need, let's create a root cause analysis report. This report can take any format, but I've found that a structure along the following lines works well.
Problem summary
At approximately 1:50 A.M. on July 5, 2015, the server blog.example.com unexpectedly rebooted. The watchdog process initiated the reboot because of a high load average on the server.
After investigation, the high load average appears to have been caused by a custom e-mail application that was left running even though it had been migrated to another server.
The available data suggests that this application filled the root filesystem to 100 percent.
Although I was unable to obtain process states from before the reboot, the high load average may also have been caused by the same application being unable to write to the disk.
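The summary links two conditions: a full root filesystem and a load average high enough to trigger the watchdog. On Linux, the load average counts processes in uninterruptible sleep (typically blocked on I/O) as well as runnable ones, which is why processes stuck on disk writes can push it up. The following is a minimal sketch, assuming a hypothetical 1-minute load threshold (`MAX_LOAD_1`), of how both conditions could be checked on a live system. It is an illustration only, not how the watchdog daemon itself is implemented.

```python
import os
import shutil

# Hypothetical threshold for illustration only; the actual watchdog
# configuration on blog.example.com is not given in this report.
MAX_LOAD_1 = 24.0


def load_exceeds(threshold: float = MAX_LOAD_1) -> bool:
    """Return True if the current 1-minute load average is above threshold."""
    one_minute, _five, _fifteen = os.getloadavg()  # Unix only
    return one_minute > threshold


def fs_usage_percent(path: str = "/") -> float:
    """Return used space on the filesystem holding path, as a percentage.

    Like df's Use% column, this divides used space by (used + available),
    so blocks reserved for root are excluded from the total.
    """
    usage = shutil.disk_usage(path)
    return usage.used / (usage.used + usage.free) * 100


if __name__ == "__main__":
    print(f"1-minute load above threshold: {load_exceeds()}")
    print(f"root filesystem used: {fs_usage_percent('/'):.1f}%")
```

A reading near 100 percent for `/` together with a load average above the threshold would match the sequence described above.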
Problem details
The time at which the incident was reported: 07/05/2015 at 01:52
The timeline of the incident would...