Understanding defect cascading
A single error can leave your system in an unknown state. Unless you have identified and replicated that problem as part of your testing, it also leaves your system in an untested state. Your application may work well, but there’s a higher risk of issues than during regular operation. Triggering one error state is a great way to look for other problems.
Example problems that arise during error handling include the following:
- Excessive logging – It’s vital to record information about errors, but if a failure recurs many times a second, the logging itself adds load to the system, whether in disk usage or in the network bandwidth needed to report it. Using those resources can trigger further issues.
- Lack of detail in logging – A regular finding in post-mortem meetings after failures is that the logs were hard to use and didn’t have helpful information. Use the logs while you’re testing to discover weaknesses before...