Chapter 7. Testing and Debugging Distributed Applications
Distributed systems, both large and small, can be extremely challenging to test and debug: they are spread over a network, they run on computers that can be quite different from one another, and those computers might even be located on different continents.
Moreover, the computers we use could have different user accounts, different disks with different software packages, different hardware resources, and very uneven performance. Some might even be in different time zones. Developers of distributed systems need to take all of these factors into account when trying to foresee failure conditions, and operators have to work around the same challenges when debugging errors.
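As a small illustration of the differences listed above, the following sketch gathers a few facts about the machine it runs on, so that they can be attached to log messages or error reports. It is only a minimal example under our own assumptions: the function name describe_environment and the choice of fields are hypothetical, and it relies solely on the Python standard library. The timestamp is recorded in UTC so that entries coming from machines in different time zones can be compared directly.

```python
import getpass
import platform
import socket
from datetime import datetime, timezone


def describe_environment():
    """Collect host details that often differ between machines.

    The name and the selection of fields are illustrative assumptions,
    not part of any particular framework.
    """
    try:
        user = getpass.getuser()
    except Exception:
        # Some environments (for example, minimal containers) have no
        # resolvable user name; fall back to a placeholder.
        user = "unknown"
    return {
        # UTC avoids ambiguity when hosts sit in different time zones.
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "hostname": socket.gethostname(),
        "user": user,
        "os": platform.system(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running the same script on every node and comparing the output is a quick way to spot mismatched user accounts, operating systems, or Python versions before they turn into hard-to-explain failures.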
So far in this book, we have not spent much time on the extremely important issue of what to do when something goes wrong or behaves differently than we expect; instead, we have concentrated on some of the tools that we can use to write and deploy our applications...