Chapter 11. Stackdriver
A major factor in the overall success of a software solution is reliability. For some systems, downtime can shake confidence, damage reputations, and steer potential customers into the open arms of competitors. In other systems, outages can cause work processes to fall behind or, in extreme cases, lead to cascading site-wide failures. The real cost of service outages can be hard to establish, but for most systems, it's safe to assume that a service outage is a big deal.
Building reliability into services starts long before those services ever make it to the cloud, but once they are up and running, teams must be able to monitor them effectively. Monitoring cloud services can become extremely complex as the number of services grows. With more and more teams leveraging managed services and hybrid-cloud solutions, the need for flexible and comprehensive monitoring tools becomes apparent. As we'll see in this chapter, Google has provided a very capable monitoring solution...