Google Cloud went offline taking with it YouTube, Snapchat, Gmail, and a number of other web services

Update: The article has been updated to include Google's response on Sunday's disruption service.

Over the weekend, Google Cloud suffered a major outage taking down a number of Google services, YouTube, GSuite, Gmail, etc. It also affected services dependent on Google such as Snapchat, Nest, Discord, Shopify and more. The problem was first reported by East Coast users in the U.S around 3 PM ET / 12 PM PT, and the company resolved them after more than four hours. According to downdetector, UK, France, Austria, Spain, Brazil, also reported they are suffering from the outage.

https://twitter.com/DrKMhana/status/1135291239388143617

In a statement posted to its Google Cloud Platform the company said it experiencing a multi-region issue with the Google Compute Engine. “We are experiencing high levels of network congestion in the eastern USA, affecting multiple services in Google Cloud, GSuite, and YouTube. Users may see a slow performance or intermittent errors. We believe we have identified the root cause of the congestion and expect to return to normal service shortly,” the company said in a statement.

The issue was sorted four hours after Google acknowledged the downtime. “The network congestion issue in the eastern USA, affecting Google Cloud, G Suite, and YouTube has been resolved for all affected users as of 4:00 pm US/Pacific,” the company said in a statement.

“We will conduct an internal investigation of this issue and make appropriate improvements to our systems to help prevent or minimize future recurrence. We will provide a detailed report of this incident once we have completed our internal investigation. This detailed report will contain information regarding SLA credits.”

This outage resulted in some major suffering. Not only did it impact one of the most used apps by Netziens (YouTube and Sanpchat), people also reported that they were unable to use their NEST controlled devices such as turn on their AC or open their "smart" locks to let people into the house.

https://twitter.com/davidiach/status/1135302533151436800

Even Shopify experienced problems because of the Google outage, which prevented some stores (both brick-and-mortar and online) from processing credit card payments for hours.

https://twitter.com/LarryWeru/status/1135322080512270337

The entire dependency of the world’s most popular applications on just one backend in the hands of one company seems a bit startling. It is also surprising how so many people just rely on one hosting service. At the very least, companies should think of setting up a contingency plan, in case the services go down again.

https://twitter.com/zeynep/status/1135308911643451392

https://twitter.com/SeverinAlexB/status/1135286351962812416

Another issue which popped up was how Google cloud randomly being down is proof that cloud-based gaming isn't ready for mass audiences yet. At this year’s Game Developers Conference (GDC), Google marked its entry in the game industry with Stadia, its new cloud-based platform for streaming games. It will be launching later this year in select countries including the U.S., Canada, U.K., and Europe.

https://twitter.com/BrokenGamezHDR/status/1135318797068488712

https://twitter.com/soul_societyy/status/1135294007515500549

On Monday, Google released an apologetic update on the outage. They outlined the incident, detection and their response.

In essence, the root cause of Sunday’s disruption was a configuration change that was intended for a small number of servers in a single region. The configuration was incorrectly applied to a larger number of servers across several neighboring regions, and it caused those regions to stop using more than half of their available network capacity. The network traffic to/from those regions then tried to fit into the remaining network capacity, but it did not. The network became congested, and our networking systems correctly triaged the traffic overload and dropped larger, less latency-sensitive traffic in order to preserve smaller latency-sensitive traffic flows, much as urgent packages may be couriered by bicycle through even the worst traffic jam.

Next, Google’s engineering teams are conducting a thorough post-mortem to understand all the contributing factors to both the network capacity loss and the slow restoration.

Facebook family of apps hits 14 hours outage, longest in its history

Worldwide Outage: YouTube, Facebook, and Google Cloud goes down affecting thousands of users

YouTube went down, Twitter flooded with deep questions, YouTube back and everyone is back to watching cat videos.