
Google Kubernetes Engine was down last Friday, users left clueless of outage status and RCA

  • 3 min read
  • 12 Nov 2018


On Friday, the 9th of November, at 4:30 am US/Pacific time, Google Kubernetes Engine suffered a service disruption that left users potentially unable to launch a node pool through the Cloud Console UI. The team responded to the issue, saying it would get back to users with more information by 04:45 am US/Pacific time the same day.

However, the issue was not resolved by the given time. The team posted another status update assuring users that the engineering team was working on mitigation, and promised a further update with current details by 06:00 pm US/Pacific.

In the meantime, affected customers were advised to use the gcloud command to create new node pools.
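For readers unfamiliar with that workaround, here is a minimal sketch of what creating a node pool from the command line might look like, wrapped in Python so the command can be inspected before it is run. The pool, cluster and zone names are hypothetical placeholders, and the choice of flags is an assumption on our part; Google's status updates did not spell out the exact command.

```python
import shlex
import subprocess


def node_pool_create_command(pool, cluster, zone, num_nodes=3):
    """Build the gcloud argument list for creating a GKE node pool.

    All argument values are supplied by the caller; nothing here
    comes from Google's incident report.
    """
    return [
        "gcloud", "container", "node-pools", "create", pool,
        f"--cluster={cluster}",
        f"--zone={zone}",
        f"--num-nodes={num_nodes}",
    ]


def create_node_pool(pool, cluster, zone, num_nodes=3, dry_run=True):
    """Print the command (dry run) or execute it with the gcloud CLI."""
    cmd = node_pool_create_command(pool, cluster, zone, num_nodes)
    if dry_run:
        # Show the command without touching any cloud resources.
        print(shlex.join(cmd))
        return cmd
    # Requires an installed and authenticated gcloud CLI.
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    create_node_pool("example-pool", "example-cluster", "us-central1-a")
```

By default the sketch only prints the command; passing `dry_run=False` would hand it to the locally installed gcloud CLI.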

An update confirming that the issue was finally resolved was posted on Sunday, the 11th of November, stating that services had been restored on Friday at 14:30 US/Pacific time. However, no proper explanation has been provided regarding what led to the service disruption. The team did mention that it will conduct an internal investigation and make appropriate improvements to its systems to help prevent or minimize future recurrence of the issue.

According to a user’s summary on Hacker News, “Some users here are reporting that other GCP services not mentioned by Google's blog are experiencing problems. Some users here are reporting that they have received no response from GCP support, even over a time span of 40+ hours since the support request was submitted.”

According to another user, “When everything works, GCP is the best. Stable, fast, simple, reliable. When things stop working, GCP is the worst. They require way too much work before escalating issues or attempting to find a solution”.
Looking at the timeline of the service downtime, we can’t help but agree.

Users have also expressed disappointment over how the outage was managed.

Source: Hacker News



With users demanding a root cause analysis of the outage, it is only fitting that Google provide one so that users can better trust the company.
You can check out Google Cloud’s blog post detailing the timeline of the downtime.

Machine Learning as a Service (MLaaS): How Google Cloud Platform, Microsoft Azure, and AWS are democratizing Artificial Intelligence

Google’s Cloud Robotics platform, to be launched in 2019, will combine the power of AI, robotics and the cloud

Build Hadoop clusters using Google Cloud Platform [Tutorial]