At the beginning of this week, Mandrill, a transactional email API for MailChimp users, experienced an outage where users were able to send but were unable to receive emails. The Madrill community also tweeted stating that they were also seeing ongoing errors with scheduled mail and webhooks and would resolve the issue soon.
https://twitter.com/mandrillapp/status/1092611488982945793
Sebastian Lauwers, the VP of Engineering at Dixa, a customer service software tweeted that the issue took too long to resolve. He also asked for the reason why Mandrill was taking so long--nearly 23 hours--to sort the issue.
https://twitter.com/teotwaki/status/1092624972252618754
Today, one of the users with the username GuyPostington posted an email received from Mandrill, on HackerNews. The email explains the reason for Mandrill’s outage and how they will be addressing the issue. Mandrill uses a sharded Postgres setup as one of their main datastores. According to the email, “On Sunday, February 3, at 10:30 pm EST, 1 of our 5 physical Postgres instances saw a significant spike in writes. The spike in writes triggered a Transaction ID Wraparound issue. When this occurs, database activity is completely halted. The database sets itself in read-only mode until offline maintenance (known as vacuuming) can occur.” They have also tweeted the same
They further mentioned that the database is large due to which the vacuum process takes a significant amount of time and resources, and there’s no clear way to track progress.
To address this issue, the community writes, “We don’t have an estimated time for when the vacuum process and cleanup work will be complete. While we have a parallel set of tasks going to try to get the database back in working order, these efforts are also slow and difficult with a database of this size. We’re trying everything we can to finish this process as quickly as possible, but this could take several days, or longer.”
The email also states that once the outage is resolved, the community plans to offer refunds to all the affected users.
To know about this news in detail, visit Mandrill’s Tweet thread.
Microsoft Cloud services’ DNS outage results in deleting several Microsoft Azure database records
Internet Outage or Internet Manipulation? New America lists government interference, DDoS attacks as top reasons for Internet Outages across the world
Outage in the Microsoft 365 and Gmail made users unable to log into their accounts