This was a very quiet, stable week for the cloud! Kudos to those keeping the cloud industry reliable. As PagerDuty once said, “Uneventful days are beautiful days (for on-call engineers).” But for the minor outages of the week, check out our report for Monday, August 15th – Sunday, August 21st.
Newsworthy:
- Netlify had a major outage on Saturday the 20th. The issue, reported at 7:35 AM Pacific, involved “increased latencies, timeouts, and errors on [the] High-Performance Edge Network” and was resolved by 8:50 AM Pacific.
- Datadog was intermittently unable to retrieve events. The issue started on Wednesday the 17th and is still ongoing as of today (Monday, August 22nd). Datadog appears to be returning frequent (but not constant) 502 errors on some API requests to retrieve events. There doesn’t appear to be a pattern to the failures, as the timing and geographic location of the 502s are erratic. The problem does appear to be specific to the API, however, because the UI does not seem to be affected. So far, this issue has not been reflected on Datadog’s status page.
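For intermittent 502s like the ones described above, a common client-side mitigation is to retry the request with a growing delay. The sketch below is a minimal illustration, not Datadog's client API: `fetch_with_retry`, its parameters, and the stand-in `fetch` callable are all hypothetical names chosen for this example, and the retry count and delays are arbitrary assumptions.

```python
import time
from typing import Callable, Tuple


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch() until it returns a non-502 status or attempts run out.

    fetch is a hypothetical callable returning (status_code, body).
    The delay doubles after each failed attempt (exponential backoff).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = fetch()
    for attempt in range(1, max_attempts):
        if status != 502:
            break  # success, or an error that a retry is unlikely to fix
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = fetch()
    return status, body


# Simulated API: two 502s followed by a successful response.
responses = iter([(502, ""), (502, ""), (200, "events")])
delays = []  # records the requested delays instead of actually sleeping
result = fetch_with_retry(lambda: next(responses), sleep=delays.append)
```

In this simulated run the third call succeeds, so only two backoff delays are requested. A real integration would likely also cap the total wait time and add jitter so that many clients do not retry in lockstep.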
Notable Metrist-Reported Downtime
While these outages didn’t make the news, Metrist caught them, and they may have affected your company’s app and operations.
- Azure Blob Storage was unable to create new storage accounts in Azure East during two 10-minute windows on Tuesday, August 16th.
- AWS ELB was unable to change the target group in AWS-US-East-1 for 26 minutes on Wednesday the 17th.
- AWS ECS also had trouble in AWS-US-East-1 on the 17th: it was unable to create a service in the region for 4 minutes, which coincided with the ELB outage.
- Zoom had extreme latency when joining calls from the eastern US for 1 hour 20 minutes on Friday the 19th.
- CircleCI had extreme latency for nearly 2 hours when running Machine Workflows from AWS-US-West-2 on Saturday the 20th.
Apps are bound to go down, but as long as we’re aware and have a backup plan, our companies can be more resilient.
If you’d like to keep track of the apps you depend on in real time, try Metrist.