Newsworthy:
Zoom’s API began throwing errors just after midnight Pacific on the 6th and kept failing for almost 22 hours. The issues started right after Zoom’s previously announced planned maintenance.
Zoom likely broke something in authentication while performing other maintenance. Because the incident occurred over a weekend and did not affect the core functionality of joining meetings, it took Zoom almost a day to discover and fix the issue.
Notable Metrist-Reported Downtime
These outages didn’t make the news, but Metrist caught them, and they may have affected your company’s app and operations.
GCP
- GCP Compute Engine was unable to create new instances in GCP US East 4 on the 31st. The GCP error message indicated a lack of capacity.
AWS
- AWS EC2 was slow to run, and then unable to run, new EC2 instances in AWS US East 2 for 50 minutes on the 31st. This was likely caused by the RDS outage mentioned below. It took more than 30 minutes for AWS to update their status page.
- AWS RDS was unable to create new RDS instances in AWS US East 2 for 44 minutes on the 31st. This incident was likely connected to the EC2 outage mentioned above.
Azure
- Azure AKS was unable to create clusters in Azure Canada Central for 34 minutes on the 1st.
SaaS/Other
- PagerDuty had brief failures creating incidents in all Metrist-monitored North America regions. This outage lasted for 10 minutes on the 1st. PagerDuty quickly updated their status page about an Events API issue.
- Zoom experienced an outage for a subset of users across North America. Participants were unable to connect to meetings for at least 33 minutes on the 3rd. A resolution was posted on Zoom’s status page at 11:33 am Pacific.
- Slack experienced intermittent latency all day on Saturday the 5th when sending messages from AWS US East 1.