The past week was relatively quiet with the exception of a major Amazon API Gateway outage. Apart from this incident on the 28th, CircleCI, Zoom, Sentry, and a couple of Azure platforms had notable outages.
Amazon API Gateway had a major outage in the US-West-2 (Oregon) region on the 28th, with a direct impact on AWS Batch, AWS Service Catalog, Amazon Connect, and Amazon SageMaker. Other platforms were also likely impacted, depending on how users combine API Gateway with other services.
Notable Metrist-Reported Downtime
While these outages didn’t make the news, the issues Metrist caught may have affected your company’s app and operations.
- AWS EC2 experienced networking issues in the US-East-2 region for 20 minutes on September 26th. While AWS EC2 remained completely functional for users, external network calls experienced significant latency, as observed through Metrist’s use of EC2 to monitor 28 other services from that region.
- Azure VMs was unable to create new instances in Azure East US 2 for 25 minutes on the 27th. The error message associated with the outage indicated that the platform was out of capacity.
- Azure AKS was unable to create deployments in Azure East US 2 for 31 minutes on the 27th. This outage was likely related to the Azure VM capacity issues, since an AKS deployment can’t be created when no VMs are available.
- CircleCI was down twice on the 26th. During the first outage, which lasted 58 minutes, the platform was down and then degraded when tested from the eastern US. The second outage involved the platform being unable to Run Monitor Docker Workflow for 30 minutes.
- Zoom was down (unable to join calls from parts of Canada) 10-30% of the time for 17 hours on the 27th. Then on the 2nd, 50% of attempts to join Zoom timed out at the 2-minute mark for a total of 8 hours, 52 minutes.
- Sentry had issues for about 23 minutes on the 20th, in which issue creation wait times spiked to about 5 minutes in all North America regions. The latency for this product usually averages about 12 seconds.
- Cloudflare was down in parts of the US Northwest region, with ping checks not responding and some CDN and DNS checks failing for about 1 hour and 9 minutes on the 2nd.