This week there were several newsworthy issues with the cloud apps our apps depend on.
- Slack had two incidents on Tuesday the 26th. The first involved difficulty connecting some third-party Slack apps and occurred between 5:45 AM PDT and 5:23 PM PDT. The second mainly affected International Data Residency customers and involved failures of messages, threads, file uploads/downloads, and API calls.
- AWS US East 2 had a significant outage on Thursday the 28th, caused by a loss of power to a single availability zone within the region. Several AWS services and several SaaS products (including Zoom, Auth0, Zendesk, Okta, WebEx, New Relic, and SmartThings) were down for over two hours (10:00 AM to 12:16 PM according to Metrist data). See our article for more information about the incident.
Metrist also caught the following outages, even though they didn’t make the news. See if any of them may have affected your company’s app and operations.
- Trello had a partial outage (down in 3 regions) on Tuesday the 26th for 17 minutes.
- Azure AKS had a partial outage (clusters couldn’t be created for 28 minutes in Canada Central) on Wednesday the 27th.
- Azure VMs had a partial outage (instances couldn’t be created for 29 minutes in Canada Central) on Wednesday the 27th. This outage was related to the Azure AKS outage, since AKS is built atop VMs.
- Azure Monitor was down/slow for about 10 minutes on Thursday the 28th in West US 2.
- Snowflake experienced a latency spike in US East 1 on Saturday the 30th for 30 minutes (3:30 AM to 4:00 AM PT).
- Datadog had a partial outage for 21 minutes on Sunday the 31st: events could not be retrieved from any of the 5 North American regional monitoring instances.
Apps are bound to go down, but as long as we’re aware and have a backup plan, our companies can be more resilient.
If you’d like to keep track of the apps you depend on in real time, try Metrist.
Jeff & The Metrist Team