Recap of DevGuild: Incident Response 2022

We had a great time at DevGuild: Incident Response and learned a lot from amazing speakers who shared their experiences from Spotify, Zendesk, Salesforce, Honeycomb, Snyk, and more. I wanted to recap the event and share some takeaways from the conference. And don’t just take my word for it: you can watch the replay […]

Here’s How Chicago Trading Company’s Luke Rotta Engineers Resilient Systems

Just like any tech company, fintech firms rely on third-party cloud apps. However, when those apps malfunction, they put high-stakes businesses at risk. Luke Rotta is the SRE & Observability Manager at Chicago Trading Company (CTC) and has worked in the fintech industry for over 20 years. CTC is a privately held company with offices […]

The US, UK, and EU Want to Regulate Cloud Reliability. Is That Necessary? 

The DevOps field (and we at Metrist) has long understood how important reliability is. According to InformationWeek, governments in the US, UK, and EU are now recognizing its importance too, and are developing legislation to support it. But while all three governing bodies are pushing to develop or expand current legislation […]

Managing Risks from Third-Party Vendors with Erin McKeown of Zendesk

Many companies use third-party vendors to speed up development and add capabilities. However, outages at these vendors can disrupt the apps that depend on them. Helping companies stay resilient in the face of these outages is the specialty of today’s guest. We were very excited to sit down with Erin McKeown, […]

What Went Down: July 31st, 2022

This week there were a number of newsworthy and noteworthy issues with the cloud services our apps depend on.

Newsworthy:

Slack had two incidents on Tuesday the 26th. The first involved difficulty connecting some third-party Slack apps and occurred between 5:45 AM PDT and 5:23 PM PDT. The second mainly affected International Data Residency customers […]

What Went Down: Cloud App Outages from Metrist

What went down this week? There were a number of newsworthy and noteworthy outages in the cloud apps our platforms depend on.

Newsworthy:

Some of the outages that made headlines this week came from heavy hitters GCP, Oracle, and Microsoft 365.

Heatwave leads to GCP and Oracle Outages. Google Cloud and Oracle chose to […]

Is SRE Just Ops with a New Name?

Site Reliability Engineering (SRE) is a hot topic and job title, but what exactly is SRE? Are we just putting a new name on an old concept? Did we actually knock down the wall between Dev and Ops, or are we still tasking a single team with making the rest of the company look good? […]

DevOps Expert Jeff Smith On Third-Party Reliability Data

DevOps expert Jeff Smith explains how site reliability data from third parties can influence your app’s performance and how you can leverage that data. Today’s expert is Jeff Smith. He is the Director of Production Operations at ad tech company Basis Technologies (formerly Centro) and was previously Manager of Site Reliability Engineering at Grubhub.  Jeff […]

Observability Like Air Traffic Control with Mike Canzoneri of Torch

Welcome to the first in a series of interviews with software and SaaS industry experts about what it is like to build on, and manage the reliability of, cloud-hosted third parties. In this series, we are interviewing SREs, Software Engineers, Directors of Engineering, CTOs, and more from companies like Weedmaps, Freshly, Zendesk, Netflix, and […]