January 25th Microsoft Outage And Its Effects Beyond Direct Azure Users

There was a major Microsoft Azure outage between 07:05 UTC and 09:45 UTC today (25 January 2023). Customers experienced network connectivity issues, manifesting as latency and/or timeouts when attempting to connect to Azure resources in public Azure regions. This outage affected a majority of the 500 million users of the platform. However, direct […]

Why We Chose Elixir Part 2: Elixir and the CELP Stack

This blog post is a continuation of our blog post on how we landed at Elixir. You may want to read or at least skim that one before diving in here.  Choosing Elixir as a programming language is not enough – like a lot of powerful systems, it comes as a collection of parts and […]

Going Beyond Incident Response With Cloud Observability 

When it comes to observability for cloud dependencies, we often think about how we can use that data for incident response. However, that data can go beyond incident response and has a number of important applications. In this article, we’ll discuss how cloud dependency data can help you improve resiliency, get early warnings, […]

The Three Reasons You Need Observability

When people talk about observability, it’s usually in the context of obtaining data (metrics, logs, and traces) for the purpose of resolving incidents. But what if that data could be used for more than emergency situations? And what if expanding our understanding of what observability is can help us better resolve incidents – and maybe […]

Using Medicine to Understand Observability and Monitoring

Monitoring and observability: they seem like simple concepts, but their definitions are hotly contested issues, especially when connotations about their purpose come into play. In short, both terms are “heavily loaded” – but the “loadedness” isn’t always the same.  However, we’ve found that one possible way to understand monitoring and observability is by comparing them […]

What Happened at AWS re:Invent? Highlights From the 2022 Conference

Highlights of the AWS re:Invent Conference

What happened at this year’s AWS re:Invent? We attended and can confirm it was a great event. (Shout-out if you saw Jeff or Ryan there!) If you weren’t one of the 50,000 people who attended in person or the 300,000 who joined online, here’s a recap of the most important updates! Observability […]

Introducing a Real AWS Status Page!

Why is the official AWS status page so hated? Sometimes it appears to be useless. It’s one of the biggest and most important products in the world – what could possibly make its status page so unreliable? There are a variety of reasons why the AWS status page is not reliable – or even a […]

Recap of DevGuild: Incident Response 2022

We had a great time at DevGuild: Incident Response and learned a lot from some amazing speakers who shared their experiences at Spotify, Zendesk, Salesforce, Honeycomb, Snyk, and more. I wanted to recap the experience and share some takeaways from the conference. And don’t just take my word for it: you can watch the replay […]

How and Why We Chose Elixir at Metrist

Software stacks are a dime a dozen these days, and picking the right one can seem somewhere between difficult and impossible. But while some people say choosing the “right tool” doesn’t matter, we think there is an advantage in choosing the best stack for the job.   Although it may take a bit of legwork to […]