
Who is the Datadog for Datadog?
Let’s get a little trippy. In light of the major Datadog outage on Wednesday, people were asking, “Who is the Datadog for Datadog?” When
When it comes to incident response, AIOps is an up-and-coming field. Reliability is critical to companies, but in today’s complex, interdependent software environment, observability and
Identifying the source of an outage is the name of the game when it comes to incident response and observability. But what if you were
It’s been an eventful start to the year and we wanted to share some updates with you all. We’re Launching Crowdsourced Data. Want To
One of the great things about our industry is that it is always evolving because people are constantly putting forth new ideas and challenging the
Observability is an exciting, emerging field. My co-founder, Ryan, and I have been here since the early days (including being some of the first hires
There was a major Microsoft Azure outage between 07:05 UTC and 09:45 UTC today (25 January 2023). Customers experienced issues with networking connectivity, manifesting as
This blog post is a continuation of our blog post on how we landed at Elixir. You may want to read or at least skim
When it comes to observability for Cloud Dependencies, we often think about how we can use that data for incident response. However, that data can
When people talk about observability, it’s usually in the context of obtaining data (metrics, logs, and traces) for the purpose of resolving incidents. But what
Monitoring and observability: they seem like simple concepts, but their definitions are hotly contested issues, especially when connotations about their purpose come into play. In
Highlights of the AWS re:Invent Conference
What happened at this year’s AWS re:Invent? We attended and can confirm it was a great event. (Shout-out if
Why is the official AWS status page so hated? Sometimes it appears to be useless. It’s one of the biggest and most important products in
We had a great time at DevGuild: Incident Response and learned a lot from some amazing speakers that shared their experiences from Spotify, Zendesk, Salesforce,
Is it possible to achieve 99.99% uptime when your app is built on top of other apps? Jeff Martens will dive into this question at
Software stacks are a dime a dozen these days, and picking the right one can seem somewhere between difficult and impossible. But while some people
The DevGuild: Incident Response conference is being held November 15-17th this year, showcasing the knowledge of how leading SREs are “increasing resiliency at every level
Today, Jeff, the team, and I are excited to announce the general availability of Metrist. The product we’ve created is software with a web and
Google Cloud Next 2022 was an exciting event, with the company announcing innovations in three major areas: infrastructure, data, and collaboration. To learn more about
Although APIs are the lifeblood of so many software applications today, the impact on their dependencies when they break can often be overlooked. And while
Just like any tech company, fintech is reliant on third-party cloud apps. However, if those apps malfunction, it puts high-stakes businesses at risk. Luke Rotta
The DevOps field (and we at Metrist) has always realized how important reliability is. But according to Information Week, governments in the US, UK, and
It’s been an eventful summer at Metrist! We have some exciting new features, and some great articles to explore. Check them out below! More
Many companies are using third-party vendors to increase their development speed and power. However, outages in these third-party apps can cause issues for companies’ apps
A power outage to a single availability zone within the US East 2 region resulted in a widespread outage for about 2 hours (10 AM
Site Reliability Engineering (SRE) is a hot topic and job title, but what exactly is SRE? Are we just putting a new name on an
When your app goes down, your first thought might be, “Is it us or is it one of our 50 SaaS and cloud providers?” One
Today’s interview is with DevOps expert Linda Ypulong. As Director of Engineering at Credit Karma, she has a lot of experience working with third-party dependencies
New SaaS integrations. More detailed statuses. An interview with a DevOps data reliability expert. All that and more Metrist updates to kick off your summer.
DevOps expert Jeff Smith explains how site reliability data from third parties can influence your app’s performance and how you can leverage that data. Today’s
Real-time performance and availability monitoring for the web’s most built-upon products.