Why So Many Companies Run in AWS us-east-1

Amazon Web Services (AWS) has become an essential part of the cloud computing landscape, providing a wide range of services across numerous geographic regions, including five in the United States. One region in particular, us-east-1, has long been the default and preferred choice for many businesses. After the recent AWS outage affecting us-east-1, we went […]
The Data Behind Delayed Status Page Updates

In today’s world, apps are commonly built on top of other apps. Utilizing third-party cloud, API, and SaaS products can help us deliver more and better features, faster. With increasing cloud dependency, when outages occur, it is crucial for users to receive timely updates on the status of the issue. A recent study by Metrist […]
Back to the Future — Simplifying the Stack

When you start a company with the idea to use external funding to get it going, the first order of business is not to build the most awesome, scalable, presentable infrastructure. The first order of business is to build what is essentially a demo that shows the essence of your ideas to potential early stage […]
How to Build a Real-Time PagerDuty-Compatible Monitor for GitHub Uptime

GitHub is a pretty critical part of a software developer’s workflow. So if it’s down, you’d want to know ASAP. But it often takes a while for the GitHub status page to update. That’s one reason why it’s a good idea to set up continuous monitoring for the software so you can be in the […]
How We Built Our Grafana Datasource Plugin for Cloud Dependency Monitoring

Grafana is an incredibly powerful platform, and our company is generating some pretty interesting data. So we wanted to create a plugin for our customers so they could start to use our data to an even fuller potential. But first we needed to build the plugin – which ended up being a pretty straightforward process, […]
Slack Said It Had 100% Uptime. Did It Really?

Not too long ago Gergely Orosz pointed out on Twitter that Slack reported 100% uptime since 2022, but he didn’t think that was the case. So who was right, Orosz or Slack? We run functional monitoring on SaaS products like Slack to detect downtime in real time. So we set out to see whether Slack really […]
Unlocking the ROI of Third-Party Cloud Dependency Monitoring

Improving incident response – and with it, brand equity – is always a beneficial practice. And during these turbulent times in the software industry, it has never been more important. But amidst layoffs of SRE teams, slashed budgets, and elimination of spending on tools, it has also never been more difficult to deliver reliable software. […]
Who is the Datadog for Datadog?

Let’s get a little trippy. In light of the major Datadog outage on Wednesday, people were asking, “Who is the Datadog for Datadog?” When an important tool like Datadog is down – how can you tell if it’s down or back up again without constantly checking status pages or social media? Well That’s Where […]
What is AI Ops and How is AI Ops Useful for Incident Response?

When it comes to incident response, AIOps is an up-and-coming field. Reliability is critical to companies, but in today’s complex, interdependent software environment, observability and incident response are becoming more and more complex. So, it’s useful to apply AI to improve incident response, observability, and the reliability of our systems – which is where AIOps […]
The Overlooked Culprit Behind 70% of SaaS Outages

Identifying the source of an outage is the name of the game when it comes to incident response and observability. But what if you were blind to the source of up to 70% of those outages? That’s the case for most companies because it’s difficult to get visibility into a fundamental part of today’s apps: […]