One of the great things about our industry is that it is always evolving because people are constantly putting forth new ideas and challenging the way things have always been done.
For example, in the past SREs and other observability enthusiasts were overly concerned with types of data and paid less attention to what we can do with that data. But now we are moving toward a place where the four golden signals (latency, traffic, errors, and saturation) and the three pillars (logs, metrics, and traces) are simply tools in our toolbox. Conversations are increasingly trending toward, “What else can I observe, and what outcomes can I drive with all this information?”
One way to answer those questions is through what I call “full stack observability.” But what is full stack observability, and why is it important? Let’s discuss it in this article.
What is Full Stack Observability?
A couple of years ago, I started using the term “full stack observability.” I don’t know if I heard it somewhere or made it up, but it just made sense to me. That probably sounds like a cheesy name to most of us, but I think there might be something to it.
The idea is that you only have observability when you can measure all of the components of your system. Typically, when we think about our systems, we think of layers like the network, the infrastructure, the database, the application, and the client.
But modern software is complex, and today the layers we need to care about go much deeper. We have to care about Kubernetes cluster health, ML model performance, serverless functions, security, and more.
So I was delighted when New Relic released its 2022 Observability Forecast and began to champion this idea of full stack observability. In the report, New Relic defines 17 different components and capabilities that make up full stack observability.

Positive Outcomes of Full Stack Observability
Surprisingly, New Relic found that only 27% of organizations go beyond the core of log management, application monitoring, and distributed tracing to monitor everything in their stack. But for the 27% who did achieve full stack observability, the outcomes are a DevOps dream: fewer incidents, faster mean time to detect, and faster mean time to resolution.
In full disclosure, this is a marketing report from a vendor with its own motives. The stats are likely accurate, but it’s a good idea to read the report yourself, since it outlines the findings in detail.
So if observability is about outcomes, we have at least one indication that focusing on what we can observe, rather than on types of data, can lead to better outcomes.

A Sizable Gap in New Relic’s Definition of Full Stack Observability
Despite all of the new ideas and innovation in our space, I believe we have a major gap in how we think about achieving true full stack observability. That gap is cloud dependencies, and they were behind 70% of all the outages that SaaS companies reported on their status pages over the past five years.
Why cloud dependencies? Modern apps are built on other apps. In fact, the average digital business relies on 137 different cloud services to power its software and run the company. That’s everything from AWS to Twilio, GitHub, Stripe, Zoom, Snowflake, Slack, Avalara, and more.
When one of these products goes down, the apps that depend on it risk going down too, or at the very least delivering a degraded user experience. And when these outages happen, it’s often unclear which app is at fault. So we are sent scrambling to answer the question, “Is it me, or is it them?”
On top of the “me or them” problem, status pages don’t help, and can even gaslight us. They often don’t acknowledge an issue or confirm our suspicions for 20, 30, or even 60 minutes, which delays incident response by at least that long. (We track this at Metrist, and on average it takes 25 minutes from when we first detect an issue for the vendor’s status page to be updated.)
That said, let’s be honest: not all of our cloud dependencies are in the critical path. Many of them can go down while our own app stays up. And some of them will deliver better reliability than their SLAs promise.
But how can we hope to have full stack observability if we have no visibility into an integral part of our stack: our cloud dependencies?
How Closing the Cloud Dependency Observability Gap Can Improve Outcomes
New Relic’s report presents compelling evidence for the positive outcomes of full stack observability. However, it leaves out what the evidence shows to be a prevalent culprit in today’s outages: the health of the cloud products we build on and with.
I believe the best way to improve our visibility and our incident response is to understand how cloud vendors contribute to incidents. Doing so can show us what the risks are and where we are vulnerable, instead of leaving us blind to the impact of the dozens of dependencies we commonly rely on.
Full stack observability that includes cloud dependencies can help us respond to issues with full context about everything happening in the stack, rather than only with information about the systems we own. It can also help us hold our cloud vendors more accountable. We aren’t going to stop building on them, but we can make sure we work with providers that support our SLAs rather than stand in the way of them.
For me, this is the biggest opportunity that observability has today.