Observability is an exciting, emerging field. My co-founder, Ryan, and I have been here since the early days (including being some of the first hires at Server Density and New Relic, back when we just did monitoring). We’ve seen the field grow in significant ways, and we’re here to help it evolve even more.
But as we imagine how to improve the reliability of our software and systems, it’s important to reflect on where we’ve been, then use that history (and its shortcomings) to inform where we’re going. That’s what I want to talk about today. Where did we see observability start, and what are our predictions for the future of the field?
Observability in Its Early Days
In the “ancient past” of observability, the unfortunate fact is that very few of us were talking about observability through the lens of outcomes. Instead, when people talked about observability, they talked about the three pillars: Metrics, Logs, and Traces.
All three of these things are forms of data, that is, ways information is shaped. And each implies what it can tell us:
- Metrics tell us about measurements
- Logs tell us about granular details
- Traces tell us about how things connect across services
But still, little of the discussion was about how to use these things to drive reliability and business outcomes. Instead, we were focused on what these pillars actually were and the best way to get them. Those were good and fine discussions, but they didn’t ultimately get us to where we wanted to be.
Then, some folks started saying that the three pillars weren’t the right way to look at observability, and that we should break away from that limited thinking. But when they did so, they typically advocated for more pillars, suggesting things like events, exceptions, and profiles, just to name a few.
While the recommended additional pillars all sound great, we are still talking about types of data – and having a bunch of different types of data does not equal having observability.
The Present of Observability
Today, we are evolving. While we aren’t totally aligned on the definition of observability, we are moving away from a focus on types of data and toward a focus on how we can use observability to drive improved reliability, and ultimately business outcomes.
So to add my definition of observability into the discussion, I believe observability plays three different roles in how we build and deliver software. In other words, observability exists to:
- Define the health of a system
- Help understand and resolve issues
- Power our efforts to learn and improve
Only when we start to look at observability through that lens, and at what it can do for us, can we start to make sense of how to use all the different types of data. (Which sort of makes me think… maybe those should be our pillars?)
The Bleeding Edge: Full Stack Observability and Cloud Dependencies
For observability to reach its full potential, it is critical to have what I call “full stack observability,” which means having the ability to observe every layer of the stack that delivers our products. This includes the 40 to 50 external cloud dependencies that the average app relies on, anywhere from a dozen AWS services to payment APIs and authentication providers.
In fact, cloud dependencies are the source of up to 70% of SaaS outages. So until we comprehensively include direct visibility into cloud dependencies in our observability strategy, we’re missing out on the majority of the data we need to keep our software reliable.
Now, I don’t want to be an alarmist. Things are not as bad as they may seem. First, not all of our cloud dependencies are in the critical path. For many, if they go down, you can remain up. And second, in reality, some of your cloud dependencies will deliver better than what their SLA promises.
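To make the critical-path point concrete, here is a rough sketch of the arithmetic. It assumes dependencies fail independently and that the app is down whenever any critical-path dependency is down. The dependency names and availability figures are hypothetical, not measurements from any real provider.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Dependency:
    name: str            # hypothetical dependency name
    availability: float  # fraction of time the dependency is up, e.g. 0.999
    critical: bool       # True if our app goes down when this dependency does


def composite_availability(deps: list[Dependency]) -> float:
    """Upper bound on app availability implied by its critical dependencies.

    Assumes failures are independent and that any critical dependency being
    down takes the app down. Non-critical dependencies are ignored.
    """
    return prod(d.availability for d in deps if d.critical)


def expected_downtime_hours(availability: float, hours: float = 720.0) -> float:
    """Expected downtime over a period (default: a 30-day month)."""
    return (1.0 - availability) * hours


# Hypothetical dependency list; the numbers are illustrative only.
deps = [
    Dependency("payments-api", 0.999, critical=True),
    Dependency("auth-provider", 0.9995, critical=True),
    Dependency("object-storage", 0.9999, critical=True),
    Dependency("email-sender", 0.99, critical=False),  # app stays up without it
]

ceiling = composite_availability(deps)
print(f"availability ceiling: {ceiling:.4%}")
print(f"expected downtime per 30 days: {expected_downtime_hours(ceiling):.2f} h")
```

Swapping the SLA figures for availability you have actually observed shows how much a dependency that beats its SLA raises the ceiling, and flipping `critical` shows what taking a dependency out of the critical path buys you.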
But I firmly believe that increasing our understanding of, and visibility into, one of the top causes of outages is our best way to improve our incident response. It can help us understand what the risks are and where we are vulnerable instead of being blind to impact from dependencies.
Further, it can help us respond to issues with full context about everything happening in the stack, rather than only having information about the systems we own. And it can help us hold our cloud vendors more accountable. We aren’t going to stop building on them, but we can make sure we are working with providers that support our SLAs rather than stand in the way of them.
The Future of Observability
We’ve talked about the past and present of observability, including where the biggest opportunities are today. I want to leave you with some predictions for what could come next.
I want to see a future where:
- We have the same level of visibility into our third-party cloud dependencies as we have into our own software today.
- Cloud apps from different companies support distributed tracing across boundaries, giving us visibility into the issues and bottlenecks in our modern systems that rely so much on third parties.
- Cloud vendors and their customers share data with each other, helping each other be better, rather than holding their metrics, logs, and traces close to their hearts.
- We can use this shared data to predict outages before they impact us, and feed insights from this shared visibility back into our systems, using automation to mitigate the impact of outages or avoid them altogether.
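Tracing across company boundaries depends on both sides agreeing on how trace context travels with a request. One existing building block is the W3C Trace Context `traceparent` header. The wish list above does not name a mechanism, so using it here is an assumption. Below is a minimal sketch of creating, parsing, and forwarding that header with only the standard library. The function names are illustrative, and only version `00` of the header is handled.

```python
import re
import secrets
from typing import NamedTuple, Optional


class TraceParent(NamedTuple):
    trace_id: str   # 32 lowercase hex chars, shared by every span in the trace
    parent_id: str  # 16 lowercase hex chars, the caller's span id
    sampled: bool   # lowest bit of the trace-flags field


# version "00", then trace-id, parent-id, and trace-flags, separated by dashes
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace and return the header value to send downstream."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Parse an incoming header; return None if it is malformed."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero ids are not valid trace or span identifiers.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 1))


def child_traceparent(incoming: TraceParent) -> str:
    """Header for the next hop: same trace id, fresh span id, same sampling."""
    flags = "01" if incoming.sampled else "00"
    return f"00-{incoming.trace_id}-{secrets.token_hex(8)}-{flags}"


# A vendor receiving a customer's request would continue the same trace:
incoming = parse_traceparent(new_traceparent())
if incoming is not None:
    outgoing = child_traceparent(incoming)
    print(outgoing)
```

Because the trace id survives every hop, spans recorded by a vendor and by its customer could in principle be stitched into one trace, provided both sides are willing to share them.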
The observability of our past does not have to be the observability of our future. And thankfully, it is evolving today, as it always has been. In the beginning, we focused on the shape data could take, but today we are looking for opportunities to expand our visibility, and we are asking ourselves how we can use the information we have.
Our systems are complex, and increasingly, they are dependent on cloud services that we don’t control. While many of us don’t have visibility into these third-party cloud dependencies, getting it is possible, and it may be the best opportunity we have to drive better outcomes.
If you have questions about how to improve observability, including how to achieve full-stack visibility, contact us at Metrist. We’d be happy to answer questions and set you up with a free Metrist account so you can monitor your cloud dependencies in real time.