What Is Observability?
Observability is the ability to draw valid conclusions about what is currently happening in a system and why it is happening.
Guiding Principles for Observability
- The context and sequential flow of each end-to-end request matter most. We need to be able to see what is having an issue, which other parts are or might be affected, and what the failures have in common when things go wrong.
- We must be able to slice the data in many ways and correlate the different aspects of a request (e.g. filter by user, session, or server node, and by any of these combined with other attributes).
- Use questions to drive the features required for observability instead of relying on what we can already see.
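The second principle, slicing request data by any combination of attributes, can be illustrated with a small sketch. It assumes each request is recorded as a flat dictionary; the field names (`user`, `session`, `node`, `status`) and the helpers `select` and `count_by` are hypothetical and chosen only for illustration, not taken from any particular tool.

```python
from collections import Counter

# Hypothetical request events; the field names are illustrative assumptions.
EVENTS = [
    {"user": "alice", "session": "s1", "node": "web-1", "status": 200},
    {"user": "alice", "session": "s1", "node": "web-2", "status": 500},
    {"user": "bob", "session": "s2", "node": "web-2", "status": 500},
    {"user": "bob", "session": "s3", "node": "web-1", "status": 200},
]


def select(events, **criteria):
    """Return the events whose attributes match every given key/value pair."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]


def count_by(events, *keys):
    """Count events grouped by the given combination of attributes."""
    return Counter(tuple(e.get(k) for k in keys) for e in events)


# Question: what do the failing requests have in common?
failures = select(EVENTS, status=500)
by_node = count_by(failures, "node")
by_user_and_node = count_by(failures, "user", "node")
```

Here the question ("what do the failures have in common?") decides which attributes to group by, rather than a fixed dashboard deciding it in advance. Grouping the failures by `node` alone and then by `user` and `node` together is the kind of ad hoc correlation the principle asks for.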
Observability Components
Component | What it means |
Metrics | Metrics are numeric values that help evaluate a service's overall behavior over time. They comprise a set of data points that can be used to derive the system's performance. Typical examples are:
|
Events | An event is a collection of data points about what it took to complete a unit of work. Events are records of selected significant points in time, with metadata to provide context. Typical examples are:
|
Logs | Logs are important for troubleshooting and understanding a problem. They provide detailed data and context so that one can re-create and diagnose a problem. Typical examples are:
|
Traces | Traces show the step-by-step journey of a request or action as it moves through the system. They give specific insight into the flow and help identify errors and find bottlenecks so that these can be rectified and optimised. |
Visualisation | Data needs to be presented in a visual, easy-to-comprehend form that allows it to be correlated and connections to be drawn between the different data points and events occurring in the system. This provides context that is not otherwise easily identifiable by looking at individual metrics alone. |
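To make the Traces row more concrete, the following is a minimal sketch of how the steps of a request might be represented and inspected. The `Span` type, its fields, the sample data, and the `journey` and `bottleneck` helpers are all assumptions made for illustration; real tracing systems carry more information, such as parent/child links between steps.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One step of a request (hypothetical, simplified structure)."""
    trace_id: str
    name: str
    start_ms: int
    duration_ms: int


# Illustrative data, deliberately out of order, covering two requests.
SPANS = [
    Span("t1", "db.query", 15, 120),
    Span("t1", "gateway", 0, 150),
    Span("t1", "auth", 5, 8),
    Span("t2", "gateway", 0, 20),
]


def journey(spans, trace_id):
    """Return the spans of one trace in the order in which they started."""
    matching = (s for s in spans if s.trace_id == trace_id)
    return sorted(matching, key=lambda s: s.start_ms)


def bottleneck(spans, trace_id, root="gateway"):
    """Return the slowest span other than the root, or None if there is none."""
    children = [s for s in journey(spans, trace_id) if s.name != root]
    return max(children, key=lambda s: s.duration_ms) if children else None


steps = [s.name for s in journey(SPANS, "t1")]
slowest = bottleneck(SPANS, "t1")
```

Sorting by start time recovers the step-by-step journey of a single request, and comparing durations points at the step most worth optimising. The root span is excluded from the comparison because it encloses all the others.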