Observability deals with understanding a system, identifying whether something is wrong with that system, and understanding why it is wrong. But what do we mean by understanding a system? The simple answer would be knowing the state of a single application or infrastructure component.
In this section, we will introduce the user personas that we will use throughout this book. These personas will help to distinguish the different types of questions that people use observability systems to answer.
Let’s take a quick look at the user personas that will be used throughout the book as examples, and their roles:
| Name and role | Description |
| --- | --- |
| Diego Developer | Frontend, backend, full stack, and so on |
| Ophelia Operator | SRE, DevOps, DevSecOps, customer success, and so on |
| Steven Service | Service manager and other tasks |
| Pelé Product | Product manager, product owner, and so on |
| Masha Manager | Manager, senior leadership, and so on |
Table 1.1 – User persona introductions
Now let’s look at each of these users in greater detail.
Diego Developer
Diego Developer works on many types of systems, from frontend applications that customers directly interact with, to backend systems that let his organization store data in ways that delight its customers. You might even find him working on platforms that other developers use to get their applications integrated, built, delivered, and deployed safely and speedily.
Goals
He writes great software that is well tested and addresses customers’ actual needs.
Interactions
When he is not writing code, he works with Ophelia Operator to address any questions and issues that occur.
Pelé Product works in his team and provides insight into the customer’s needs. They work together closely, taking those needs and turning them into detailed plans on how to deliver software that addresses them.
Steven Service is keen to ensure that the changes Diego makes are not impacting customer commitments. He’s also the one who wakes Diego up if there is an incident that needs attention. The data Diego provides to Masha Manager gives her a breakdown of costs. When Diego is working on developer platforms, he also collects data that helps her get investment from the business into teams that are not performing as expected.
Needs
Diego really needs easy-to-use libraries for the languages he uses to instrument the code he produces. He does not have time to become an expert. He wants to be able to add a few lines of code and get results quickly.
Having a clear standard for acceptable performance measures makes it easy for him to get the right results.
Pain points
When Diego’s systems produce too much data, he finds it difficult to sort signal from noise. He also gets frustrated having to change his code because of an upstream decision to change tooling.
Ophelia Operator
Ophelia Operator works in an operations-focused environment. You might find her in a customer-facing role or as part of a development team as a DevOps engineer. She could be part of a group dedicated to the reliability of an organization’s systems, or she could be working in security or finance to ensure the business runs securely and smoothly.
Goals
Ophelia wants to make sure a product is functioning as expected. She also likes it when she is not woken up early in the morning by an incident.
Interactions
Ophelia will work a lot with Diego Developer; sometimes it’s escalating customer tickets when she doesn’t have the data available to understand the problem; at other times it’s developing runbooks to keep the systems running. Sometimes she will need to give Diego clear information on acceptable performance measures so that her team can make sure systems perform well for customers.
Steven Service works closely with Ophelia. They work together to ensure there are not many incidents, and that they are quickly resolved. Steven makes sure that business data on changes and incidents is tracked, and tweaks processes when things aren’t working.
Pelé Product likes to have data showing the problematic areas of his products.
Needs
Good data is necessary to do the job effectively. Being able to see that a customer has encountered an error can make the difference between resolving a problem straight away or having them wait maybe weeks for a response.
During an incident, seeing that a new version of a service was deployed at the time a problem started can change an hours-long incident into a brief blip, and keep customers happy.
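To make this concrete, here is a minimal sketch of how a deployment could be lined up against the start of an incident. The deployment records, the `recent_deployments` helper, and the 30-minute window are all hypothetical and chosen for illustration; in a real system this data would come from your delivery tooling and would more likely be shown as annotations on a dashboard.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records, invented for this example.
DEPLOYMENTS = [
    {"service": "checkout", "version": "1.4.0", "deployed_at": datetime(2024, 1, 10, 9, 0)},
    {"service": "checkout", "version": "1.4.1", "deployed_at": datetime(2024, 1, 10, 14, 55)},
    {"service": "search", "version": "2.0.3", "deployed_at": datetime(2024, 1, 10, 15, 30)},
]


def recent_deployments(incident_start, deployments, window=timedelta(minutes=30)):
    """Return deployments made in the `window` leading up to the incident start."""
    return [
        d for d in deployments
        # Keep deployments at or before the incident start, and no older than `window`.
        if timedelta(0) <= incident_start - d["deployed_at"] <= window
    ]


# Assumed incident start time, used only to demonstrate the lookup.
suspects = recent_deployments(datetime(2024, 1, 10, 15, 5), DEPLOYMENTS)
for d in suspects:
    print(f'{d["service"]} {d["version"]} deployed at {d["deployed_at"]:%H:%M}')
```

With a list like this to hand, Ophelia can check the most recent change first instead of searching every system for a cause.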
Pain points
Getting continuous alerts but not being empowered to fix the underlying issue is a big problem. Ophelia has seen colleagues burn out, and it makes her want to leave the organization when this happens.
Steven Service
Steven Service works in service delivery. He is interested in making sure the organization’s services are delivered smoothly. Jumping in on critical incidents and coordinating actions to get them resolved as quickly as possible is part of the job. So is ensuring that changes are made using processes that help others do it as safely as possible. Steven also works with third parties who provide services that are critical to the running of the organization.
Goals
He wants services to run as smoothly as possible so that the organization can spend more time focused on customers.
Interactions
Diego Developer and Ophelia Operator work a lot with the change management processes created by Steven and the support processes he manages. Having accurate data to hand during change management really helps to make the process as smooth as possible.
Steven works very closely with Masha Manager to make sure she has access to data showing where processes are working smoothly and where they need to spend time improving them.
Needs
He needs to be able to compare the delivery of different products and provide that data to Masha and the business.
During incidents, he needs to be able to get the right people on the call as quickly as possible and keep a record of what happened for the incident post-mortem.
Pain points
Identifying the right person to get on a call during an incident is a common problem he faces. Seeing incidents drag on while people compare different systems and argue about who can fix the problem is also a big concern for him.
Pelé Product
Pelé Product works in the product team. You’ll find him working with customers to understand their needs, keeping product roadmaps in order, and communicating requirements back to developers such as Diego Developer so they can build them. You might also find him understanding and shaping the product backlog for the internal platforms used by developers in the organization.
Goals
Pelé wants to understand customers, give them products that delight them, and keep them coming back.
Interactions
He spends a lot of time working with Diego, looking at the same information to really understand what customers are doing and how they can help them do it better.
Ophelia Operator and Steven Service help Pelé keep products on track. If too many incidents occur, they ask everyone to refocus on getting stability right. There is no point in providing customers with lots of features on a system that they can’t trust.
Pelé works closely with Masha Manager to ensure the organization has the right skills in the teams that build products. The business depends on her leadership to make sure that these developers have the best tools to help them get their code live in front of customers where it can be used.
Needs
Pelé needs to be able to understand customers’ pain points even when they do not articulate them clearly during user research.
He needs data that gives him a common language with Diego and Ophelia. Sometimes they can get too focused on specific numbers such as shaving off a couple of milliseconds from a request, when improving a poor workflow would improve the customer experience more significantly.
Pain points
Pelé hates not being able to see at a high level what customers are doing. Understanding which bits of an application have the most usage, and which bits are not used at all, lets him know where to focus time and resources.
While customers never tell him they want stability, if it’s not there they will lose trust very quickly and start to look at alternatives.
Masha Manager
Masha works in management. You might find her leading a team and working closely with them daily. She also represents middle management, setting strategy and making tactical choices, and she is involved, to some extent, in senior leadership. Much of her role involves managing budgets and people. If something can make that process easier, then she is usually interested in hearing about it. What Masha does not want to do is waste the organization’s money, because that can directly impact jobs.
Goals
Her primary goals are to keep the organization running smoothly and ensure the budget is balanced.
Interactions
As a leader, Masha needs accurate data and needs to be able to trust the teams who provide that data. The data could be the end-to-end cycle time of feature concept to delivery from Pelé Product, the lead time for changes from Diego Developer, or even the MTTR from Steven Service. Having that data helps her to understand where focus and resources can have the biggest impact.
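As a rough illustration of how two of these figures could be derived, the sketch below averages the gap between pairs of timestamps. It assumes MTTR is read as mean time to recovery, measured from detection to resolution, and that lead time for changes is measured from commit to deployment; the `mean_duration` helper and all of the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def mean_duration(pairs):
    """Average the gap between each (start, end) pair of timestamps."""
    total = sum((end - start for start, end in pairs), timedelta(0))
    return total / len(pairs)


# Hypothetical incidents as (detected, resolved) pairs.
incidents = [
    (datetime(2024, 1, 3, 10, 0), datetime(2024, 1, 3, 10, 30)),
    (datetime(2024, 1, 9, 22, 0), datetime(2024, 1, 9, 23, 30)),
]

# Hypothetical changes as (committed, deployed) pairs.
changes = [
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 15, 0)),
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 9, 9, 0)),
    (datetime(2024, 1, 10, 9, 0), datetime(2024, 1, 10, 12, 0)),
]

mttr = mean_duration(incidents)
lead_time = mean_duration(changes)
print("MTTR:", mttr)
print("Lead time for changes:", lead_time)
```

An average hides outliers, so a single long incident can dominate the result; a median or percentile may tell Masha more once there is enough data.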
Masha works regularly with the financial operations staff and needs to make sure they have accurate information on the organization’s expenditure and the value that expenditure provides.
Needs
She needs good data in a place where she can view it and make good decisions. This usually means she consumes information from a business intelligence system. To use such tools effectively, she needs to be clear on what the organization’s goals are, so that the correct data can be collected to help her understand how her teams are tracking against those goals.
She also needs to know that the teams she is responsible for have the correct data and tools to excel in their given areas.
Pain points
High failure rates and long recovery times usually result in her having to speak with customers to apologize. Masha really hates these calls!
Poor visibility of cloud systems is a particular concern. Masha has too many horror stories of huge overspending caused by a lack of monitoring; she would rather spend that budget on something more useful.
You now know about the customers who use observability data, and the types of data you will be using to meet their needs. As the main focus of this book is on Grafana as the underlying technology, let’s now introduce the tools that make up the Grafana stack.