In this article, Luciano Alves, author of the book Zabbix Performance Tuning, explains that ever since he started working with IT infrastructure, he has noticed that almost every company, when it starts thinking about a monitoring tool, wants to know in some way when a system or service will go down before it actually happens. In practice, however, these companies expect the monitoring tool to create some kind of alert when something is broken. With this approach, the system administrator learns about an error or system outage only after it occurs (and maybe at the same moment that users are trying to use those systems).
We need a monitoring solution that helps us predict system outages and any other situation that can affect our services. Our approach to monitoring tools should cover not only system monitoring but also business monitoring.
Nowadays, any company (small, medium, or large) has some dependency on technologies, from servers and network assets to IP equipment with a lower environmental impact. Maybe you need security cameras, thermometers, UPS, access control devices, or any other IP device by which you can gather some useful data. What about applications and services? What about data integration or transactions? What about user experience? What about a supplier website or system that you depend on?
We should realize that monitoring things is not restricted to IT infrastructure, and it can be extended to other areas and business levels as well.
Suppose you already have your Zabbix server up and running. In a few weeks, Zabbix has helped you save a lot of time while restoring systems. It has also helped you notice some hidden things in your environment—maybe a flapping port in a network switch, or lack of CPU in a router.
In a few months, Zabbix and you (of course) are like superstars. During lunch, people are talking about you. Some are happy because you've dealt with a recurring error. Maybe a manager asks you to find a way to monitor a printer because it's very important to their team, another manager asks you to monitor an application, and so on.
The other teams and areas also need some kind of monitoring. They have other things to monitor, not only IT things. But are these people familiar with technical things? Technical words, expressions, flows, and lines of thought are not so easy for people with nontechnical backgrounds to understand.
Of course, in small and medium enterprises (SME), things will go ahead faster and paths will be shorter, but the scenario is not too different in most cases. You can work alone or in a huge team, but now you have another important partner—Zabbix.
One constant is that the more things you monitor, the more responsibility you carry and the more reliability is expected of you. At this point, we have some new issues to solve:
When Zabbix's visibility starts growing in your environment, you will need to think how to manage and handle these users. Do you have an LDAP or Microsoft Active Directory that you can use for centralized authentication? Of course, depending on the users you have, you will have more requests. Will you permit any user to access the Zabbix interface? Only a few? And which ones?
We know that Zabbix has a lot of built-in keys for gathering data. These keys are available for a good number of operating systems. We also have built-in functions used to gather data using the Intelligent Platform Management Interface (IPMI), Simple Network Management Protocol (SNMP), Open Database Connectivity (ODBC), Java Management Extensions (JMX), user parameters in the Zabbix agent, and so on. However, we need to think about a wide scenario where we need to gather data from somewhere Zabbix hasn't reached yet.
Our experience shows us that most of the time, it is necessary to create custom monitors (not one, but a lot of them). Zabbix is a very flexible and easy-to-customize platform, and it can be extended to do almost anything you want. However, for every new function or monitor you add to Zabbix, you'll need to think about what kind of extension you'll use.
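As a minimal sketch of one such extension, the script below could back a user parameter in the Zabbix agent. The check itself (counting files in a directory), the key name custom.spool.count, and the script path are illustrative assumptions, not something taken from the book.

```python
#!/usr/bin/env python3
"""Hypothetical custom check: count the files in a directory.

An agent configuration entry along these lines (key name and path are
assumptions made for this example) would expose it as an item key:

    UserParameter=custom.spool.count[*],/usr/local/bin/spool_count.py "$1"
"""
import os
import sys


def count_files(directory):
    """Return the number of regular files directly inside `directory`."""
    with os.scandir(directory) as entries:
        return sum(1 for entry in entries if entry.is_file())


# When run as a script with one argument, print the value on stdout.
# The agent uses whatever the command prints as the item value.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(count_files(sys.argv[1]))
```

A script like this keeps the check outside of Zabbix itself, so it can be tested from a shell before any item is created for it.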
This means that when other teams or areas start paying attention to Zabbix, you will need to think about the number of new functions or monitors you will have to build. Which language will you choose to develop them? Maybe you know the C language and you are thinking of using Zabbix modules. Will you use bulk operations to avoid network traffic?
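One way to picture a bulk operation is to collect several values first and hand them to zabbix_sender in a single input file (its -i option), one "host key value" entry per line, instead of opening a connection per value. The sketch below only formats those lines. The host and key names are made up, the items are assumed to be trapper items, and the quoting rule is an assumption that should be checked against the zabbix_sender manual for your version.

```python
def quote(field):
    """Quote a field if it is empty or contains whitespace or double quotes.

    Assumes single-line values; backslash escaping inside quotes is an
    assumption to verify against the zabbix_sender documentation.
    """
    text = str(field)
    if text == "" or '"' in text or any(ch.isspace() for ch in text):
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return text


def sender_lines(host, metrics):
    """Build one '<host> <key> <value>' line per metric, in dict order."""
    return [
        f"{quote(host)} {quote(key)} {quote(value)}"
        for key, value in metrics.items()
    ]


# Hypothetical business metrics gathered in one pass.
batch = sender_lines("web01", {"app.orders": 42, "app.status": "all good"})
print("\n".join(batch))
```

Writing these lines to a file and sending them in one run is only one option; which batching mechanism fits best depends on the kind of extension you chose.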
In most scenarios, natural growth will occur without control. I mean, people are not used to planning this growth. It is very important to keep it under control.
When people start their Zabbix deployment, they probably do not intend to cater to all company teams, areas, or businesses. They think about their own needs and their own team only. So, they don't think a lot about user rights, mainly because they are technicians and know mostly about hosts, items, triggers, maps, graphs, screens, and so on. What about users who are not technicians? Will they understand the Zabbix interface easily? Do you know that in Zabbix, we have a lot of paths that reach the same point?
The Zabbix interface isn't object-based, which means that users need a lot of clicks to reach (read or write) the information related to an object (hosts, items, graphs, triggers, events, and so on).
If you need to see the most recent data gathered from a specific item, you'll need to use the Monitoring menu, then use the Latest data menu, choose the group that the host belongs to, choose your host, and finally search for your item in the table.
If you need to see a specific custom graph, use the Graphs menu, which is under Monitoring. Choose the group that the hosts belong to, choose your host, and then search for your graph in a combobox.
If you need to know about an active trigger in your host, you'll need to use the Triggers menu, which is under Monitoring. Choose the group that your host belongs to and choose your host. Then, you can see the triggers from that specific host.
If you want to include a new item in an existing custom graph, you'll need to access the Hosts menu, which is under Configuration. Choose the group that the hosts belong to, search for your host, and click on the Graphs link. Then you can choose which graph you want to change.
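For comparison, the first of these lookups (the most recent value of an item) can also be expressed as a single request to the Zabbix JSON-RPC API, which is normally served from api_jsonrpc.php. The sketch below only builds the request body for the item.get method. The host name web01 and the key pattern are placeholders, and authentication is left out on purpose because the way a token is passed differs between Zabbix versions.

```python
import json


def latest_data_request(host, key_pattern, request_id=1):
    """Build an item.get request for items on `host` matching `key_pattern`.

    Authentication is intentionally omitted; add it according to the
    API documentation of the Zabbix version in use.
    """
    return {
        "jsonrpc": "2.0",
        "method": "item.get",
        "params": {
            "host": host,                     # host name (placeholder here)
            "search": {"key_": key_pattern},  # substring match on the item key
            "output": ["itemid", "name", "key_", "lastvalue", "lastclock"],
        },
        "id": request_id,
    }


# Show the JSON body that would be posted to the API endpoint.
print(json.dumps(latest_data_request("web01", "system.cpu.load"), indent=2))
```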
There are a lot of clicks required to do simple things. Of course, the steps you just saw are familiar to people who have deployed Zabbix, but is this true for other teams too?
Maybe you are thinking right now that it doesn't matter to those teams. But actually, it matters, and it's directly related to Zabbix's growth in your environment. Okay, I think the next two questions will be: are you sure it matters? And why?
Let's agree that the current Zabbix interface isn't very user friendly for nontechnical users. But following the path of natural growth, you have started gathering data from a lot of things that are not just IT related. Also, you can develop custom charts and retrieve any data from Zabbix via API functions. Now you'll have a lot of nontechnical users trying to use Zabbix data. I'm sure that it will be necessary to create some maps and screens to help these users get the required information quickly and smoothly.
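To make the idea of feeding a custom chart from the API a little more concrete, here is a rough sketch built around the history.get method. It builds the request body and converts returned rows into (timestamp, value) pairs that a charting layer could plot. The item ID and the sample rows are invented for illustration, authentication is again left out, and the rows are assumed to carry clock and value as strings.

```python
from datetime import datetime, timezone


def history_request(itemid, time_from, time_till, value_type=0, request_id=1):
    """Build a history.get request for one item between two Unix timestamps.

    value_type 0 selects numeric float history.
    """
    return {
        "jsonrpc": "2.0",
        "method": "history.get",
        "params": {
            "itemids": [itemid],
            "history": value_type,
            "time_from": time_from,
            "time_till": time_till,
            "sortfield": "clock",
            "sortorder": "ASC",
            "output": "extend",
        },
        "id": request_id,
    }


def to_points(rows):
    """Convert history rows into (UTC datetime, float) pairs for plotting."""
    return [
        (datetime.fromtimestamp(int(row["clock"]), tz=timezone.utc),
         float(row["value"]))
        for row in rows
    ]


# Invented sample rows in the assumed response shape.
sample = [{"clock": "0", "value": "0.25"}, {"clock": "60", "value": "0.50"}]
points = to_points(sample)
```

Keeping the conversion separate from the request makes it easy to swap in whichever dashboard or charting tool the nontechnical users prefer.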
The following screenshots show how we can transform the viewing layer of Zabbix into something more attractive:
Tactical dashboard
Here is what a strategic dashboard may look like:
Strategic dashboard
The point here is whether your Zabbix deployment is prepared to cater to these types of requirements.
We've seen how Zabbix has evolved in terms of performance with each version, and why it is important to stay aware of its new features.
Another significant point is that the importance of Zabbix keeps growing as other teams and areas of the company become aware of the potential of this tool. This movement will take Zabbix to all corners of a company, which often requires a more open approach as far as monitoring tasks are concerned. Monitoring only servers and network assets will not suffice.