Troubleshooting in Zabbix
Sometimes Zabbix can be a real pain to troubleshoot if you don't know where to look. Here are some pointers:
Since Zabbix 2.4, you can see why an item has failed by hovering your mouse over the red box. This can help you a lot.
Don't forget that Zabbix logs everything for the agent, proxy, and server under the
/var/log/zabbix/
directory. If something fails, this is probably the second place to look.
SELinux can mess with your installation too. For example, ping can be blocked by SELinux, which returns the wrong value, as if your host were not reachable. Don't forget that since RHEL 6.5, there are SELinux policies for the agent that can be set as follows:
setsebool -P zabbix_can_network on
Even if you have set the Boolean, there can be other issues with SELinux. To investigate this you could run:
sealert -a /var/log/audit/audit.log
SELinux will tell you what it has blocked and why, and it will also try to tell you how to allow it. Most of the time this will work; however, it's not perfect, and sometimes you have to investigate further. To make the sealert command work, you probably have to install the setroubleshoot package.
For example, creating your own fping policy module could be done like this:
grep fping /var/log/audit/audit.log | audit2allow -M zabbix_fping
semodule -i zabbix_fping.pp
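Before building a module, it can help to review exactly which denials it would cover. The following is a minimal sketch, not an official tool: the function name selinux_denials_for is made up for illustration, and it assumes the usual AVC record layout in the audit log (an "avc:" marker, the word "denied", and a comm="..." field).

```shell
# List SELinux AVC denials that mention a given command name.
# Usage: selinux_denials_for <comm> [audit-log]
# The function name is hypothetical; the log path defaults to the one used above.
selinux_denials_for() {
    comm="$1"
    log="${2:-/var/log/audit/audit.log}"
    # Keep only AVC records that were denied and that name the given command.
    grep 'avc:' "$log" | grep 'denied' | grep "comm=\"$comm\""
}
```

For example, selinux_denials_for fping | audit2allow -m zabbix_fping should print the rules the module would contain without installing anything, so you can read them first.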
Yet another way to solve your problems is by increasing the debug level of the log file. However, be careful, as increasing the log level will give you a lot of information. Since Zabbix 2.4, it is possible to do this without restarting the Zabbix server:
zabbix_server -c /etc/zabbix/zabbix_server.conf -R log_level_increase
This will increase the log level for all processes. The same can be done for a single process type only, for example the HTTP pollers:
zabbix_server -c /etc/zabbix/zabbix_server.conf -R log_level_increase="http poller"
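Because a raised log level produces a lot of output, you may want to raise it only for a short window. The sketch below is one way this could be scripted, using log_level_increase together with its log_level_decrease counterpart. The function name debug_window and the ZABBIX_SERVER_BIN and ZABBIX_SERVER_CONF variables are assumptions made for this example, not part of Zabbix.

```shell
# Hypothetical helper: raise the server log level by one step, wait, then lower it again.
# Both variables are assumptions for this sketch and can be overridden from the environment.
ZABBIX_SERVER_BIN="${ZABBIX_SERVER_BIN:-zabbix_server}"
ZABBIX_SERVER_CONF="${ZABBIX_SERVER_CONF:-/etc/zabbix/zabbix_server.conf}"

# Usage: debug_window <seconds>
debug_window() {
    seconds="$1"
    # Stop early if the level could not be raised (for example, server not running).
    "$ZABBIX_SERVER_BIN" -c "$ZABBIX_SERVER_CONF" -R log_level_increase || return 1
    sleep "$seconds"
    # Undo the single increase made above.
    "$ZABBIX_SERVER_BIN" -c "$ZABBIX_SERVER_CONF" -R log_level_decrease
}
```

Running debug_window 60 would give you one minute of more verbose logging. If the script is interrupted during the sleep, the level stays raised and has to be lowered by hand.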
With the decrease option instead of the increase option, you can return to log level 3, the default.
When troubleshooting communication issues between client and server, remember that there are two types of clients: active and passive. An active client needs to be able to connect to port 10051 on the server. For a passive client, the server needs to connect to the client on port 10050. Make sure that port 10051 can be reached from the client and port 10050 from the server. You could use Telnet to test this, for example:
telnet <ip> <port>
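If Telnet is not installed, or you want a check that can run from a script, the same test can be sketched with bash's /dev/tcp pseudo-device. This is only an illustration: the function name check_tcp_port and the addresses in the comments are made up, and it assumes that bash and the coreutils timeout command are available.

```shell
# Return 0 if a TCP connection to <host>:<port> succeeds within 3 seconds.
# Hypothetical helper; relies on bash's /dev/tcp feature and coreutils `timeout`.
# Usage: check_tcp_port <host> <port>
check_tcp_port() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example addresses only, replace them with your own:
# From the agent host (active checks): check_tcp_port 192.0.2.10 10051
# From the server (passive checks):    check_tcp_port 192.0.2.20 10050
```

A call such as check_tcp_port 192.0.2.10 10051 && echo reachable || echo blocked prints a one-word verdict. A failure can mean a firewall, a stopped daemon, or a wrong address, so treat it as a starting point only.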
If you are running an older version of Zabbix, it could be wise to upgrade. Many bugs are fixed in the latest versions and, as we mentioned before, you could gain major speed improvements.
If you have issues with Zabbix, and you think you have hit a bug, you could have a look at the support page, https://support.zabbix.com. Also, feature requests can be made here if you think Zabbix is missing some important feature. If your company is missing an important feature, they can also sponsor this feature by paying Zabbix for the development. (Remember your company saves lots of money by making use of Zabbix that comes for free. This way they could give back to the community and help Zabbix pay for the development.)
Make sure you use Network Time Protocol (NTP) servers for your Zabbix server and proxies, as running without time synchronization can cause issues. You can identify these issues by looking at your Zabbix queue: it will show that data is missing for 5 minutes, 10 minutes, or more.
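A rough way to spot clock drift between two machines is to compare their epoch timestamps. The helper below is a sketch with a made-up name, clock_skew_seconds, and the host in the usage comment is hypothetical. Fetching the remote time over SSH adds some latency, so read the result as an approximation.

```shell
# Print the absolute difference, in seconds, between two epoch timestamps.
# Hypothetical helper for a rough clock comparison.
# Usage: clock_skew_seconds <epoch-a> <epoch-b>
clock_skew_seconds() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Example (hostname is an assumption): compare the server with a proxy over SSH.
# clock_skew_seconds "$(date +%s)" "$(ssh proxy1.example.com date +%s)"
```

A result of more than a few seconds suggests that time synchronization on one of the hosts deserves a closer look.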
If you encounter a problem when working with SNMP, make sure that your device supports the bulk feature. Some devices don't follow standards well, so the solution in that case could be not to make use of the SNMP bulk feature.
If monitoring SNMPv3 devices, make sure that msgAuthoritativeEngineID (also known as snmpEngineID or "Engine ID") is never shared by two devices, as this will give rise to problems.
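Once you have collected the engine ID of each device, checking for duplicates is easy to script. The sketch below assumes you have already built a list with one "host engineID" pair per line. The function name find_duplicate_engine_ids and that input format are assumptions made for this example.

```shell
# Read "host engineID" pairs on stdin and print every engine ID that is
# used by more than one host, followed by the hosts that share it.
# Hypothetical helper; the input format is an assumption of this sketch.
find_duplicate_engine_ids() {
    awk '{ hosts[$2] = hosts[$2] " " $1; count[$2]++ }
         END { for (id in count) if (count[id] > 1) print id ":" hosts[id] }'
}

# Example input (made-up hosts and IDs):
# printf 'sw1 0x8000AA\nsw2 0x8000BB\nsw3 0x8000AA\n' | find_duplicate_engine_ids
```

Any line of output names an engine ID that needs to be changed on all but one of the listed devices.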