There have been a lot of projects that I have participated in, with varying degrees of difficulty.
This project was a bit of a challenge, in fact, as I needed to think a bit outside the box and avoid unrelated or irrelevant items or triggers.
Elasticsearch monitored by Zabbix
Yes, there is an out-of-the-box template in Zabbix, and if you have a small, simple cluster or a simple one-node Elasticsearch install, then the out-of-the-box template is fine.
But when we move into the territory of multiple masters and multiple roles, we get into a totally different game, with multiple alerts for one error and a lot of duplicate data collection.
So this project was one of the first assignments I had at CapMon (you can contact them for the template, just mention that I, Morten Moesgaard, sent you via my website).
They had an issue: the out-of-the-box Zabbix template was too noisy, so they needed a more fine-grained template, and they let my brain go to work.
Well, the way my brain works is KISS, and automation is the key, so what did my brain come up with?
Well, first off, the most important part of the cluster is … the masters of the cluster. So what do we do? We first get a list of potential masters, comma-separated as it should be for readability, and then we create an item in Zabbix of the type “Script”.
This item loops through the list of masters and, on the first HTTP request that gets a 200 response, returns the current master of the cluster. That result is used in another item, a dependent item, and this item is the magic sauce of it all: it gathers information about all the hosts/nodes present in the cluster.
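The master lookup can be sketched roughly as follows. This is not the template's actual code: Zabbix Script items are written in JavaScript, and Python is used here only to illustrate the logic. The function names, the `https` scheme, port 9200, and the choice of the `_cat/master` endpoint are all assumptions made for the example.

```python
import json
import urllib.error
import urllib.request


def http_get(url, api_key, timeout=5):
    """Return (status, body) for a GET request, or (None, "") if the host is unreachable."""
    request = urllib.request.Request(url, headers={"Authorization": f"ApiKey {api_key}"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # The host answered, but not with a success status.
        return error.code, ""
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout and similar.
        return None, ""


def find_current_master(masters_csv, fetch):
    """Ask each candidate in turn; the first one answering 200 names the elected master."""
    for candidate in (m.strip() for m in masters_csv.split(",")):
        if not candidate:
            continue
        # Assumed endpoint and port; adjust to the cluster at hand.
        status, body = fetch(f"https://{candidate}:9200/_cat/master?format=json")
        if status == 200:
            rows = json.loads(body)
            if rows:
                return rows[0]["node"]
    return None
```

With a real cluster this could be called as `find_current_master("es-m1,es-m2,es-m3", lambda url: http_get(url, api_key))`, where the host names are placeholders. Passing the fetch function in keeps the loop easy to try out without a live cluster.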
The next step is a bit more challenging: splitting the hosts out and creating them via LLD.
The biggest challenge here is to avoid duplicate host creation, so we split up the LLD into rules that each detect one host/node type.
So standalone or cross-cluster masters get created, as do standalone data hosts, standalone ingest hosts, and mixed hosts (meaning the above types are all in one).
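To make the type detection a bit more concrete, here is a minimal sketch, assuming the node information comes from Elasticsearch's `_nodes` API (which lists a `roles` array per node). The macro names such as `{#NODE.TYPE}` and the rule that any combination of tracked roles counts as "mixed" are assumptions for illustration, not the template's real definitions. The sketch also only looks at the three plain role names and ignores the others.

```python
TRACKED_ROLES = ("master", "data", "ingest")


def classify_node(roles):
    """Map a node's role list to exactly one type, so it is discovered only once."""
    present = [role for role in TRACKED_ROLES if role in roles]
    if len(present) == 1:
        return present[0]
    if len(present) > 1:
        return "mixed"
    return "other"


def build_lld_rows(nodes_response):
    """Turn a _nodes style response into one LLD row per node."""
    rows = []
    for node_id, node in sorted(nodes_response.get("nodes", {}).items()):
        rows.append({
            "{#NODE.ID}": node_id,  # hypothetical macro names
            "{#NODE.NAME}": node["name"],
            "{#NODE.TYPE}": classify_node(node.get("roles", [])),
        })
    return rows
```

Because every node ends up with a single type, each discovery rule can filter on that one value and no node matches two rules at once.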
With this filtering they can get a job-specific template or an all-in-one. Another thing we do with masters is restrict their alarms/problems related to master-specific functions so they only trigger if the host is the current master. That way you don’t have 3 hosts reporting the same issue, only the host that has the issue.
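The master-only gating boils down to one extra condition. In Zabbix it would live in the trigger logic; the Python below is only an illustration, and the function and host names are made up for the example.

```python
def master_problem_fires(host_name, current_master, check_failed):
    """A master-specific problem fires only on the host that is the current master."""
    return check_failed and host_name == current_master


def hosts_reporting(master_hosts, current_master, check_failed):
    """List the hosts that would raise the problem for one failing check."""
    return [
        host for host in master_hosts
        if master_problem_fires(host, current_master, check_failed)
    ]
```

With three master-eligible hosts and one failing master-level check, only the current master ends up in the list, so there is one problem instead of three.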
The template can be added to a synthetic host, and once things are set up with an API key for Elasticsearch and a list of the masters, your cluster monitoring is up and running within 1-2 minutes.
Another trick up my sleeve was that the synthetic host creates problems/alarms that are cluster-relevant, like cluster health and overall stats, such as shard and replica health, as an overall state.
And then you can have the hosts themselves report what the error is on the specific one. For example, a host is down, so shards are allocating on another host, and this gives you a quick overview of why the cluster is in a Yellow state.
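The split between cluster-level and host-level problems could look something like this. It is a sketch under assumptions: the `health` dictionary mirrors fields from Elasticsearch's `_cluster/health` response (`status`, `unassigned_shards`), while the function name, the "cluster" label for the synthetic host, and the message texts are invented for the example.

```python
def explain_cluster_state(health, expected_nodes, live_nodes):
    """Return (source, message) pairs: cluster-wide problems first, then per-host ones."""
    problems = []
    if health["status"] != "green":
        # Reported once, on the synthetic cluster host.
        problems.append((
            "cluster",
            f"status is {health['status']}, "
            f"{health['unassigned_shards']} unassigned shards",
        ))
    # Reported on the individual hosts that have the actual issue.
    for name in sorted(set(expected_nodes) - set(live_nodes)):
        problems.append((name, "node is not in the cluster"))
    return problems
```

Reading the two kinds of entries together gives the quick overview: the cluster entry says that something is wrong, and the host entry says where.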
This is just a description of the template. I can promise this wasn’t done in an afternoon; some of it was trial and error.
But again, you can always contact Capmon A/S if this is something you think you could use. I can’t tell you a price; all I can say is that my brain, and help from my Elasticsearch-specialized colleagues, made this.