Setting up Stack Monitoring
In this recipe, we will explore the details of setting up Stack Monitoring within the Elastic Stack. Monitoring is a crucial component that provides insights into the health, performance, and availability of your Elastic Stack deployment, including Elasticsearch, Kibana, Integrations Server, Elastic Agent, Beats, and Logstash.
We’ll start by guiding you through the initial setup steps, including configuring your Elastic Stack deployments for monitoring. This involves enabling monitoring features and specifying the collection of metrics and logs that will give you visibility into your stack’s operation. Next, we’ll delve into the use of Kibana for visualizing monitoring data. Kibana offers a dedicated monitoring UI where you can view metrics and logs, analyze the health of your nodes and indices, and track performance issues across your deployment.
Additionally, we’ll cover advanced topics, such as setting up alerting rules for automated notifications about potential issues within your stack. This proactive approach ensures that you can respond swiftly to any anomalies before they impact your operations.
Getting ready
Make sure to have the following requirements met:
- Terraform installed on your machine. If that is not the case, follow the steps outlined in the official Terraform documentation to get everything up and running.
- An Elastic Cloud deployment up and running. To quickly spin up a deployment, please refer to the Configuring Elastic Stack components with Terraform recipe in Chapter 12.
- As we’re going to provision a deployment on Elastic Cloud, we will need an API key to pass to Terraform. To get such a key, follow the steps described in the official Elastic documentation here: https://www.elastic.co/guide/en/cloud/current/ec-api-authentication.html. We will refer to this key as api_key later in the recipe.
The snippets for this recipe can be found here: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter13/snippets.md#setting-up-stack-monitoring
Important note
While the setup for this guide follows the recommended best practice of having a dedicated monitoring deployment, you can also opt for self-monitoring by using your main deployment to collect monitoring data.
How to do it…
The objective of this recipe is to set up a monitoring cluster beside our main deployment. We will use Terraform to deploy the additional monitoring cluster and set up the routing of logs and metrics from our main cluster to the monitoring cluster.
Figure 13.1 shows a simplified architecture of the setup:
Figure 13.1 – Monitoring setup overview
Note on monitoring in Elastic Cloud
For a deployment in Elastic Cloud to send logs and metrics to a dedicated monitoring cluster, both the main deployment and the monitoring cluster must be situated in the same cloud region. As of the current writing, cross-region monitoring is not supported. This requirement ensures that data transfer remains efficient and reduces latency, but it also means planning your deployment strategy to accommodate these limitations if monitoring across regions is a necessity for your operations.
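If you manage several deployments, the same-region constraint is easy to check before you start clicking through the console. The following is only a minimal sketch: the inventory is typed in by hand rather than fetched from any Elastic API, the region identifiers are placeholders, and `remote-deployment` is a hypothetical deployment added purely for illustration.

```python
def shippable_deployments(monitoring: dict, candidates: list[dict]) -> list[str]:
    """Return the names of candidates located in the monitoring deployment's region."""
    return [d["name"] for d in candidates if d["region"] == monitoring["region"]]


# Hand-written inventory; region values are placeholders, not real region IDs.
monitoring = {"name": "terraform-monitoring", "region": "region-a"}
candidates = [
    {"name": "main-deployment", "region": "region-a"},
    {"name": "new-team", "region": "region-a"},
    {"name": "remote-deployment", "region": "region-b"},  # hypothetical
]

print(shippable_deployments(monitoring, candidates))
```

Any deployment missing from the result would need its own monitoring deployment in its own region.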
- Download the following Terraform configuration files from the book’s official repository: https://github.com/PacktPublishing/Elastic-Stack-8.x-Cookbook/blob/main/Chapter13/terraform/monitoring.
- Next, open the .tfvar file in your preferred editor, add your api_key key, and then save the file.
- As we did in the Configuring Elastic Stack components with Terraform recipe in Chapter 12, run the following commands to get a sense of what will be deployed in terms of components:
$ terraform init
$ terraform plan
- Next, execute the configuration with the following command:
$ terraform apply
Upon successful completion of the command, head to the Elastic Cloud console and look for the deployment named terraform-monitoring.
- Having established the deployment that will act as a repository for monitoring data, we must now configure our main deployments to forward logs and monitoring metrics. Navigate to the cloud console and select the deployment you wish to monitor. Then, find the Logs and metrics option in the left menu and click on it. On the Logs and metrics page, click Enable. From the drop-down menu, choose the terraform-monitoring deployment and click Save to apply and save the configuration changes:
Figure 13.2 – Logs and metrics: Ship to a deployment setup
Once the configuration has been applied, the Logs and metrics page will look as pictured in Figure 13.3. Repeat the same operation for the new-team deployment as well:
Figure 13.3 – Ship to a deployment activated
Now that we have successfully configured the shipping of logs and metrics to our terraform-monitoring deployment, let’s explore how to utilize this data within the Stack Monitoring application.
- Head to Kibana for the terraform-monitoring deployment and go to Stack Monitoring under the Management section:
Figure 13.4 – Stack Monitoring app in the Kibana menu
- Upon arriving at the Stack Monitoring page, you may encounter a popup about the creation of out-of-the-box rules. We will address this later, so for now, please click on Remind me later. You will then be presented with a list of clusters being monitored:
Figure 13.5 – Stack Monitoring cluster listing
- Click on the main-deployment cluster in the list to go to the monitoring overview page:
Figure 13.6 – Stack Monitoring cluster overview
- We will start with the Elasticsearch | Overview page. From here, you can zoom in on key aspects of the health and status of your clusters:
Figure 13.7 – Stack Monitoring Elasticsearch overview
The Elasticsearch overview page is divided into two sections:
- The upper half of Figure 13.7 focuses on key search and indexing metrics. Leverage those dashboards to quickly spot performance issues in your deployment.
- The lower half of Figure 13.7 focuses on recent log entries. Here, you can catch any error or suspicious activity from the nodes.
You will spend most of your time in the Nodes and Indices tabs. The former is useful for understanding how resources are used and allocated on each node of your deployment and for quickly pinpointing issues such as node hot spotting, CPU overuse, and disk saturation, as illustrated in Figure 13.8:
Figure 13.8 – Node monitoring overview
The node preceded by a star is the master node. You can click on any node to get more detailed information and metrics for that node, as shown in Figure 13.9:
Figure 13.9 – Instance monitoring details
Those critical metrics can help you answer the following questions:
- Is my cluster or a particular node under heavy resource usage?
- What is the bottleneck in terms of CPU, I/O, or JVM?
- How does the latency experienced by my users correlate with resource usage?
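To make the first question concrete, node hot spotting can be thought of as one node carrying far more load than its peers. The sketch below illustrates that idea only; it is not how the Stack Monitoring UI computes anything. The `hot_nodes` helper, the 1.5 factor, and the CPU percentages are all assumptions chosen for illustration, with values such as you might read off the Nodes tab.

```python
from statistics import median


def hot_nodes(cpu_by_node: dict[str, float], factor: float = 1.5) -> list[str]:
    """Return nodes whose CPU usage exceeds `factor` times the cluster median."""
    if not cpu_by_node:
        return []
    baseline = median(cpu_by_node.values())
    return sorted(name for name, cpu in cpu_by_node.items() if cpu > factor * baseline)


# Hypothetical CPU usage percentages per node.
sample = {"instance-0": 22.0, "instance-1": 25.0, "instance-2": 71.0}

print(hot_nodes(sample))
```

A node flagged this way is a good candidate for the per-node detail view shown in Figure 13.9.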
The Indices tab is more focused on the data store side of things. You will find useful indicators such as the document count, index and search rates, and any unassigned shards that might make your cluster unhealthy:
Figure 13.10 – Stack Monitoring Indices tab overview
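As a rough mental model of why unassigned shards matter, Elasticsearch reports a cluster as red when at least one primary shard is unassigned, yellow when all primaries are assigned but at least one replica is not, and green otherwise. The following sketch encodes only that simplified rule; the function name and its inputs are illustrative and do not correspond to an Elasticsearch API.

```python
def health_from_unassigned(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Map unassigned shard counts to a simplified cluster health color."""
    if unassigned_primaries > 0:
        return "red"  # some data is unavailable
    if unassigned_replicas > 0:
        return "yellow"  # data is available but not fully redundant
    return "green"


print(health_from_unassigned(unassigned_primaries=0, unassigned_replicas=2))
```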
- By clicking on the name of an index in the table, you can zoom in to view the same metrics scoped to that index. Use this information to troubleshoot a specific index experiencing slow indexing or search issues.
- The Ingest Pipelines tab is a fairly recent addition to the Stack Monitoring application. It gives insights into the throughput and performance of overall ingest processing, which is especially valuable if you have a lot of ingest pipelines running in your cluster:
Figure 13.11 – Ingest Pipelines monitoring overview
Note on Ingest Pipelines monitoring
The first time you try to access the Ingest Pipelines tab, you will be prompted to install the Elasticsearch integration. Click on the Install button in the popup window to proceed with the installation.
The last two tabs are dedicated to specific features of the stack:
- Machine learning jobs gives some high-level information on machine learning (ML) nodes and key metrics on jobs, such as their state, processed records, the node on which they’re running, and model size
- Cross-Cluster Replication (CCR) monitors and manages the health, performance, and status of cross-cluster replication operations
Now, while those dashboards and visual information on the clusters are unbelievably valuable, the great benefit of having Stack Monitoring enabled is the ability to receive alerts if your deployment is showing signs of degradation. Stack Monitoring comes with a set of prebuilt rules, designed by the experts of the Elastic Stack based on the recommended best practices.
- To activate the rules, on the top right of the overview page, locate the Alerts and rules dropdown, and once opened, choose the Create default rules option:
Figure 13.12 – Activating default rules for Stack Monitoring
- If you are presented with a popup regarding the migration of watches, click on Create and wait for a few seconds for the rules to be activated. Once that is the case, you will notice alerts if you have any.
- It is also worth mentioning that you can edit the provided rules if you wish, but keep in mind that those rules are based on best practices and recommendations from the stack experts. To do so, locate the Enter setup mode button on the top right of the monitoring page. Click on it and select the rules you wish to adjust. Editing the rules is also the best approach to add actions and get notifications by leveraging connectors such as email, Slack, or more:
Figure 13.13 – Entering setup mode for stack monitoring
- While we have been focusing on Elasticsearch, notice that Kibana and Integrations Server monitoring data is also available. Go back to the cluster overview page and click on Overview under the Kibana section. As pictured in the following figure, this dashboard shows client (user) activity metrics (requests and responses) as well as some important data on queue usage for all your Kibana instances:
Figure 13.14 – Stack Monitoring: Kibana overview
The Kibana Instances view gives some metrics about client (user) activities on Kibana, plus details on HTTP connections and memory size broken down by Kibana instances.
- The Integrations Server section also focuses on resource usage and key metrics related to component-specific ingest activities. You will find information such as event rates, and HTTP requests and responses emitted by the server.
The advantage of monitoring the Elastic Stack using Elastic’s own tools is that monitoring data is stored as regular indices, which can be utilized for personalized analysis.
As demonstrated in the Building custom visualizations for monitoring data recipe, this approach enables you to delve into various aspects of your clusters’ operations and usage. For instance, you can analyze the volume of data ingested daily and the quantity of data queried or identify the most common time ranges used in queries. All these insights can be gained by simply leveraging the monitoring data available to you.
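Because monitoring data lives in regular indices, a custom analysis is just a search request. As a minimal sketch, the helper below builds the body of a search that averages one numeric field per day. It assumes the monitoring documents carry an @timestamp field; the field name passed in is a placeholder that you should replace with one that actually exists in your monitoring indices, and the `daily_average_query` helper itself is hypothetical.

```python
import json


def daily_average_query(field: str, days: int = 7) -> dict:
    """Build a search body averaging `field` per day over the last `days` days."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {"@timestamp": {"gte": f"now-{days}d/d"}}},
        "aggs": {
            "per_day": {
                "date_histogram": {"field": "@timestamp", "calendar_interval": "1d"},
                "aggs": {"daily_avg": {"avg": {"field": field}}},
            }
        },
    }


# Placeholder field name; check the mapping of your monitoring indices first.
body = daily_average_query("some.numeric.monitoring.field", days=7)
print(json.dumps(body, indent=2))
```

You could paste the printed body into a search request in Kibana Dev Tools against your monitoring indices; the exact index pattern depends on how the data was collected, so verify it in your monitoring deployment before relying on it.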
How it works…
To understand how Stack Monitoring operates, we’ve broken down its inner workings into the following sections.
Data collection
At the core of Stack Monitoring is the collection of metrics and logs from various components of the Elastic Stack. This is facilitated by Metricbeat and Filebeat, which are configured to collect detailed operational data. Metricbeat gathers metrics from each component, such as CPU usage, memory usage, and node health, whereas Filebeat collects logs that provide insights into operational events. These Beats are designed to work seamlessly with Elastic Cloud, ensuring data is efficiently captured and transmitted for analysis. Since version 8.5, you can also use Elastic Agent to collect Stack Monitoring data: https://www.elastic.co/guide/en/elasticsearch/reference/current/configuring-elastic-agent.html.
Shipping and storage
Once collected, the data is shipped to the dedicated monitoring cluster within Elastic Cloud. This separation of the monitoring data from the production data ensures that monitoring activities do not impact the performance of the production environment. The data is stored in indices, just as with any other data within Elasticsearch, making it accessible for analysis and visualization. When running on-premises, it is also considered a best practice to isolate the monitoring cluster.
Analysis and visualization
The analysis and visualization of collected data are primarily conducted through Kibana, which offers a specialized Stack Monitoring UI. This UI presents users with a comprehensive dashboard that visualizes the health, performance, and logs of Elastic Stack components. Users can drill down into specific metrics, view historical trends, and identify patterns or anomalies that may indicate issues or opportunities for optimization.
Alerting
Kibana’s alerting framework is an integral part of Stack Monitoring, allowing users to define rules based on specific conditions within the monitoring data. The resulting alerts can notify users of potential issues, such as a sudden drop in performance or a node going offline, enabling rapid response to ensure the stability and reliability of the Elastic Stack.
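Conceptually, many Stack Monitoring rules boil down to comparing a metric averaged over a look-back window against a threshold. The sketch below illustrates only that idea; it is not the code the alerting framework runs, and the `rule_fires` helper, the 85 percent threshold, and the sample values are illustrative assumptions.

```python
from statistics import mean


def rule_fires(samples: list[float], threshold: float) -> bool:
    """Return True when the average of `samples` is above `threshold`."""
    # An empty window never fires: there is nothing to evaluate.
    return bool(samples) and mean(samples) > threshold


# Hypothetical CPU usage percentages sampled over a look-back window.
cpu_window = [91.0, 88.5, 93.2, 90.1, 89.7]

print(rule_fires(cpu_window, threshold=85.0))
```

Averaging over a window rather than reacting to a single sample keeps short spikes from triggering notifications, which is why adjusting the window and threshold in setup mode changes how noisy a rule is.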
There’s more…
Security in Stack Monitoring is tightly integrated with Elastic Cloud’s overall security model. Access to monitoring data and functionalities is controlled through role-based access control (RBAC), ensuring that only authorized users can view or manipulate monitoring configurations and data.
For optimal monitoring, it is recommended to do the following:
- Regularly review the health and performance dashboards to stay ahead of potential issues
- Configure alerts to proactively manage the Elastic Stack environment
- Utilize detailed metrics and logs to perform root cause analysis (RCA) of any operational issues
Logstash can also be monitored through the Stack Monitoring application. To do so, you can leverage the Logstash integration for Elastic Agent. By deploying an Elastic Agent on the infrastructure where Logstash is running, you will be able to collect monitoring data and ship it to your monitoring cluster. Check out the official Elastic documentation for more information on this feature: https://www.elastic.co/guide/en/logstash/current/monitoring-with-elastic-agent.html.
In this recipe, we have manually set up the shipping of monitoring data through the cloud console; if you are using Terraform, you can quickly set up a monitoring deployment in your configuration with the Observability settings. Have a look at the documentation if you are interested in knowing more: https://registry.terraform.io/providers/elastic/ec/latest/docs/resources/deployment#with-observability.
See also
- For a complete list of Stack Monitoring features, see the official Elastic documentation: https://www.elastic.co/guide/en/kibana/current/xpack-monitoring.html
- If you are looking for guidelines to configure Stack Monitoring on a self-managed Elasticsearch setup, check the following documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/monitor-elasticsearch-cluster.html