Learning Kibana 5.0: Exploit the visualization capabilities of Kibana and build powerful interactive dashboards

Product type Paperback

Published in Feb 2017

Publisher Packt

ISBN-13 9781786463005

Length 284 pages

Edition 1st Edition

Languages

Java

Tools

Elasticsearch

Concepts

Data Visualization

Author (1):

Bahaaldine Azarmi

Table of Contents (10) Chapters

Preface

1. Introduction to Data-Driven Architecture

2. Installing and Setting Up Kibana 5.0

3. Business Analytics with Kibana 5.0

4. Logging Analytics with Kibana 5.0

5. Metric Analytics with Metricbeat and Kibana 5.0

6. Graph Exploration in Kibana

7. Customizing Kibana 5.0 Timelion

8. Anomaly Detection in Kibana 5.0

9. Creating a Custom Plugin for Kibana 5.0

Overview of the Elastic stack

The Elastic stack, formerly called ELK, provides the different layers that are needed to implement a data-driven architecture.

It spans the ingestion layer (Beats, Logstash, and the ES-Hadoop connector), a distributed data store (Elasticsearch), and finally visualization (Kibana), as shown in the following figure:

Elastic stack structure

As we can see in the diagram, Kibana is just one component of the stack.

In the following chapters, we'll focus in detail on how to use Kibana in different contexts, but we'll always need the other components. That's why this chapter explains the role of each of them.

One other important thing is that this book intends to describe how to use Kibana 5.0; thus, in this book, we'll use the Elastic stack 5.0.0 (https://www.elastic.co/blog/elastic-stack-5-0-0-released).

Elasticsearch

Elasticsearch is a distributed and scalable data store from which Kibana pulls all the aggregation results used in its visualizations. It's resilient by nature and is designed to scale out, which means that nodes can easily be added to an Elasticsearch cluster as needs grow.

Elasticsearch is a highly available technology, which means that:

First, data is replicated across the cluster, so in case of failure there is still at least one copy of the data
Secondly, thanks to its distributed nature, Elasticsearch can spread the indexing and searching load over the cluster nodes to ensure service continuity and meet your SLAs

It can deal with structured and unstructured data, and as we visualize data in Kibana, you will notice that data, or documents to use Elastic vocabulary, are indexed in the form of JSON documents. JSON makes it very handy to deal with complex data structures as it supports nested documents, arrays, and so on.
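To make the idea of a JSON document concrete, here is a minimal Python sketch of a record with a nested object and an array. The field names and values are assumptions made up for illustration; they are not the schema of any dataset used in this book.

```python
import json

# Hypothetical accident record; every field name and value here is an
# assumption for illustration, not an actual dataset schema.
accident = {
    "timestamp": "2016-11-01T08:30:00Z",
    "city": "Paris",
    "location": {"lat": 48.8566, "lon": 2.3522},  # nested object
    "vehicles": ["Car", "Motor Scooter", "Van"],  # array of values
}

# Serialize to the JSON text that a client would send for indexing.
body = json.dumps(accident, indent=2)
print(body)

# Round-trip to confirm the nested structure survives serialization.
restored = json.loads(body)
```

The nested object and the array are carried as-is in the JSON text, which is what makes the format convenient for complex structures.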

Elasticsearch is a developer-friendly solution and offers a large set of REST APIs to interact with the data, or the settings of the cluster itself. The documentation for these APIs can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs.html.

The parts that matter most for this book are aggregations and graphs, which are used respectively to run analytics on top of the indexed data (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html) and to create relations between documents (https://www.elastic.co/guide/en/graph/current/graph-api-rest.html).
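As a rough illustration of what an aggregation request looks like, the following Python sketch builds the body of a terms aggregation, which counts documents per distinct value of a field. The helper name, the aggregation name, and the field name vehicle_type are hypothetical; only the general shape of the request body follows the aggregation documentation linked above.

```python
import json

def build_terms_aggregation(agg_name, field, size=5):
    """Build a search body that returns only aggregation buckets.

    agg_name and field are caller-chosen; the values used below are
    hypothetical examples.
    """
    return {
        "size": 0,  # no individual hits, only aggregation results
        "aggs": {
            agg_name: {
                "terms": {"field": field, "size": size}
            }
        },
    }

request_body = build_terms_aggregation("vehicles", "vehicle_type")
print(json.dumps(request_body, indent=2))
```

A body like this would typically be sent to the _search endpoint of an index; the sketch stops at building the request so that it runs without a cluster.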

On top of these APIs, there are also client APIs that allow Elasticsearch to be integrated with most technologies, such as Java, Python, Go, and so on (https://www.elastic.co/guide/en/elasticsearch/client/index.html).

Kibana generates the requests made to the cluster for each visualization. We'll see in this book how to dig into these requests and find out which features and APIs they use.

The last main aspect of Elasticsearch is that it's a real-time technology that, through its different APIs, allows working with all ranges of volumes, from gigabytes to petabytes.

Besides Kibana, there are a lot of different solutions that can leverage the open APIs that Elasticsearch offers to build visualizations on top of the data, but Kibana is the only technology dedicated to it.

Beats

A Beat is a lightweight data shipper that transports data from different sources such as applications, machines, or networks. Beats are built on top of libbeat, an open source library that allows every flavor of Beat to send data to Elasticsearch, as illustrated in the following diagram:

Beats architecture

The diagram shows the following Beats:

Packetbeat, which essentially sniffs packets over the network wire for specific protocols such as MySQL and HTTP. It grabs the fundamental metrics used to monitor the protocol in question. For example, in the case of HTTP, it captures the request and the response, wraps them into a document, and indexes it into Elasticsearch. We won't use this Beat in the book, as it would require a full book of its own, so I encourage you to visit the following website to see what kind of Kibana dashboard you can build on top of it: http://demo.elastic.co.
Filebeat, which is meant to securely transport the content of a file from point A to point B, much like the tail command. We'll use this Beat jointly with the new ingest node (https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html) to push data from files directly to Elasticsearch, which will process the data before indexing it. The architecture can then be simplified, as shown in the following figure:

Ingestion pipeline without ingest
In the preceding diagram, the data is first shipped by Beats, then put into a message broker (we'll come back to this notion later in the book), processed by Logstash, before being indexed by Elasticsearch. The ingest node dramatically simplifies the architecture for the use case:

Ingestion pipeline with Ingest node

As the preceding diagrams show, the architecture is reduced to two components: Filebeat and the ingest node. We'll then be able to visualize the content in Kibana.

Topbeat is the first kind of Metricbeat that allows us to ship machine or application execution metrics to Elasticsearch. We'll also use it later on in this book to ship our laptop data and visualize it in Kibana. The good thing here is that this Beat comes with pre-built templates that only need to be imported into Kibana, as the documents generated by the Beat are standard.
There are a lot of different Beats made by the community that can be used for interesting data visualization. A list of them can be found at https://www.elastic.co/guide/en/beats/libbeat/current/index.html.

While Beats offer some basic filtering features, they don't provide the level of transformation that Logstash can bring.
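To give an idea of what the ingest node does in the simplified Filebeat architecture described earlier, here is a hedged Python sketch that builds a pipeline definition with a single grok processor. The function name, the description, and the choice of the message field are assumptions for illustration, and the sketch assumes the grok processor and the COMMONAPACHELOG pattern are available in your Elasticsearch version; check the ingest documentation before relying on it.

```python
import json

def build_access_log_pipeline():
    """Sketch of an ingest pipeline body (hypothetical example).

    Assumes raw log lines arrive in a field named "message" and that a
    grok processor with the COMMONAPACHELOG pattern is available.
    """
    return {
        "description": "Parse web access logs shipped by Filebeat",
        "processors": [
            {
                "grok": {
                    "field": "message",
                    "patterns": ["%{COMMONAPACHELOG}"],
                }
            }
        ],
    }

pipeline = build_access_log_pipeline()
print(json.dumps(pipeline, indent=2))
```

The point of the sketch is the division of labor: Filebeat only ships raw lines, while the parsing rules live in the cluster as a list of processors.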

Logstash

Logstash is a data processor that embraces the centralized data processing paradigm. It allows the users to collect, enrich/transform, and transport data to destinations with the help of more than 200 plugins, as shown in the following figure:

Logstash, the processing pipeline

Logstash is capable of collecting data from any source, including Beats, as every Beat comes with an out-of-the-box integration for Logstash. The separation of roles is clear here: while Beats are responsible for shipping the data, Logstash processes the data before indexing.

From a data visualization point of view, Logstash should be used in order to prepare the data; we will see, for example, later in this book that you could receive IP addresses in logs from which it might be useful to deduce a geo-location. This can be done with the new geoip plugin, which is available at: https://www.elastic.co/guide/en/logstash/current/plugins-filters-geoip.html. This helps to get the following kind of visualization:
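The following Python sketch illustrates the idea of this enrichment step, not the geoip plugin itself. The lookup table, the field names, and the mapping of the documentation-only IP ranges to countries are all made up for illustration; a real setup would rely on a GeoIP database.

```python
import ipaddress

# Hypothetical lookup table: network -> location. The ranges below are
# reserved documentation ranges and the locations are invented.
GEO_TABLE = {
    ipaddress.ip_network("203.0.113.0/24"): {"country": "France", "lat": 48.8566, "lon": 2.3522},
    ipaddress.ip_network("198.51.100.0/24"): {"country": "Germany", "lat": 52.52, "lon": 13.405},
}

def enrich_with_geo(event, ip_field="client_ip"):
    """Return a copy of the event with a 'geo' field when the IP is known."""
    enriched = dict(event)
    address = ipaddress.ip_address(event[ip_field])
    for network, location in GEO_TABLE.items():
        if address in network:
            enriched["geo"] = location
            break
    return enriched

event = {"client_ip": "203.0.113.42", "request": "GET /index.html"}
enriched_event = enrich_with_geo(event)
print(enriched_event)
```

Once each document carries coordinates, a map visualization can place it geographically, which is why this preparation happens before indexing.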

IP address visualization on a map

We'll see in our use cases how data preparation is important to comply with the different available visualizations in Kibana.

Kibana

Kibana is the core product described in this book; it's where all the user interface actions happen. Most visualization technologies handle the analytical processing themselves, whereas Kibana is just a web application that renders the analytical processing done by Elasticsearch. It doesn't load data from Elasticsearch and then process it, but leverages the power of Elasticsearch to do all the heavy lifting. This allows real-time visualization at scale: as the data grows, the Elasticsearch cluster is scaled accordingly to offer the best latency depending on the SLAs.

Kibana provides the visual power to Elasticsearch aggregations, allowing you to slice through your time-series datasets, or segment your data fields, as easy as pie.
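To illustrate this split of responsibilities, the sketch below takes a response shaped like the result of a terms aggregation and turns its buckets into pie chart slices. The sample response is fabricated for the example, and the function is a simplified stand-in for what a rendering layer does; it is not Kibana's code.

```python
def buckets_to_slices(response, agg_name):
    """Convert aggregation buckets into (label, percentage) pairs."""
    buckets = response["aggregations"][agg_name]["buckets"]
    total = sum(bucket["doc_count"] for bucket in buckets)
    return [
        (bucket["key"], round(100 * bucket["doc_count"] / total, 1))
        for bucket in buckets
    ]

# Fabricated sample response; the counts are invented for illustration.
sample_response = {
    "aggregations": {
        "vehicles": {
            "buckets": [
                {"key": "Car", "doc_count": 60},
                {"key": "Motor Scooter", "doc_count": 30},
                {"key": "Van", "doc_count": 10},
            ]
        }
    }
}

slices = buckets_to_slices(sample_response, "vehicles")
print(slices)
```

Note that the client only reshapes a handful of buckets; the counting over the full dataset has already been done by the cluster.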

Kibana is best suited to time-based visualization, although your data can come without any timestamp, and it brings visualizations made for rendering the results of the Elasticsearch aggregation framework. The following screenshot shows an example of a dashboard built in Kibana:

Kibana dashboard

As you can see, a dashboard contains one or more visualizations. We'll dig into them one by one in the context of our use cases. To build a dashboard, the user is brought into a data exploration experience where they will:

Discover their data by digging into the indexed documents, as the following screenshot shows:

Discover data view
Build visualizations with the help of a comprehensive palette, based on the question the user has for the data:

Kibana pie chart visualization
The preceding visualization shows the vehicles involved in an accident in Paris. In the example, the first vehicle was a Car, the second a Motor Scooter, and the last one a Van. We will dig into the accidentology dataset in the logging use case.
Build an analytics experience by composing the different visualizations in a dashboard.
The plugin structure of Kibana makes it infinitely extensible. You'll see that Kibana is not only made for analytics on top of your data, but also to monitor your Elastic stack, build relations between documents, and also to do metrics analytics:

Kibana 5 plugin picker

X-Pack

Lastly I would like to introduce the concept of X-Pack, which will also be used in the book. While X-Pack is part of the subscription offer, one can download it on the Elastic website and use a trial license to evaluate it.

X-Pack is a set of plugins for Elasticsearch and Kibana that provides the following enterprise features.

Security

Security helps to secure the architecture at the data and access level. On the access side, the Elasticsearch cluster can be integrated with LDAP, Active Directory, and PKI to enable role-based access on the cluster. There are additional ways to access it, either by what we call a native realm (https://www.elastic.co/guide/en/shield/current/native-realm.html), local to the cluster, or a custom realm (https://www.elastic.co/guide/en/shield/current/custom-realms.html), to integrate with other sources of authentication.

By adding role-based access to the cluster, users will only see data that they are allowed to see at the index level, document level, and field level.

From a data visualization point of view, this means, for example, that two groups of users can share data within the same index while the first group is only allowed to see French data and the second group only German data. Both groups can have a Kibana instance pointing to that index, and the underlying permissions configuration ensures that each instance renders only the respective country's data.
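The effect of document-level and field-level restrictions can be simulated in a few lines of Python. This is only an illustration of the outcome described above: the role names, field names, and filtering logic are hypothetical and do not reflect the X-Pack role configuration format.

```python
# Hypothetical roles: each has a document-level filter (required field
# values) and a list of fields the role is allowed to read.
ROLES = {
    "french_analyst": {"filter": {"country": "FR"}, "fields": ["country", "amount"]},
    "german_analyst": {"filter": {"country": "DE"}, "fields": ["country", "amount"]},
}

def visible_documents(role_name, documents):
    """Return the documents, and the fields within them, a role may see."""
    role = ROLES[role_name]
    visible = []
    for doc in documents:
        # Document-level check: every filter value must match.
        if all(doc.get(key) == value for key, value in role["filter"].items()):
            # Field-level check: keep only the granted fields.
            visible.append({f: doc[f] for f in role["fields"] if f in doc})
    return visible

# A single shared index, represented here as a plain list of documents.
shared_index = [
    {"country": "FR", "amount": 120, "customer_email": "a@example.com"},
    {"country": "DE", "amount": 80, "customer_email": "b@example.com"},
    {"country": "FR", "amount": 45, "customer_email": "c@example.com"},
]

french_view = visible_documents("french_analyst", shared_index)
german_view = visible_documents("german_analyst", shared_index)
print(french_view)
print(german_view)
```

Both views come from the same index, yet each role receives only its own country's documents and never the customer_email field.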

On the data side, the transport layer between Elasticsearch nodes can be encrypted. Transport can also be secured at the Elasticsearch and Kibana level, which means that the Kibana URL can be behind HTTPS.

Lastly, the security plugin provides IP filtering, but more importantly for data visualization, audit logging that tracks all the access to the cluster and can be easily rendered as a Kibana dashboard.

Monitoring

Monitoring is a Kibana plugin that gives insights on the infrastructure. While this was made primarily for Elasticsearch, Elastic is extending it for other parts of the architecture, such as Kibana or Logstash. That way, the users will have a single point of monitoring of all Elastic components and can track, for example, whether Kibana is executing properly, as shown in the following screenshot:

Kibana monitoring plugin

As you can see, users are able to see how many concurrent connections are made on Kibana, as well as deeper metrics such as the Event Loop Delay, which essentially represents the performance of the Kibana instance.

Alerting

If alerting is combined with monitoring data, it enables proactive monitoring of both your Elastic infrastructure and your data. The alerting framework lets you schedule queries and actions that run in the background, by defining:

When you want to run the alert; in other words, to schedule the execution of the alert
What you want to watch by setting a condition that leverages Elasticsearch search, aggregations, and graph APIs
What you want to do when the watch is triggered: write the result in a file, in an index, send it by e-mail, or send it over HTTP
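The three parts above map onto the sections of a watch definition. The Python sketch below builds such a definition as a plain dictionary; the index pattern, the CPU field name, the threshold, and the e-mail address are assumptions for illustration, and the exact syntax should be checked against the alerting documentation for your version.

```python
import json

def build_cpu_watch(threshold=0.9, interval="1m"):
    """Sketch of a watch: when to run, what to watch, what to do.

    The index pattern, field name, and recipient are hypothetical.
    """
    return {
        # When: schedule the execution of the alert.
        "trigger": {"schedule": {"interval": interval}},
        # What to watch: a search for documents above the threshold.
        "input": {
            "search": {
                "request": {
                    "indices": ["metricbeat-*"],
                    "body": {
                        "query": {
                            "range": {"system.cpu.user.pct": {"gt": threshold}}
                        }
                    },
                }
            }
        },
        # Condition: fire only if the search returned at least one hit.
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
        # What to do: send the result by e-mail.
        "actions": {
            "notify_ops": {
                "email": {
                    "to": "ops@example.com",
                    "subject": "High CPU usage detected",
                }
            }
        },
    }

watch = build_cpu_watch()
print(json.dumps(watch, indent=2))
```

Keeping the threshold and interval as parameters makes it easy to derive several watches from the same template.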

The watch states are also indexed in Elasticsearch, which allows visualization to see the life cycle of the watch. Typically, you would render a graph that monitors certain metrics and the related triggered watches as shown in the following diagram:

Alerts shown on a visualization

The important aspect of the preceding visualization is that the user can see when the alerts have been triggered and how many of them, depending on a threshold.

We will use alerting later in this book in a metric analytics use case based on the performance of the CPU.

Graph

Graph is probably one of the most exciting features of the version 2.3 release at the beginning of 2016. It provides both an API and a plugin for visualization in Kibana, and brings the ability to build relations between documents indexed in Elasticsearch. Contrary to what many users think, Graph is not a graph database; it actually redefines what a graph is and seeks relevant relations between data based on relevancy ranking, regardless of how the data was modeled from the start.

Search engines naturally track how frequent a term is in the background dataset when they do relevancy ranking; they know how common a word is.

When we throw terms in the Elasticsearch indices, it naturally knows which things are the most interesting, and this logic is applied to the graph.
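The intuition of comparing a term's frequency in a result set with its frequency in the background dataset can be sketched as follows. This scoring function is a simplified illustration that multiplies the absolute and relative change in frequency; it is an assumption made for this example and not necessarily the heuristic that Graph applies. The counts are invented.

```python
def significance(fg_count, fg_total, bg_count, bg_total):
    """Score how much more frequent a term is in the foreground set
    (for example, documents matching a query) than in the background
    set (the whole index). Returns 0.0 when the term is not more
    frequent than usual."""
    fg_rate = fg_count / fg_total
    bg_rate = bg_count / bg_total
    if bg_rate == 0 or fg_rate <= bg_rate:
        return 0.0
    return (fg_rate - bg_rate) * (fg_rate / bg_rate)

# A very common word appears equally often everywhere: not interesting.
common_score = significance(fg_count=90, fg_total=100, bg_count=9000, bg_total=10000)

# A rare word that is concentrated in the foreground set: interesting.
rare_score = significance(fg_count=20, fg_total=100, bg_count=50, bg_total=10000)

print(common_score, rare_score)
```

A common word scores zero even though it appears in almost every document, while a rare word concentrated in the result set scores high, which is the behavior the preceding paragraphs describe.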

The easiest way to use graph is to start in Kibana with the help of the graph plugin, and explore the data, as the following figure shows:

Graph visualization in Kibana

In this graph, we can see a country, France, and all the related artists based on the music dataset.

We'll use graph in the related use case later in this book.

Reporting

Reporting is a new plugin brought in with the latest 2.x version to allow users to export a Kibana dashboard as a PDF. This is one of the most anticipated features among Kibana users and is as simple as clicking a button, as illustrated in the following screenshot: