Learning ELK Stack: Build mesmerizing visualizations, analytics, and logs from your data using Elasticsearch, Logstash, and Kibana


Chapter 1. Introduction to ELK Stack

This chapter explains why log analysis matters in today's data-driven world and what challenges it involves. It introduces the ELK stack as a complete log analysis solution and describes the role of each of its open source components: Elasticsearch, Logstash, and Kibana. It also briefly covers the key features of each component and describes the installation and configuration steps for them.

The need for log analysis

Logs tell us how a system is behaving. However, the content and format of logs vary among services, and even among components of the same system. For example, a scanner may log error messages related to communication with other devices, whereas a web server logs information on all incoming requests, outgoing responses, the time taken for a response, and so on. Similarly, the application logs of an e-commerce website record business-specific events.

Because logs vary in content, their uses vary too. For example, the logs from a scanner may be used for troubleshooting, a simple status check, or reporting, while a web server log is used to analyze traffic patterns across multiple products. Analyzing logs from an e-commerce site can help figure out whether packages from a specific location are returned repeatedly, and the probable reasons why.

The following are some common use cases where log analysis is helpful:

  • Issue debugging
  • Performance analysis
  • Security analysis
  • Predictive analysis
  • Internet of things (IoT) and logging

Issue debugging

Debugging is one of the most common reasons to enable logging within your application. The simplest and most frequent use of a debug log is to grep for a specific error message or event occurrence. If a system administrator believes that a program crashed because of a network failure, they will look for a "connection dropped" message or something similar in the server logs to analyze what caused the issue. Once the bug or issue is identified, log analysis solutions help capture application information and snapshots from that particular time, which can easily be passed across development teams for further analysis.
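The grep-style search described above can be sketched in a few lines of Python. This is only an illustration: the `grep_lines` helper and the sample log lines are hypothetical and do not come from any real system.

```python
import re

def grep_lines(lines, pattern):
    """Return (line_number, line) pairs whose text matches the regex pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [(n, line) for n, line in enumerate(lines, start=1) if regex.search(line)]

# Hypothetical server log lines, invented for illustration.
sample_log = [
    "2015-05-24 15:54:58 INFO  Request served in 12 ms",
    "2015-05-24 15:54:59 ERROR Connection dropped by peer",
    "2015-05-24 15:55:03 INFO  Request served in 9 ms",
]

matches = grep_lines(sample_log, r"connection dropped")
for number, line in matches:
    print(f"{number}: {line}")
```

A search like this works well on one file, but it assumes you already know which file on which server to look in.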

Performance analysis

Log analysis helps optimize or debug system performance and gives essential input about bottlenecks in the system. Understanding a system's performance is often about understanding resource usage in the system. Logs can help analyze individual resource usage in the system, the behavior of multiple threads in the application, potential deadlock conditions, and so on. Logs also carry timestamp information, which is essential for analyzing how the system behaves over time. For instance, a web server log can show how individual services are performing, based on response times, HTTP response codes, and so on.
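As a rough sketch of that last point, the following Python snippet summarizes response times and server errors per service. The records are hypothetical and assumed to be already parsed into (path, status, response time) tuples; a real web server log would need a parsing step first.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (service path, HTTP status, response time in ms) records.
requests = [
    ("/cart", 200, 120),
    ("/cart", 200, 80),
    ("/search", 200, 300),
    ("/search", 500, 900),
]

def summarize(records):
    """Return the mean response time and 5xx count for each service path."""
    times = defaultdict(list)
    errors = defaultdict(int)
    for path, status, elapsed_ms in records:
        times[path].append(elapsed_ms)
        if status >= 500:
            errors[path] += 1
    return {
        path: {"mean_ms": mean(values), "server_errors": errors[path]}
        for path, values in times.items()
    }

summary = summarize(requests)
for path, stats in summary.items():
    print(path, stats)
```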

Security analysis

Logs play a vital role in managing the application security for any organization. They are particularly helpful to detect security breaches, application misuse, malicious attacks, and so on. When users interact with the system, it generates log events, which can help track user behavior, identify suspicious activities, and raise alarms or security incidents for breaches.

The intrusion detection process involves reconstructing sessions from the logs themselves. For example, SSH login events on a system can be used to identify breaches on its machines.
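A minimal sketch of that idea counts failed SSH logins per source address. The log lines below are invented for illustration and assume the message wording commonly written by OpenSSH's `sshd`; the threshold of three failures is likewise an arbitrary assumption, not a recommendation.

```python
import re
from collections import Counter

# Assumed sshd message shape: "Failed password for [invalid user] NAME from IP ..."
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")

# Hypothetical auth log lines using documentation-only IP ranges.
auth_log = [
    "May 24 15:54:59 host sshd[1001]: Failed password for root from 203.0.113.5 port 4242 ssh2",
    "May 24 15:55:01 host sshd[1001]: Failed password for invalid user admin from 203.0.113.5 port 4243 ssh2",
    "May 24 15:55:07 host sshd[1002]: Accepted password for alice from 198.51.100.7 port 5050 ssh2",
    "May 24 15:55:09 host sshd[1003]: Failed password for root from 203.0.113.5 port 4244 ssh2",
]

def failed_logins_by_source(lines):
    """Count failed password attempts per source IP address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts

def suspicious_sources(lines, threshold=3):
    """Return source IPs with at least `threshold` failed attempts."""
    return [ip for ip, n in failed_logins_by_source(lines).items() if n >= threshold]

print(suspicious_sources(auth_log))
```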

Predictive analysis

Predictive analysis is one of the prominent trends of recent times. Log and event data can serve as input for predictive analysis. Predictive analysis models help in identifying potential customers, resource planning, inventory management and optimization, workload efficiency, and efficient resource scheduling. They also help guide marketing strategy, user-segment targeting, ad-placement strategy, and so on.

Internet of things and logging

When it comes to IoT devices (devices or machines that interact with each other without any human intervention), it is vital that the system is monitored and managed to keep downtime to a minimum and resolve any important bugs or issues swiftly. Since these devices should be able to work with little human intervention and may exist on a large geographical scale, log data is expected to play a crucial role in understanding system behavior and reducing downtime.

Challenges in log analysis

The current log analysis process mostly involves checking logs on multiple servers, written by different components and systems across your application. This causes various problems that make log analysis a time-consuming and tedious job. Let's look at some of the common problem scenarios:

  • Non-consistent log format
  • Decentralized logs
  • Expert knowledge requirement

Non-consistent log format

Every application and device logs in its own way, so each format needs its own expert. The differing formats also make it difficult to search across logs.

Let's take a look at some common log formats. Notice how different logs use different timestamp formats, different ways to represent INFO, ERROR, and so on, and a different order of these components within each entry. It's difficult to figure out what is present at what position just by looking at the logs. This is where tools such as Logstash help.

Tomcat logs

A typical Tomcat server startup log entry will look like this:

May 24, 2015 3:56:26 PM org.apache.catalina.startup.HostConfig deployWAR
INFO: Deployment of web application archive \soft\apache-tomcat-7.0.62\webapps\sample.war has finished in 253 ms
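To illustrate what "parsing" such an entry means, here is a minimal Python sketch that splits the two-line entry above into fields. The regular expressions and the field names (`logger`, `method`, `level`, `message`) are assumptions made for this example; they are not a Tomcat or Logstash API, and month and AM/PM parsing assumes an English locale.

```python
import re
from datetime import datetime

# First line: timestamp, logger class, method name (field names are assumptions).
TOMCAT_HEADER = re.compile(
    r"^(?P<timestamp>\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) "
    r"(?P<logger>\S+) (?P<method>\S+)$"
)
# Second line: level, then free-text message.
TOMCAT_BODY = re.compile(r"^(?P<level>[A-Z]+): (?P<message>.*)$")

def parse_tomcat_entry(header_line, body_line):
    """Parse a two-line Tomcat startup log entry into a dict, or return None."""
    header = TOMCAT_HEADER.match(header_line)
    body = TOMCAT_BODY.match(body_line)
    if header is None or body is None:
        return None
    return {
        "timestamp": datetime.strptime(header["timestamp"], "%b %d, %Y %I:%M:%S %p"),
        "logger": header["logger"],
        "method": header["method"],
        "level": body["level"],
        "message": body["message"],
    }

entry = parse_tomcat_entry(
    "May 24, 2015 3:56:26 PM org.apache.catalina.startup.HostConfig deployWAR",
    r"INFO: Deployment of web application archive \soft\apache-tomcat-7.0.62\webapps\sample.war has finished in 253 ms",
)
print(entry["timestamp"].isoformat(), entry["level"])
```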

Apache access logs – combined log format

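A typical Apache access log entry will look like this (the sample shows only the fields shared with the common log format; the combined format appends the referer and user agent):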

127.0.0.1 - - [24/May/2015:15:54:59 +0530] "GET /favicon.ico HTTP/1.1" 200 21630
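The access log line above can be broken into named fields with a regular expression. The sketch below is one possible pattern, with field names chosen for this example; it matches the fields present in the sample and ignores anything that follows the response size.

```python
import re
from datetime import datetime

# Assumed field names; the pattern covers host, identity, user, time,
# request line, status, and response size.
ACCESS_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_access_line(line):
    """Parse one access log line into a dict of typed fields, or return None."""
    match = ACCESS_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

entry = parse_access_line(
    '127.0.0.1 - - [24/May/2015:15:54:59 +0530] "GET /favicon.ico HTTP/1.1" 200 21630'
)
print(entry["path"], entry["status"], entry["time"].isoformat())
```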

IIS logs

A typical IIS log entry will look like this:

2012-05-02 17:42:15 172.24.255.255 - 172.20.255.255 80 GET /images/favicon.ico - 200 Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)
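IIS entries like the one above are space-delimited, so a sketch parser can simply zip the values with a list of field names. The field list below is an assumption inferred from the shape of the sample line; in practice the server's own field configuration decides which columns appear and in what order.

```python
# Assumed column order for the sample line; not taken from a real server config.
ASSUMED_FIELDS = (
    "date time c-ip cs-username s-ip s-port cs-method "
    "cs-uri-stem cs-uri-query sc-status cs(User-Agent)"
).split()

def parse_iis_line(line, fields=ASSUMED_FIELDS):
    """Map a space-delimited IIS log line onto field names, or return None."""
    values = line.split()
    if len(values) != len(fields):
        return None
    return dict(zip(fields, values))

record = parse_iis_line(
    "2012-05-02 17:42:15 172.24.255.255 - 172.20.255.255 80 GET "
    "/images/favicon.ico - 200 Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)"
)
print(record["cs-uri-stem"], record["sc-status"])
```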

Variety of time formats

Timestamp formats also differ among types of applications, types of events generated across multiple devices, and so on. Different time formats across the components of your system make it difficult to correlate events that occur on multiple systems at the same time:

  • 142920788
  • Oct 12 23:21:45
  • [5/May/2015:08:09:10 +0000]
  • Tue 01-01-2009 6:00
  • 2015-05-30 T 05:45 UTC
  • Sat Jul 23 02:16:57 2014
  • 07:38, 11 December 2012 (UTC)
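One way to make such timestamps comparable is to normalize them to a single representation. The sketch below tries a few `strptime` patterns covering some of the formats listed above and emits ISO 8601 in UTC. Treating timestamps without an offset as UTC is an assumption made for the example, and formats that lack a year (such as `Oct 12 23:21:45`) are deliberately left unparsed because they cannot be resolved without extra context.

```python
from datetime import datetime, timezone

KNOWN_FORMATS = [
    "[%d/%b/%Y:%H:%M:%S %z]",  # e.g. [5/May/2015:08:09:10 +0000]
    "%a %b %d %H:%M:%S %Y",    # e.g. Sat Jul 23 02:16:57 2014
    "%H:%M, %d %B %Y (UTC)",   # e.g. 07:38, 11 December 2012 (UTC)
    "%Y-%m-%d T %H:%M UTC",    # e.g. 2015-05-30 T 05:45 UTC
]

def to_utc_iso(text):
    """Return the timestamp as an ISO 8601 UTC string, or None if unrecognized."""
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            # Assumption: timestamps without an explicit offset are in UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).isoformat()
    return None

for stamp in ["[5/May/2015:08:09:10 +0000]", "2015-05-30 T 05:45 UTC", "Oct 12 23:21:45"]:
    print(stamp, "->", to_utc_iso(stamp))
```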

Decentralized logs

Logs are mostly spread across applications that may run on different servers and in different components. The complexity of log analysis increases as multiple components log at multiple locations. On a setup of one or two servers, finding information in logs involves running the cat or tail commands, or piping their output to grep. But what if you have 10, 20, or, say, 100 servers? Searches of this kind do not scale to a large cluster of machines, which calls for a centralized log management and analysis solution.
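The core of centralization is bringing entries from many sources into one ordered timeline. A minimal sketch, assuming each server's log is already sorted by time and uses a hypothetical `timestamp | message` line layout invented for this example:

```python
import heapq
from datetime import datetime

def parse(line):
    """Split a 'timestamp | message' line into (datetime, message)."""
    stamp, _, message = line.partition(" | ")
    return datetime.fromisoformat(stamp), message

# Hypothetical per-server logs, each already in chronological order.
server_a = [
    "2015-05-24T15:54:58 | a: request received",
    "2015-05-24T15:55:02 | a: response sent",
]
server_b = [
    "2015-05-24T15:54:59 | b: connection dropped",
]

def merged_timeline(*streams):
    """Merge several time-sorted log streams into one time-sorted list."""
    parsed = [map(parse, stream) for stream in streams]
    return list(heapq.merge(*parsed))

for when, message in merged_timeline(server_a, server_b):
    print(when.isoformat(), message)
```

`heapq.merge` only works here because each input is already sorted and all timestamps share one format, which is exactly what decentralized logs usually lack.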

Expert knowledge requirement

People who want business-centric information from logs generally don't have access to the logs, or may lack the technical expertise to find the appropriate information quickly, which can make analysis slower and sometimes impossible.

The ELK Stack

The ELK platform is a complete log analytics solution built on a combination of three open source tools: Elasticsearch, Logstash, and Kibana. It tries to address all the problems and challenges that we saw in the previous section. ELK uses Elasticsearch for deep search and data analytics; Logstash for centralized logging management, which includes shipping and forwarding logs from multiple servers, log enrichment, and parsing; and Kibana for powerful and beautiful data visualizations. The ELK stack is currently maintained and actively supported by Elastic (formerly Elasticsearch).

Let's look at a brief overview of each of these systems:

  • Elasticsearch
  • Logstash
  • Kibana

Elasticsearch

Elasticsearch is a distributed open source search engine based on Apache Lucene, and released under an Apache 2.0 license (which means that it can be downloaded, used, and modified free of charge). It provides horizontal scalability, reliability, and multitenant capability for real-time search. Elasticsearch features are available through JSON over a RESTful API. The searching capabilities are backed by a schema-less Apache Lucene engine, which allows it to dynamically index data without knowing the structure beforehand. Elasticsearch achieves fast search responses because it searches an index built from the text rather than scanning the text itself.

Elasticsearch is used by many big companies, such as GitHub, SoundCloud, Foursquare, Netflix, and many others. Some of the use cases are as follows:
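To see why an index makes search fast, consider a toy inverted index, the kind of structure Lucene-based engines build on. The sketch below is a simplified illustration only: it is not Elasticsearch's implementation or API, it tokenizes on whitespace, and the documents are invented.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical documents.
docs = {
    1: "connection dropped by peer",
    2: "connection established",
    3: "disk quota exceeded",
}
index = build_inverted_index(docs)
print(search(index, "connection dropped"))
```

Each query term is a dictionary lookup, so the cost depends on the number of matching documents rather than on the total amount of text stored.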

  • Wikipedia: This uses Elasticsearch to provide a full text search, and provide functionalities, such as search-as-you-type, and did-you-mean suggestions.
  • The Guardian: This uses Elasticsearch to process 40 million documents per day, provide real-time analytics of site-traffic across the organization, and help understand audience engagement better.
  • StumbleUpon: This uses Elasticsearch to power intelligent searches across its platform and provide great recommendations to millions of customers.
  • SoundCloud: This uses Elasticsearch to provide real-time search capabilities for millions of users across geographies.
  • GitHub: This uses Elasticsearch to index over 8 million code repositories, and index multiple events across the platform, hence providing real-time search capabilities across it.

Some of the key features of Elasticsearch are:

  • It is an open source distributed, scalable, and highly available real-time document store
  • It provides real-time search and analysis capabilities
  • It provides a sophisticated RESTful API to play around with lookup, and various features, such as multilingual search, geolocation, autocomplete, contextual did-you-mean suggestions, and result snippets
  • It can be scaled horizontally easily and provides easy integrations with cloud-based infrastructures, such as AWS and others

Logstash

Logstash is a data pipeline that helps collect, parse, and analyze a large variety of structured and unstructured data and events generated across various systems. It provides plugins to connect to various types of input sources and platforms, and is designed to efficiently process logs, events, and unstructured data sources for distribution into a variety of outputs through its output plugins, such as file, stdout (output on the console running Logstash), or Elasticsearch.

It has the following key features:

  • Centralized data processing: Logstash helps build a data pipeline that can centralize data processing. With the use of a variety of plugins for input and output, it can convert a lot of different input sources to a single common format.
  • Support for custom log formats: Logs written by different applications often have particular formats specific to the application. Logstash helps parse and process custom formats on a large scale. It provides support to write your own filters for tokenization and also provides ready-to-use filters.
  • Plugin development: Custom plugins can be developed and published, and there is a large variety of custom developed plugins already available.
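The input, filter, and output stages described above can be mimicked in plain Python to show how events flow through such a pipeline. This is a conceptual sketch only: the function names and the event layout are invented for the example and are not Logstash's plugin API or configuration syntax.

```python
import json

def read_input(lines):
    """Input stage: wrap each raw line in an event dict."""
    for line in lines:
        yield {"message": line.rstrip("\n")}

def split_level(event):
    """Filter stage: split 'LEVEL: text' into separate fields when present."""
    level, sep, text = event["message"].partition(": ")
    if sep:
        event["level"], event["text"] = level, text
    return event

def write_output(events):
    """Output stage: serialize every event to one common format (JSON)."""
    return [json.dumps(event, sort_keys=True) for event in events]

def run_pipeline(lines, filters):
    """Run lines through the input stage, each filter in order, then the output stage."""
    events = read_input(lines)
    for apply_filter in filters:
        events = map(apply_filter, events)
    return write_output(events)

for line in run_pipeline(["INFO: started\n", "unstructured text\n"], [split_level]):
    print(line)
```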

Kibana

Kibana is an open source, Apache 2.0 licensed data visualization platform that helps in visualizing any kind of structured and unstructured data stored in Elasticsearch indexes. Kibana is entirely written in HTML and JavaScript. It uses the powerful search and indexing capabilities of Elasticsearch, exposed through its RESTful API, to display powerful graphics for end users. From basic business intelligence to real-time debugging, Kibana presents data through histograms, geomaps, pie charts, graphs, tables, and so on.

Kibana makes it easy to understand large volumes of data. Its simple browser-based interface enables you to quickly create and share dynamic dashboards that display changes to Elasticsearch queries in real time.

Some of the key features of Kibana are as follows:

  • It provides flexible analytics and a visualization platform for business intelligence.
  • It provides real-time analysis, summarization, charting, and debugging capabilities.
  • It provides an intuitive and user-friendly interface, which can be customized with drag-and-drop features and alignment as needed.
  • It allows saving the dashboard, and managing more than one dashboard. Dashboards can be easily shared and embedded within different systems.
  • It allows sharing snapshots of logs that you have already searched through, and isolates multiple problem transactions.
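A histogram over log data, one of the chart types mentioned above, boils down to counting events per time bucket. The following sketch shows that computation in plain Python with invented timestamps and a fixed one-minute bucket; it illustrates the idea only and does not use Kibana or Elasticsearch.

```python
from collections import Counter
from datetime import datetime

def histogram_by_minute(timestamps):
    """Count events per one-minute bucket, returned in chronological order."""
    buckets = Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)
    return sorted(buckets.items())

# Hypothetical event times.
events = [
    datetime(2015, 5, 24, 15, 54, 58),
    datetime(2015, 5, 24, 15, 54, 59),
    datetime(2015, 5, 24, 15, 55, 3),
]

for bucket, count in histogram_by_minute(events):
    print(bucket.strftime("%H:%M"), "#" * count)
```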

ELK data pipeline

A typical ELK stack data pipeline looks something like this:

[Figure: ELK data pipeline]

In a typical ELK stack data pipeline, logs from multiple application servers are shipped through a Logstash shipper to a centralized Logstash indexer. The Logstash indexer outputs data to an Elasticsearch cluster, which Kibana queries to display visualizations and build dashboards over the log data.


Description

The ELK stack (Elasticsearch, Logstash, and Kibana) is a powerful combination of open source tools. Elasticsearch is for deep search and data analytics. Logstash is for centralized logging, log enrichment, and parsing. Kibana is for powerful and beautiful data visualizations. In short, the ELK stack makes searching and analyzing data easier. This book introduces you to the ELK stack, starting with how to set up the stack by installing the tools and applying a basic configuration. You'll move on to building a basic data pipeline using the ELK stack. Next, you'll explore the key features of Logstash and its role in the ELK stack, including creating Logstash plugins, which will enable you to use your own customized plugins. The importance of Elasticsearch and Kibana in the ELK stack is also covered, along with various types of advanced data analysis and a variety of charts, tables, and maps. By the end of the book, you will be able to develop a full-fledged data pipeline using the ELK stack and have a solid understanding of the role of each component.

Who is this book for?

If you are a developer or DevOps engineer interested in building a system that provides insights and business metrics from data sources of various formats and types, using the open source technology stack that ELK provides, then this book is for you. Basic knowledge of Unix or any programming language will help you make the most of this book.

What you will learn

  • Install, configure, and run Elasticsearch, Logstash, and Kibana
  • Understand the need for log analytics and the current challenges in log analysis
  • Build your own data pipeline using the ELK stack
  • Familiarize yourself with the key features of Logstash and the variety of input, filter, and output plugins it provides
  • Build your own custom Logstash plugin
  • Create actionable insights using charts, histograms, and quick search features in Kibana 4
  • Understand the role of Elasticsearch in the ELK stack

Product Details

Publication date: Nov 26, 2015
Length: 206 pages
Edition: 1st
Language: English
ISBN-13: 9781785887154
Vendor: Elastic


Table of Contents

1. Introduction to ELK Stack
2. Building Your First Data Pipeline with ELK
3. Collect, Parse and Transform Data with Logstash
4. Creating Custom Logstash Plugins
5. Why Do We Need Elasticsearch in ELK?
6. Finding Insights with Kibana
7. Kibana – Visualization and Dashboard
8. Putting It All Together
9. ELK Stack in Production
10. Expanding Horizons with ELK
Index

Customer reviews

Rating distribution
Average rating: 3.2 out of 5 (5 ratings)
5 star 20%
4 star 20%
3 star 40%
2 star 0%
1 star 20%
Akshay, Aug 24, 2017
Rating: 5 out of 5
Great to begin ELK
Amazon Verified review Amazon
Amazon Customer, Jan 25, 2016
Rating: 4 out of 5
Just over a month ago, I purchased a copy of Learning ELK stack and is a good book for those wanting a quick ramp up of what ELK is, This is a good step-by-step guide for Elastic Search, Logstash and Kibana. this book its well written and gives an honest account, i would highly recommend this as a first book read to anyone wanting to know about ELK.
Amazon Verified review Amazon
Kindle Customer, Jun 07, 2017
Rating: 3 out of 5
Great quickstart , got me up and going very quickly. It does however lack depth so if you need a reference this would have to be in addition to the elk documentation.
Amazon Verified review Amazon
Mark Grover, Jan 09, 2016
Rating: 3 out of 5
A very good book, let down by inaccurate examples of unix command syntax which quite simply doesnt work. I am new to ELK and followed the example scenario presented in the book but had to google and research some of the command syntax as the printed examples returned errors. Its such a shame as the author presents a very thorough step by step guide to configuring the ELK components. I did manage to get everything working and would still recommend the book to newcomers despite the Unix command issues.
Amazon Verified review Amazon
Jascha Casadio, Jul 02, 2016
Rating: 1 out of 5
The Elasticsearch, Logstash and Kibana trinity, usually referred to as the ELK stack, is by far, the de facto standard in log centralization and analysis. Despite being such a popular solution, with some half a million downloads per month, the titles available to the stack or specific to its components are still very limited. This is partially compensated by the official documentation, which is both friendly and easy to follow and allows anyone to quickly get started. Learning ELK Stack is the only title available, until now, that covers the three products at once. It targets beginners who are interested in an overall view of the stack and its components.
Released at the end of 2015, Learning ELK Stack is a short book spanning around 200 pages. As any typical beginner's title, it does start with an introductory chapter that gets the reader through the installation process. What follows is a series of chapters where the author first shows the power of the stack and then dives deeper into its components.
The very first chapter already reveals problems that are present throughout the whole book. It is very superficial and does not really get a beginner started. First of all, the stack does require Oracle's JDK installed and configured. This is completely overlooked. Now while someone might argue that it is pretty straightforward through a package manager, the reader can still be using a distro whose package manager installs a version of the JDK that is not suitable for the ELK stack. Likewise, the whole stack can be installed through simple apt commands. The author does cover installation through .tar.gz archives, completley overlooking installing and configuring Java from the source and the default Java (yea, you can have multiple at the same time). Which is not that straightforward.
Installation apart, nothing is said about the configuration of the three softwares neither as a standalone nor as a stack. Well, this is not properly correct. The astonishing amount of twelve lines is dedicated to this in fact. After an overall overview of the stack, with an example built using data from Yahoo (you ain't limited to use the stack to process logs), the authors focuses on the components, each at a time. These chapters feel like a reference. Each option is listed, but the examples do not go beyond the two lines. Interestingly, by googling sentences from the book, we find a 70% match analysis with the official documentation (Harvard refers to this as mosaic plagiarism), suggesting a copy/paste with a couple of words added/removed.
As an example, this is what we find in Learning ELK stack:
The mutate filter is an important filter plugin that helps rename, remove, replace, and modify fields in an incoming event.
And this is the information we freely find in the official documentation provided by Elastic:
The mutate filter allows you to perform general mutations on fields. You can rename, remove, replace, and modify fields in your events.
Tying it all up, I do not really recommend this book, despite being the only title covering the stack as a whole. The documentation the reader will find is first of all inaccurate. Anything else can be found for free in the official documentation.
I have tried to contact Packt Publishing asking what is their position on that matter. I did not get any real answer apart from a semi-automatic one.
As usual, you can find more reviews on my personal blog: books.lostinmalloc.com. Feel free to pass by and share your thoughts!
Amazon Verified review Amazon