Machine Learning with the Elastic Stack: Gain valuable insights from your data with Elastic Stack's machine learning features, Second Edition

By Rich Collier, Camilla Montonen, Bahaaldine Azarmi

eBook | May 2021 | 450 pages | 2nd Edition | Rated 5.0 (9 ratings)

Chapter 1: Machine Learning for IT

A decade ago, the idea of using machine learning (ML)-based technology in IT operations or IT security seemed a little like science fiction. Today, however, it is one of the most common buzzwords used by software vendors. Clearly, there has been a major shift in both the perception of the need for the technology and the capabilities that state-of-the-art implementations can bring to bear. Tracing this evolution is important to fully appreciating how Elastic ML came to be and what problems it was designed to solve.

This chapter is dedicated to reviewing the history and concepts behind how Elastic ML works. It also discusses the different kinds of analysis that can be done and the kinds of use cases that can be solved. Specifically, we will cover the following topics:

  • Overcoming the historical challenges in IT
  • Dealing with the plethora of data
  • The advent of automated anomaly detection
  • Unsupervised versus supervised ML
  • Using unsupervised ML for anomaly detection
  • Applying supervised ML to data frame analytics

Overcoming the historical challenges in IT

IT application support specialists and application architects have a demanding job with high expectations. Not only are they tasked with moving new and innovative projects into place for the business, but they also have to keep currently deployed applications up and running as smoothly as possible. Today's applications are significantly more complicated than ever before—they are highly componentized, distributed, and possibly virtualized/containerized. They could be developed using Agile, or by an outsourced team. Plus, they are most likely constantly changing. Some DevOps teams claim they can typically make more than 100 changes per day to a live production system. Trying to understand a modern application's health and behavior is like a mechanic trying to inspect an automobile while it is moving.

IT security operations analysts have similar struggles in keeping up with day-to-day operations, but they obviously have a different focus of keeping the enterprise secure and mitigating emerging threats. Hackers, malware, and rogue insiders have become so ubiquitous and sophisticated that the prevailing wisdom is that it is no longer a question of whether an organization will be compromised—it's more of a question of when they will find out about it. Clearly, knowing about a compromise as early as possible (before too much damage is done) is preferable to learning about it for the first time from law enforcement or the evening news.

So, how can they be helped? Is the crux of the problem that application experts and security analysts lack access to data to help them do their job effectively? Actually, in most cases, it is the exact opposite. Many IT organizations are drowning in data.

Dealing with the plethora of data

IT departments have invested in monitoring tools for decades, and it is not uncommon to have a dozen or more tools actively collecting and archiving data that can be measured in terabytes, or even petabytes, per day. The data can range from rudimentary infrastructure- and network-level data to deep diagnostic data and/or system and application log files.

Business-level key performance indicators (KPIs) could also be tracked, sometimes including data about the end user's experience. The sheer depth and breadth of data available is, in some ways, more comprehensive than it has ever been. To detect emerging problems or threats hidden in that data, there have traditionally been several main approaches to distilling the data into informational insights:

  • Filter/search: Some tools allow the user to define searches to help trim down the data into a more manageable set. While extremely useful, this capability is most often used in an ad hoc fashion once a problem is suspected. Even then, the success of this approach usually hinges on the user knowing what to look for and on their level of experience, both prior knowledge from living through similar past situations and expertise in the search technology itself.
  • Visualizations: Dashboards, charts, and widgets are also extremely useful to help us understand what data has been doing and where it is trending. However, visualizations are passive and require being watched for meaningful deviations to be detected. Once the number of metrics being collected and plotted surpasses the number of eyeballs available to watch them (or even the screen real estate to display them), visual-only analysis becomes less and less useful.
  • Thresholds/rules: To be proactive without requiring data to be physically watched, many tools allow the user to define rules or conditions that get triggered upon known conditions or known dependencies between items. However, it is unlikely that you can realistically define all appropriate operating ranges or model all of the actual dependencies in today's complex and distributed applications. Plus, the amount and velocity of change in the application or environment could quickly render any static rule set useless. Analysts find themselves chasing down many false positive alerts, setting up a "boy who cried wolf" paradigm that leads to resentment of the tools generating the alerts and skepticism of the value that alerting could provide (a minimal sketch of such a static rule follows this list).
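
To make that brittleness concrete, here is a minimal sketch of the kind of static rule such tools encode; the metric name and the 500 ms threshold are hypothetical:

```python
# A static threshold rule: fires whenever the metric exceeds a fixed value.
# Nothing updates the threshold when the application, traffic patterns, or
# environment change; it only knows the number it was given at setup time.
STATIC_THRESHOLD_MS = 500

def should_alert(response_time_ms: float) -> bool:
    """Return True if the static rule would fire an alert."""
    return response_time_ms > STATIC_THRESHOLD_MS

# Busy Monday mornings may routinely exceed 500 ms (false positives), while a
# slow degradation from 50 ms to 450 ms on a quiet weekend never fires at all
# (a false negative): the rule has no notion of "normal for this time".
```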

Ultimately, a different approach was needed: not necessarily a complete repudiation of past techniques, but one that could bring a level of automation and empirical augmentation to the evaluation of data in a meaningful way. Let's face it, humans are imperfect: we have hidden biases, a limited capacity for remembering information, and we are easily distracted and fatigued. Algorithms, if used correctly, can easily make up for these shortcomings.

The advent of automated anomaly detection

ML, while a very broad topic that encompasses everything from self-driving cars to game-winning computer programs, was a natural place to look for a solution. If you realize that most of the requirements of effective application monitoring or security threat hunting are merely variations on the theme of find me something that is different from normal, then the discipline of anomaly detection emerges as the natural place to begin using ML techniques to solve these problems for IT professionals.

The science of anomaly detection is certainly nothing new; many very smart people have researched and employed a variety of algorithms and techniques over the years. However, the practical application of anomaly detection to IT data poses some interesting constraints that make otherwise academically worthy algorithms inappropriate for the job. These include the following:

  • Timeliness: Notification of an outage, breach, or other significant anomalous situation should be known as quickly as possible to mitigate it. The cost of downtime or the risk of a continued security compromise is minimized if remedied or contained quickly. Algorithms that cannot keep up with the real-time nature of today's IT data have limited value.
  • Scalability: As mentioned earlier, the volume, velocity, and variation of IT data continue to explode in modern IT environments. Algorithms that inspect this vast data must be able to scale linearly with the data to be usable in a practical sense.
  • Efficiency: IT budgets are often highly scrutinized for wasteful spending, and many organizations are constantly being asked to do more with less. Tacking on an additional fleet of supercomputers to run algorithms is not practical. Rather, modest commodity hardware with typical specifications must be employable as part of the solution.
  • Generalizability: While highly specialized data science is often the best way to solve a specific information problem, the diversity of data in IT environments drives a need for something that can be broadly applicable across most use cases. Reusability of the same techniques is much more cost-effective in the long run.
  • Adaptability: Ever-changing IT environments will quickly render a brittle algorithm useless. Constant manual training and retraining of ML models would only introduce yet another time-consuming chore that cannot be afforded.
  • Accuracy: We already know that alert fatigue from legacy threshold and rule-based systems is a real problem. Swapping one false alarm generator for another will not impress anyone.
  • Ease of use: Even if all of the previously mentioned constraints could be satisfied, any solution that requires an army of data scientists to implement it would be too costly and would be disqualified immediately.

So, now we are getting to the real meat of the challenge—creating a fast, scalable, accurate, low-cost anomaly detection solution that everyone will use and love because it works flawlessly. No problem!

As daunting as that sounds, Prelert founder and CTO Steve Dodson took on that challenge back in 2010. While Dodson certainly brought his academic chops to the table, the technology that would eventually become Elastic ML had its genesis in the throes of trying to solve real IT application problems, the first being a pesky intermittent outage in a trading platform at a major London finance company. Dodson, and a handful of engineers who joined the venture, helped the bank's team use the anomaly detection technology to automatically surface only the needles in the haystack, allowing analysts to focus on the small set of relevant metrics and log messages that were going awry. The identification of the root cause (a failing service whose recovery caused a cascade of subsequent network problems that wreaked havoc) ultimately brought stability to the application and spared the bank the expense of its previously proposed remedy: an unplanned, costly network upgrade.

As time passed, however, it became clear that even that initial success was only the beginning. A few years and a few thousand real-world use cases later, the marriage of Prelert and Elastic was a natural one—a combination of a platform making big data easily accessible and technology that helped overcome the limitations of human analysis.

Fast forward to 2021, a full 5 years after the joining of forces, and Elastic ML has come a long way in the maturation and expansion of the platform's capabilities. This second edition of the book encapsulates the updates made to Elastic ML over the years, including integrations into several of the Elastic solutions around observability and security, as well as the introduction of data frame analytics, which is discussed extensively in the third part of the book. To get a grounded understanding of how Elastic ML works, we first need to come to grips with some core terminology and concepts.

Unsupervised versus supervised ML

While there are many subtypes of ML, two very prominent ones (and the two that are relevant to Elastic ML) are unsupervised and supervised.

In unsupervised ML, there is no outside guidance or direction from humans. In other words, the algorithms must learn (and model) the patterns of the data purely on their own. In general, the biggest challenge is to have the algorithms accurately surface deviations from the input data's normal patterns in a way that provides meaningful insight for the user; an algorithm that cannot do this is of no practical use. The algorithms must therefore be quite robust and able to account for all of the intricacies of how the input data is likely to behave.

In supervised ML, input data (often multivariate data) is used to help model the desired outcome. The key difference from unsupervised ML is that the human decides, a priori, which variables to use as the input and also provides "ground-truth" examples of the expected target variable. Algorithms then assess how the input variables interact and influence the known output target. To accurately produce the desired output (a prediction, for example), the algorithm must be given "the right kind of data": input that genuinely expresses the situation and is diverse enough for the relationship between the inputs and the output target to be learned effectively.

As such, both cases require good input data, good algorithmic approaches, and a good mechanism to allow the ML to both learn the behavior of the data and apply that learning to assess subsequent observations of that data. Let's dig a little deeper into the specifics of how Elastic ML leverages unsupervised and supervised learning.

Using unsupervised ML for anomaly detection

To get a more intuitive understanding of how Elastic ML's anomaly detection works using unsupervised ML, we will discuss the following:

  • A rigorous definition of unusual with respect to the technology
  • An intuitive example of learning in an unsupervised manner
  • A description of how the technology models, de-trends, and scores the data

Defining unusual

Anomaly detection is something almost all of us have a basic intuition about. Humans are quite good at pattern recognition, so it should come as no surprise that if I asked 100 people on the street what's unusual in the following graph, a vast majority (including non-technical people) would identify the spike in the green line:

Figure 1.1 – A line graph showing an anomaly

Similarly, let's say we ask what's unusual in the following photo:

Figure 1.2 – A photograph showing a seal among penguins

We will, again, likely get a majority that rightly claims that the seal is the unusual thing. But people may struggle to articulate in salient terms the actual heuristics that are used in coming to those conclusions.

There are two different heuristics that we could use to define the different kinds of anomalies shown in these images:

  • Something is unusual if its behavior has significantly deviated from an established pattern or range based upon its past history.
  • Something is unusual if some characteristic of that entity is significantly different from the same characteristic of the other members of a set or population.

These key definitions will be relevant to Elastic ML's anomaly detection, as they form the two main fundamental modes of operation of the anomaly detection algorithms (temporal versus population analysis, as will be explored in Chapter 3, Anomaly Detection). As we will see, the user will have control over what mode of operation is employed for a particular use case.

Learning what's normal

As we've stated, Elastic ML's anomaly detection uses unsupervised learning in that the learning occurs without anything being taught. There is no human assistance to shape the decisions of the learning; it simply does so on its own, via inspection of the data it is presented with. This is slightly analogous to the learning of a language via the process of immersion, as opposed to sitting down with books of vocabulary and rules of grammar.

To go from a completely naive state where nothing is known about a situation to one where predictions could be made with good certainty, a model of the situation needs to be constructed. How this model is created is extremely important, as the efficacy of all subsequent actions taken based upon this model will be highly dependent on the model's accuracy. The model will need to be flexible and continuously updated based upon new information, because that is all that it has to go on in this unsupervised paradigm.

Probability models

Probability distributions can serve this purpose quite well. There are many fundamental types of distributions (and Elastic ML uses a variety of distribution types, such as Poisson, Gaussian, log-normal, or even mixtures of models), but the Poisson distribution is a good one to discuss first, because it is appropriate in situations where there are discrete occurrences (the "counts") of things with respect to time:

Figure 1.3 – A graph demonstrating Poisson distributions (source: https://en.wikipedia.org/wiki/Poisson_distribution#/media/File:Poisson_pmf.svg)

There are three different variants of the distribution shown here, each with a different mean (λ), which is also roughly where the most probable values of k lie. We can make an analogy that says that these distributions model the expected amount of postal mail that a person gets delivered to their home on a daily basis, represented by k on the x axis:

  • For λ = 1, there is about a 37% chance that zero pieces or one piece of mail are delivered daily. Perhaps this is appropriate for a college student that doesn't receive much postal mail.
  • For λ = 4, there is about a 20% chance that three or four pieces are received. This might be a good model for a young professional.
  • For λ = 10, there is about a 13% chance that 10 pieces are received per day—perhaps representing a larger family or a household that has somehow found themselves on many mailing lists!

The discrete points on each curve also give the likelihood (probability) of other values of k. As such, the model can be informative and answer questions such as "Is getting 15 pieces of mail likely?" As we can see, it is not likely for a student (λ = 1) or a young professional (λ = 4), but it is somewhat likely for a large family (λ = 10). Obviously, we simply declared here that these models were appropriate for the people described. It should be clear, however, that there needs to be a mechanism to learn the appropriate model for each individual situation, not merely assert it. Fortunately, the process for learning it is intuitive.
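
As a quick illustration of using such a model to answer that question, here is a minimal sketch that evaluates the Poisson probability mass function directly with the standard library (this is plain Python, not Elastic ML):

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events under a Poisson(lam) model."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# "Is getting 15 pieces of mail likely?" under each of the three models:
for lam, persona in [(1, "student"), (4, "young professional"), (10, "large family")]:
    p = poisson_pmf(15, lam)
    print(f"lambda={lam:>2} ({persona}): P(k=15) = {p:.2e}")

# Approximate output:
#   lambda= 1 (student): P(k=15) = 2.81e-13
#   lambda= 4 (young professional): P(k=15) = 1.50e-05
#   lambda=10 (large family): P(k=15) = 3.47e-02
```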

Learning the models

Sticking with the postal mail analogy, it would be instinctive to realize that a method of determining what model is the best fit for a particular household could be ascertained simply by hanging out by the mailbox every day and recording what the postal carrier drops into the mailbox. It should also seem obvious that the more observations made, the higher your confidence should be that your model is accurate. In other words, only spending 3 days by the mailbox would provide less complete information and confidence than spending 30 days, or 300 for that matter.

Algorithmically, a similar process could be designed to self-select the appropriate model based upon observations. This self-selection must carefully scrutinize both the choice of model type itself (that is, Poisson, Gaussian, log-normal, and so on) and the specific coefficients of that model type (such as λ in the preceding example). To do this, the appropriateness of the model is evaluated constantly. Bayesian techniques are also employed to assess the model's likely parameter values given the dataset as a whole, while tempering those decisions based upon how much information has been seen prior to a particular point in time. The ML algorithms accomplish all of this automatically.
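
As a toy illustration of this idea (and not Elastic ML's actual estimator, which performs the Bayesian model selection described above), the rate parameter λ could be maintained as an exponentially weighted average of the observed daily counts, so that fresher observations carry more weight:

```python
def update_rate(current_lam: float, todays_count: int, alpha: float = 0.05) -> float:
    """Exponentially weighted update of a Poisson rate estimate.

    alpha controls how quickly older observations are forgotten; 0.05 is an
    arbitrary illustrative choice, not a value Elastic ML uses.
    """
    return (1 - alpha) * current_lam + alpha * todays_count

# Watching the mailbox: the estimate firms up as observations accumulate.
lam = 4.0  # assumed initial guess
for day, count in enumerate([3, 5, 4, 6, 4, 3, 5], start=1):
    lam = update_rate(lam, count)
    print(f"day {day}: observed {count}, lambda estimate = {lam:.2f}")
```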

Note

For those that want a deeper dive into some of the representative mathematics going on behind the scenes, please refer to the academic paper at http://www.ijmlc.org/papers/398-LC018.pdf.

Most importantly, the modeling that is done is continuous, so that new information is considered along with the old, with an exponential weighting given to information that is fresher. Such a model, after 60 observations, could resemble the following:

Figure 1.4 – Sample model after 60 observations

It will then seem very different after 400 observations, as the data presents itself with a slew of new observations with values between 5 and 10:

Figure 1.5 – Sample model after 400 observations

Also, notice that the model can have multiple modes, or areas/clusters of higher probability. The complexity and fidelity of the learned model's fit (shown as the blue curve) to the theoretically ideal model (in black) matter greatly. The more accurate the model, the better the representation of the state of normal for that dataset and thus, ultimately, the more accurate the prediction of how future values comport with the model.

The continuous nature of the modeling also drives the requirement that this model is capable of serialization to long-term storage, so that if model creation/analysis is paused, it can be reinstated and resumed at a later time. As we will see, the operationalization of this process of model creation, storage, and utilization is a complex orchestration, which is fortunately handled automatically by Elastic ML.

De-trending

Another important aspect of faithfully modeling real-world data is accounting for the prominent overarching trends and patterns that naturally occur. Does the data ebb and flow hourly and/or daily, with more activity during business hours or business days? If so, this needs to be accounted for. Elastic ML automatically hunts for prominent trends in the data (linear growth, cyclical harmonics, and so on) and factors them out. Let's observe the following graph:

Figure 1.6 – Periodicity detection in action

Here, the periodic daily cycle is learned, then factored out. The model's prediction boundaries (represented in the light-blue envelope around the dark-blue signal) dramatically adjust after automatically detecting three successive iterations of that cycle.

Therefore, as more data is observed over time, the models gain accuracy both from the perspective of the probability distribution function getting more mature, as well as via the auto-recognizing and de-trending of other routine patterns (such as business days, weekends, and so on) that might not emerge for days or weeks. In the following example, several trends are discovered over time, including daily, weekly, and an overall linear slope:

Figure 1.7 – Multiple trends being detected

These model changes are recorded as system annotations. Annotations, as a general concept, will be covered in later chapters.
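
The following is a simplified sketch of what de-trending accomplishes: fit and remove a linear slope, then subtract the average daily profile, leaving a residual for the probability model. Elastic ML's actual periodicity tests are considerably more sophisticated; numpy is assumed to be available here:

```python
import numpy as np

def detrend(values: np.ndarray, samples_per_day: int) -> np.ndarray:
    """Remove a linear trend and a repeating daily cycle from a series."""
    t = np.arange(len(values))

    # 1. Fit and subtract a linear trend (least squares, degree-1 polynomial).
    slope, intercept = np.polyfit(t, values, deg=1)
    residual = values - (slope * t + intercept)

    # 2. Compute the mean daily profile (average of each time-of-day slot
    #    across all days) and subtract it, removing the daily cycle.
    profile = np.array([residual[i::samples_per_day].mean()
                        for i in range(samples_per_day)])
    days = len(values) // samples_per_day + 1
    residual -= np.tile(profile, days)[:len(values)]

    return residual  # what remains is what the probability model must explain

# For hourly data, samples_per_day would be 24.
```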

Scoring of unusualness

Once a model has been constructed, the likelihood of any future observed value can be found within the probability distribution. Earlier, we had asked the question "Is getting 15 pieces of mail likely?" This question can now be empirically answered, depending on the model, with a number between 0 (no possibility) and 1 (absolute certainty). Elastic ML will use the model to calculate this fractional value out to approximately 300 significant figures (which can be helpful when dealing with very low probabilities). Let's observe the following graph:

Figure 1.8 – Anomaly scoring

Here, the probability of the observation of the actual value of 921 is now calculated to be 1.444e-9 (or, more commonly, a mere 0.0000001444% chance). This very small value is perhaps not that intuitive to most people. As such, ML will take this probability calculation, and via the process of quantile normalization, re-cast that observation on a severity scale between 0 and 100, where 100 is the highest level of unusualness possible for that particular dataset. In the preceding case, the probability calculation of 1.444e-9 is normalized to a score of 94. This normalized score will come in handy later as a means by which to assess the severity of the anomaly for the purposes of alerting and/or triage.
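
To illustrate the normalization step schematically (this is not Elastic ML's actual normalizer, just the shape of the idea): rank a new record's probability against the probabilities of everything seen historically, and map that rank onto a 0 to 100 scale:

```python
import bisect

def severity_score(p: float, historical_probs: list[float]) -> float:
    """Map a record probability onto a 0-100 severity scale.

    historical_probs must be sorted ascending. The score is the percentage
    of past observations that were MORE probable than this one, so the
    rarest observations land near 100.
    """
    at_most_as_likely = bisect.bisect_right(historical_probs, p)
    return 100.0 * (1.0 - at_most_as_likely / len(historical_probs))

# A probability such as 1.444e-9, smaller than nearly everything previously
# observed, would score close to 100, consistent with the 94 in Figure 1.8.
```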

The element of time

In Elastic ML, all of the anomaly detection that we will discuss throughout the rest of the book will have an intrinsic element of time associated with the data and analysis. In other words, for anomaly detection, Elastic ML expects the data to be time series data and that data will be analyzed in increments of time. This is a key point and also helps discriminate between anomaly detection and data frame analytics in addition to the unsupervised/supervised paradigm.

You will see that there's a slight nuance with respect to population analysis (covered in Chapter 3, Anomaly Detection) and outlier detection (covered in Chapter 10, Outlier Detection). While they both effectively find entities that are distinctly different from their peers, population analysis in anomaly detection does so with respect to time, whereas outlier detection analysis isn't constrained by time. More will become obvious as these topics are covered in depth in later chapters.

Applying supervised ML to data frame analytics

With the exception of outlier detection (covered in Chapter 10, Outlier Detection), which is actually an unsupervised approach, the rest of data frame analytics uses a supervised approach. Specifically, there are two main types of problems that Elastic ML data frame analytics allows you to address:

  • Regression: Used to predict a continuous numerical value (a price, a duration, a temperature, and so on)
  • Classification: Used to predict whether something belongs to a certain class label (a fraudulent transaction versus a non-fraudulent one, for example)

In both cases, models are built using training data to map input variables (which can be numerical or categorical) to output predictions by training decision trees. The particular implementation used by Elastic ML is a custom variant of XGBoost, an open source gradient-boosted decision tree framework that has gained renown among data scientists for helping them win Kaggle competitions.
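
For a flavor of how this is driven in practice, here is a hedged sketch that creates and starts a regression job through Elasticsearch's data frame analytics REST API using Python's requests library. The index names and the sale_price field are hypothetical, and the unauthenticated localhost URL assumes a local test cluster:

```python
import requests

ES = "http://localhost:9200"  # assumed local test cluster, security disabled

# Define a regression job: learn to predict "sale_price" from the other
# fields in the source index, holding back 20% of documents for validation.
job = {
    "source": {"index": "home-sales"},
    "dest": {"index": "home-sales-predictions"},
    "analysis": {
        "regression": {
            "dependent_variable": "sale_price",
            "training_percent": 80,
        }
    },
}

requests.put(f"{ES}/_ml/data_frame/analytics/house-prices", json=job).raise_for_status()
requests.post(f"{ES}/_ml/data_frame/analytics/house-prices/_start").raise_for_status()
```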

The process of supervised learning

The overall process of supervised ML is very different from the unsupervised approach. In the supervised approach, you distinctly separate the training stage from the predicting stage. A very simplified version of the process looks like the following:

Figure 1.9 – Supervised ML process

Here, we can see that in the training phase, features are extracted out of the raw training data to create a feature matrix (also called a data frame) to feed to the ML algorithm and create the model. The model can be validated against portions of the data to see how well it did, and subsequent refinement steps could be made to adjust which features are extracted, or to refine the parameters of the ML algorithm used to improve the accuracy of the model's predictions.

Once the user decides that the model is efficacious, that model is "moved" to the prediction workflow, where it is used on new data. One at a time, a single new feature vector is inferenced against the model to form a prediction.

To get an intuitive sense of how this works, imagine a scenario in which you want to sell your house, but don't know what price to list it for. You research prior sales in your area and notice the price differentials for homes based on different factors (number of bedrooms, number of bathrooms, square footage, proximity to schools/shopping, age of home, and so on). Those factors are the "features" that are considered altogether (not individually) for every prior sale.

This corpus of historical sales is your training data. It is helpful because you know for certain how much each property sold for (and that's the thing you'd ultimately like to predict for your house). If you study this enough, you might get an intuition about how the prices of houses are driven strongly by some features (for instance, the number of bedrooms) and that other features (perhaps the age of the home) may not affect the pricing much. This is a concept called "feature importance" that will be visited again in a later chapter.

Armed with enough training data, you might have a good idea what the value of your home should be priced at, given that it is a three-bedroom, two-bath, 1,700-square-foot, 30-year-old home. In other words, you've constructed a model in your mind based on your research of comparable homes that have sold in the last year or so. If the past sales are the "training data," your home's specifications (bedrooms, bathrooms, and so on) are the feature vectors that will define the expected price, given your "model" that you've learned.

Your simple mental model is obviously not as rigorous as one constructed with ML-based regression analysis using dozens of relevant input features, but this simple analogy hopefully cements the idea of the process that is followed in learning from prior, known situations and then applying that knowledge to a present, novel situation.
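
To make the analogy concrete, the following minimal sketch uses scikit-learn's gradient-boosted trees as a stand-in for Elastic ML's XGBoost variant. All figures are invented for illustration:

```python
from sklearn.ensemble import GradientBoostingRegressor

# Training data: prior sales, one row per sale.
# Features: [bedrooms, bathrooms, square_feet, age_years]
X_train = [
    [3, 2, 1600, 25],
    [4, 3, 2400, 10],
    [2, 1, 900, 40],
    [3, 2, 1850, 15],
    [5, 4, 3100, 5],
]
y_train = [310_000, 520_000, 180_000, 360_000, 700_000]  # known sale prices

# Training phase: fit gradient-boosted decision trees to the feature matrix.
model = GradientBoostingRegressor().fit(X_train, y_train)

# Prediction phase: a single new feature vector -- the home from the text
# (three bedrooms, two baths, 1,700 square feet, 30 years old).
print(f"Estimated price: ${model.predict([[3, 2, 1700, 30]])[0]:,.0f}")

# feature_importances_ mirrors the "feature importance" concept above.
print(dict(zip(["bedrooms", "baths", "sqft", "age"], model.feature_importances_)))
```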

Summary

To summarize, this chapter covered the genesis story of ML in IT, born out of the necessity to automate the analysis of the massive, ever-expanding volume of data collected within enterprise environments. We also got a more intuitive understanding of the different types of ML in Elastic ML, which include both unsupervised anomaly detection and supervised data frame analytics.

As we journey through the rest of the chapters, we will often be mapping the use cases of the problems we're trying to solve to the different modes of operation of Elastic ML.

Remember that if the data is a time series, meaning that it comes into existence routinely over time (metric/performance data, log files, transactions, and so on), it is quite possible that Elastic ML's anomaly detection is all you'll ever need. As you'll see, it is incredibly flexible, easy to use, and applicable to a broad variety of use cases and data. It's something of a Swiss Army knife! A large portion of this book (Chapters 3 through 8) is devoted to leveraging anomaly detection (and the ancillary capability of forecasting) to get the most out of your time series data in the Elastic Stack.

If you are more interested in finding unusual entities within a population/cohort (user/entity behavior analytics), you might face a tricky decision between population analysis in anomaly detection and outlier detection in data frame analytics. The primary factor is whether you need to do this in near real time, in which case you would likely choose population analysis. If near real time is not necessary, and/or if you require the consideration of multiple features simultaneously, you would choose outlier detection. See Chapter 10, Outlier Detection, for a more detailed comparison of the two approaches and the benefits of each.

That leaves many other use cases that require a multivariate approach to modeling. This would not only align with the previous example of real estate pricing but also encompass the use cases of language detection, customer churn analysis, malware detection, and so on. These will fall squarely in the realm of the supervised ML of data frame analytics and be covered in Chapters 11 through 13.

In the next chapter, we will get down and dirty with understanding how to enable Elastic ML and how it works in an operational sense. Buckle up and enjoy the ride!


Key benefits

  • Integrate machine learning with distributed search and analytics
  • Preprocess and analyze large volumes of search data effortlessly
  • Operationalize machine learning in a scalable, production-worthy way

Description

Elastic Stack, previously known as the ELK Stack, is a log analysis solution that helps users ingest, process, and analyze search data effectively. With the addition of machine learning, a key commercial feature, the Elastic Stack makes this process even more efficient. This updated second edition of Machine Learning with the Elastic Stack provides a comprehensive overview of Elastic Stack's machine learning features for time series data analysis as well as for classification, regression, and outlier detection. The book starts by explaining machine learning concepts in an intuitive way. You'll then perform time series analysis on different types of data, such as log files, network flows, application metrics, and financial data. As you progress through the chapters, you'll deploy machine learning within the Elastic Stack for logging, security, and metrics. Finally, you'll discover how data frame analytics opens up a whole new set of use cases that machine learning can help you with. By the end of this Elastic Stack book, you'll have hands-on machine learning and Elastic Stack experience, along with the knowledge you need to incorporate machine learning into your distributed search and data analysis platform.

Who is this book for?

If you’re a data professional looking to gain insights into Elasticsearch data without having to rely on a machine learning specialist or custom development, then this Elastic Stack machine learning book is for you. You'll also find this book useful if you want to integrate machine learning with your observability, security, and analytics applications. Working knowledge of the Elastic Stack is needed to get the most out of this book.

What you will learn

  • Find out how to enable the ML commercial feature in the Elastic Stack
  • Understand how Elastic machine learning is used to detect different types of anomalies and make predictions
  • Apply effective anomaly detection to IT operations, security analytics, and other use cases
  • Utilize the results of Elastic ML in custom views, dashboards, and proactive alerting
  • Train and deploy supervised machine learning models for real-time inference
  • Discover various tips and tricks to get the most out of Elastic machine learning

Product Details

Publication date: May 31, 2021
Length: 450 pages
Edition: 2nd
Language: English
ISBN-13: 9781801078467
Vendor: Elastic




Table of Contents

Section 1 – Getting Started with Machine Learning with Elastic Stack
Chapter 1: Machine Learning for IT
Chapter 2: Enabling and Operationalization
Section 2 – Time Series Analysis – Anomaly Detection and Forecasting
Chapter 3: Anomaly Detection
Chapter 4: Forecasting
Chapter 5: Interpreting Results
Chapter 6: Alerting on ML Analysis
Chapter 7: AIOps and Root Cause Analysis
Chapter 8: Anomaly Detection in Other Elastic Stack Apps
Section 3 – Data Frame Analysis
Chapter 9: Introducing Data Frame Analytics
Chapter 10: Outlier Detection
Chapter 11: Classification Analysis
Chapter 12: Regression
Chapter 13: Inference
Other Books You May Enjoy

Customer reviews

Rating: 5 out of 5 (9 ratings) – 100% 5 star

N/A – Feb 06, 2024 – 5/5 – Feefo verified review
Easy purchase, very fast delivery, secure payment via PayPal. Perfect.

Amruta Ghate – Aug 25, 2021 – 5/5 – Amazon verified review
Rich's book goes into intricate detail on all the features and functions of Elastic's ML capabilities. If you want to learn not only how to configure jobs but understand the underlying models and settings, this is the book for you. I use Elastic daily and learned a lot by reading it.

A. Norris – Aug 24, 2021 – 5/5 – Amazon verified review
This product is an asset in the library. It is valuable as a lookup resource for both new and professional users of Elastic machine learning. The second edition adds more information about the latest revisions in the product, which is useful if you want to employ the most effective solutions.

Charlie W – Jul 27, 2021 – 5/5 – Amazon verified review
I work at a large pharma company and we heavily use Elastic. My team needs to better utilize ML and this book helped me to quickly get up to speed! I went from simply spelling it to now being able to teach others about structured vs. unstructured learning as well as the different methods. I think it will take some experience to better configure jobs (i.e., granularity of data), but now I have the knowledge to try it out. Great book! I recommend it.

Peter Titov – Jul 20, 2021 – 5/5 – Amazon verified review
While I am by no means a machine learning expert, this book has provided exceptionally thoughtful insights and a methodical approach to applying machine learning within the Elastic Stack to set myself up for success (and you can too) with the practical applications of ML across a variety of data sources. I would highly recommend this book!

FAQs

How do I buy and download an eBook?

Where there is an eBook version of a title available, you can buy it from the book details for that title. Add either the standalone eBook or the eBook and print book bundle to your shopping cart. Your eBook will show in your cart as a product on its own. After completing checkout and payment in the normal way, you will receive your receipt on the screen containing a link to a personalised PDF download file. This link will remain active for 30 days. You can download backup copies of the file by logging in to your account at any time.

If you already have Adobe reader installed, then clicking on the link will download and open the PDF file directly. If you don't, then save the PDF file on your machine and download the Reader to view it.

Please Note: Packt eBooks are non-returnable and non-refundable.

Packt eBook and Licensing: When you buy an eBook from Packt Publishing, completing your purchase means you accept the terms of our licence agreement. Please read the full text of the agreement. In it we have tried to balance the need for the ebook to be usable for you the reader with our need to protect our rights as Publishers and the rights of our authors. In summary, the agreement says:

  • You may make copies of your eBook for your own use onto any machine
  • You may not pass copies of the eBook on to anyone else
How can I make a purchase on your website?

If you want to purchase a video course, eBook or Bundle (Print+eBook) please follow below steps:

  1. Register on our website using your email address and a password.
  2. Search for the title by name or ISBN using the search option.
  3. Select the title you want to purchase.
  4. Choose the format you wish to purchase the title in; if you order the Print Book, you get a free eBook copy of the same title. 
  5. Proceed with the checkout process (payment to be made using Credit Card, Debit Card, or PayPal).
Where can I access support around an eBook?
  • If you experience a problem with using or installing Adobe Reader, contact Adobe directly.
  • To view the errata for the book, see www.packtpub.com/support and view the pages for the title you have.
  • To view your account details or to download a new copy of the book go to www.packtpub.com/account
  • To contact us directly if a problem is not resolved, use www.packtpub.com/contact-us
What eBook formats do Packt support?

Our eBooks are currently available in a variety of formats such as PDF and ePub. In the future, this may well change with trends and developments in technology, but please note that our PDFs are not in Adobe eBook Reader format, which has greater restrictions on security.

You will need to use Adobe Reader v9 or later in order to read Packt's PDF eBooks.

What are the benefits of eBooks?
  • You can get the information you need immediately
  • You can easily take them with you on a laptop
  • You can download them an unlimited number of times
  • You can print them out
  • They are copy-paste enabled
  • They are searchable
  • There is no password protection
  • They are lower priced than print
  • They save resources and space
What is an eBook?

Packt eBooks are a complete electronic version of the print edition, available in PDF and ePub formats. Every piece of content down to the page numbering is the same. Because we save the costs of printing and shipping the book to you, we are able to offer eBooks at a lower cost than print editions.

When you have purchased an eBook, simply login to your account and click on the link in Your Download Area. We recommend saving the file to your hard drive before opening it.

For optimal viewing of our eBooks, we recommend you download and install the free Adobe Reader version 9.