You're reading from Advanced Elasticsearch 7.0 A practical guide to designing, indexing, and querying advanced distributed search engines

Product type Paperback

Published in Aug 2019

Publisher Packt

ISBN-13 9781789957754

Length 560 pages

Edition 1st Edition

Languages

Java

Tools

Elasticsearch

Concepts

Enterprise Search

Author (1):

Wai Tak Wong

Table of Contents (25) Chapters

Preface

1. Section 1: Fundamentals and Core APIs

2. Overview of Elasticsearch 7

3. Index APIs

4. Document APIs

5. Mapping APIs

6. Anatomy of an Analyzer

7. Search APIs

8. Section 2: Data Modeling, Aggregations Framework, Pipeline, and Data Analytics

9. Modeling Your Data in the Real World

10. Aggregation Frameworks

11. Preprocessing Documents in Ingest Pipelines

12. Using Elasticsearch for Exploratory Data Analysis

13. Section 3: Programming with the Elasticsearch Client

14. Elasticsearch from Java Programming

15. Elasticsearch from Python Programming

16. Section 4: Elastic Stack

17. Using Kibana, Logstash, and Beats

18. Working with Elasticsearch SQL

19. Working with Elasticsearch Analysis Plugins

20. Section 5: Advanced Features

21. Machine Learning with Elasticsearch

22. Spark and Elasticsearch for Real-Time Analytics

23. Building Analytics RESTful Services

24. Other Books You May Enjoy

Building a Java Spark ML module for k-means anomaly detection

According to the Spark MLlib guide (see https://spark.apache.org/docs/latest/ml-guide.html), the RDD-based APIs in the spark.mllib package have been in maintenance mode since Spark 2.0, and the DataFrame-based API in the spark.ml package is now the primary machine learning API. In this project, we import several classes from the spark.ml package to build the anomaly detection model. The following code block shows a few lines from the AnomalyDetection class in the com.example.esanalytics.spark.mllib package:

import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.ml.clustering.KMeans;
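The preview does not show how the trained model is used to flag anomalies, so the following is only a minimal sketch of one common approach, not the book's AnomalyDetection class: treat a point as anomalous when its distance to the nearest k-means cluster center exceeds a threshold. It is written in plain Java so that it runs without a Spark cluster. The class name CentroidDistanceSketch, the sample centers, and the threshold value are all hypothetical. In a Spark ML pipeline, the centers would typically come from KMeansModel.clusterCenters().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the book's code.
final class CentroidDistanceSketch {

    // Euclidean distance between two feature vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Distance from a point to its nearest cluster center.
    static double nearestCenterDistance(double[] point, double[][] centers) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] center : centers) {
            best = Math.min(best, distance(point, center));
        }
        return best;
    }

    // Indices of the points that lie farther than `threshold` from every center.
    // The threshold is an assumed tuning parameter, not a value from the book.
    static List<Integer> findAnomalies(double[][] points, double[][] centers, double threshold) {
        List<Integer> anomalies = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            if (nearestCenterDistance(points[i], centers) > threshold) {
                anomalies.add(i);
            }
        }
        return anomalies;
    }

    public static void main(String[] args) {
        // Two assumed cluster centers, as a trained k-means model might produce.
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        // The third point sits far from both centers.
        double[][] points = {{0.5, 0.5}, {9.0, 10.0}, {5.0, 5.0}};
        System.out.println(findAnomalies(points, centers, 2.0)); // prints [2]
    }
}
```

Choosing the threshold is the main design decision in this approach. A fixed value is used here for simplicity, while a percentile of the observed distances is another common option.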

The following diagram shows the steps to build the model within the scope of ES-Hadoop, Spark SQL, and Spark MLlib:

The step-by-step instructions are as follows:

There are two major...

Authors (1)

Wong

Dr. Timothy Wong is an IT veteran with 30 years of experience. He holds a PhD in networking from the University of Manchester Institute of Science and Technology, UK. He has deep experience in wireless and wireline networking, digital audio/video, software development, sales, and consulting. He frequently works with telcos, banks, governments, and enterprises. He is also a professor of Big Data, Wireless, and IoT at Humber College, Toronto, Canada. As an entrepreneur, he has co-founded a number of companies over the past 20 years. He combines business and technical knowledge in pursuit of success.
