Elasticsearch 7.0 Cookbook

You're reading from Elasticsearch 7.0 Cookbook: Over 100 recipes for fast, scalable, and reliable search for your enterprise

Product type: Paperback
Published in: Apr 2019
Publisher: Packt
ISBN-13: 9781789956504
Length: 724 pages
Edition: 4th Edition
Author: Alberto Paro

Setting up an ingestion node

The main goals of Elasticsearch are indexing, searching, and analytics, but documents often need to be modified or enriched before they are stored in Elasticsearch.

The following are the most common scenarios:

  • Preprocessing log strings to extract meaningful data
  • Enriching the content of textual fields with Natural Language Processing (NLP) tools
  • Enriching the content with fields computed by machine learning (ML)
  • Modifying or transforming data during ingestion, such as the following:
    • Converting an IP address into a geolocation
    • Adding datetime fields at ingestion time
    • Building custom fields (via scripting) at ingestion time

Getting ready

You need a working Elasticsearch installation, as described in the Downloading and installing Elasticsearch recipe, as well as a simple text editor to change configuration files.

How to do it…

To set up an ingest node, edit the config/elasticsearch.yml file and set the node.ingest property to true, as follows:

node.ingest: true

Every time you change your elasticsearch.yml file, a node restart is required.

How it works…

By default, every Elasticsearch node is configured as an ingest node (refer to Chapter 12, Using the Ingest Module, for more information on the ingestion pipeline).
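To make the default concrete, the following elasticsearch.yml fragment is a sketch that writes out the three role settings with the values an Elasticsearch 7.0 node uses when they are not set, so adding these lines changes nothing. The commented node.roles line is an assumption about later releases (7.9 and newer), which replaced these booleans with a single list; it is shown only for orientation and is not part of the 7.0 configuration.

```yaml
# Elasticsearch 7.0 role defaults (sketch): a node with none of these
# settings behaves as if the three lines below were present.
node.master: true   # eligible to be elected master
node.data: true     # holds shards and executes searches on them
node.ingest: true   # can run ingest pipelines before documents are indexed

# Assumption, newer releases only (7.9+): roles are declared as one list,
# for example:
# node.roles: [ master, data, ingest ]
```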

As with the coordinator node, using an ingest node is a way to add functionality to Elasticsearch without compromising cluster safety.

To prevent a node from being used for ingestion, disable it with node.ingest: false. It's a best practice to disable ingestion on master and data nodes, so that ingestion errors cannot affect them and the cluster stays protected. The coordinator node is the best candidate to act as an ingest node.
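A minimal sketch of this practice, assuming one dedicated master node and one dedicated data node, each with its own elasticsearch.yml. The split into these two node types is illustrative; the setting that matters for this recipe is node.ingest: false on both.

On the dedicated master node:

```yaml
# elasticsearch.yml on a dedicated master node (sketch)
node.master: true
node.data: false
node.ingest: false   # no ingest pipelines run here
```

On the dedicated data node:

```yaml
# elasticsearch.yml on a dedicated data node (sketch)
node.master: false
node.data: true
node.ingest: false   # no ingest pipelines run here
```

Each node still needs a restart after its file is changed.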

If you are using NLP, attachment extraction (via the attachment ingest plugin), or log ingestion, the best practice is to have a pool of coordinator nodes (neither master nor data) with ingestion enabled.
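As a sketch, each node in such a pool could use an elasticsearch.yml along these lines, which turns off the master and data roles and leaves ingestion on. The pool size and how clients are pointed at these nodes are left open here, since they depend on your deployment.

```yaml
# elasticsearch.yml for one node of the ingest/coordinator pool (sketch)
node.master: false   # never elected master
node.data: false     # holds no shards
node.ingest: true    # runs ingest pipelines (NLP, attachments, logs)
```

Because a node configured this way holds no shards and cannot be elected master, a CPU-hungry NLP pipeline or a crash caused by a malformed attachment stays confined to the ingest pool instead of reaching the data or master nodes.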

In previous versions of Elasticsearch, the attachment and NLP plugins ran on the standard data or master nodes. This caused many problems for Elasticsearch, for the following reasons:

  • NLP algorithms can saturate all the CPUs on a data node, degrading indexing and search performance
  • Instability caused by malformed attachments and/or bugs in Apache Tika (the library used for document extraction)
  • NLP or ML algorithms require a lot of CPU or put pressure on the Java garbage collector, decreasing the performance of the node

Running ingestion on a dedicated pool of coordinator nodes therefore provides the best safety for both the cluster and the ingestion pipeline.

There's more…

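Now that you know about the four kinds of Elasticsearch nodes, you can easily understand that a robust architecture designed to work with Elasticsearch should keep these roles apart: dedicated master nodes, dedicated data nodes, and a pool of coordinator nodes with ingestion enabled that receive client requests.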
