Using the ingest attachment plugin
In Elasticsearch versions prior to 5.x, it is easy to make a cluster unresponsive by using the attachment mapper. Extracting metadata from a document is a very CPU-intensive operation, and if you are ingesting a lot of documents, your cluster becomes overloaded.
To prevent this scenario, Elasticsearch introduced the ingest node. An ingest node can be put under very heavy load without causing problems for the rest of the Elasticsearch cluster.
The attachment processor allows us to use the document extraction capabilities of Tika in an ingest node.
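As a rough illustration of what the attachment processor works with, the following Python sketch builds the two JSON bodies involved: a pipeline definition containing a single attachment processor, and a document whose binary content is base64-encoded. The field name data, the pipeline ID attachment, and the index name my_index are assumptions chosen for this example, and the helper functions are hypothetical. Nothing is sent to a cluster here.

```python
import base64
import json


def build_attachment_pipeline(field="data"):
    """Return a pipeline body with a single attachment processor.

    The field name is an assumption for this sketch: it is the document
    field the processor reads the base64-encoded content from.
    """
    return {
        "description": "Extract attachment information",
        "processors": [{"attachment": {"field": field}}],
    }


def build_document(raw_bytes, field="data"):
    """Return a document whose field holds the content as base64 text."""
    return {field: base64.b64encode(raw_bytes).decode("ascii")}


pipeline_body = build_attachment_pipeline()
document_body = build_document(b"Hello from an ingest node")

# Hypothetical requests these bodies would be sent with:
#   PUT /_ingest/pipeline/attachment          (pipeline_body)
#   PUT /my_index/_doc/1?pipeline=attachment  (document_body)
print(json.dumps(pipeline_body, indent=2))
print(json.dumps(document_body, indent=2))
```

Because the extraction runs inside the pipeline, the client only has to encode the file and name the pipeline when indexing. The CPU-heavy Tika work then happens on whichever node executes the pipeline.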
Getting ready
You need an up-and-running Elasticsearch installation, as we described in the Downloading and installing Elasticsearch recipe in Chapter 1, Getting Started.
To execute the commands, any HTTP client can be used, such as curl (https://curl.haxx.se/), Postman (https://www.getpostman.com/), or similar. The Kibana console is recommended, as it provides code completion and better character escaping...