In this chapter, we will look at common Elasticsearch performance issues and at how to tune Elasticsearch to get the most out of it. Elasticsearch is widely used to search stored data and return the documents that match a query, but it can quickly become overwhelmed when a single query has to retrieve a large number of documents. The Scroll API is recommended in these situations, because it returns the results in batches instead of all at once. By default, Elasticsearch refuses to index documents larger than 100 MB, which is the default value of the http.max_content_length setting. This setting can be raised, but only up to the Lucene limit of about 2 GB. Even within these limits, users are generally advised to avoid large documents in Elasticsearch, as they put more stress on the network, memory, and disk.
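To make the Scroll API pattern more concrete, here is a minimal Python sketch of the request sequence: open a scroll with the initial search, keep fetching batches until one comes back empty, then clear the scroll. The `send` callable is a hypothetical transport (anything that performs the HTTP request and returns the parsed JSON response) and is not part of any Elasticsearch client. The function name, the default page size, and the `1m` keep-alive are illustrative assumptions as well.

```python
from typing import Any, Callable, Dict, Iterator

# Hypothetical transport: takes (method, path, body) and returns the parsed
# JSON response as a dict. Plug in whatever HTTP client you actually use.
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def scroll_hits(
    send: Send,
    index: str,
    query: Dict[str, Any],
    page_size: int = 1000,    # assumed batch size, tune for your documents
    keep_alive: str = "1m",   # assumed scroll context lifetime
) -> Iterator[Dict[str, Any]]:
    """Yield every hit for `query`, one batch at a time."""
    # The initial search opens the scroll context and returns the first batch.
    page = send(
        "POST",
        f"/{index}/_search?scroll={keep_alive}",
        {"size": page_size, "query": query},
    )
    scroll_id = page.get("_scroll_id")
    try:
        # An empty batch means the result set is exhausted.
        while page["hits"]["hits"]:
            yield from page["hits"]["hits"]
            page = send(
                "POST",
                "/_search/scroll",
                {"scroll": keep_alive, "scroll_id": scroll_id},
            )
            # The scroll ID may change between calls, so keep the latest one.
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context as soon as we are done.
        if scroll_id:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Because the function is a generator, the caller only ever holds one batch in memory at a time, which is the point of scrolling instead of requesting every match in a single response.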
In this chapter, we are going to cover the following topics:
- Data sparsity
- Solutions to common problems
- How to tune for indexing...