In earlier versions of Elasticsearch, sparse documents were best avoided because of how Lucene is structured. Lucene identifies documents internally by document IDs, which its internal APIs then use to communicate with one another. When a search query produces a document ID, Lucene retrieves the norm value for that document by reading the byte at the index given by the document ID.
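The lookup described above can be pictured as indexing into a byte array. The following is only a simplified Python model, not Lucene's actual implementation: the class `DenseNorms`, its methods, and the sample norm values are hypothetical names and numbers chosen for illustration.

```python
# Simplified model of a per-field norms array; NOT Lucene's real code.
# DenseNorms, set_norm, and norm are hypothetical names for illustration.
class DenseNorms:
    def __init__(self, max_doc: int) -> None:
        # One byte per document ID, zero-initialised.
        self._bytes = bytearray(max_doc)

    def set_norm(self, doc_id: int, value: int) -> None:
        # The value must fit in a single byte (0-255).
        self._bytes[doc_id] = value

    def norm(self, doc_id: int) -> int:
        # Constant-time lookup: the document ID is the array index.
        return self._bytes[doc_id]


norms = DenseNorms(max_doc=5)
norms.set_norm(0, 124)
norms.set_norm(3, 96)

# Document IDs as a search query might emit them (made-up values).
matching_doc_ids = [0, 3]
looked_up = [norms.norm(doc_id) for doc_id in matching_doc_ids]
```

In this sketch, `looked_up` holds the norm bytes for the matching documents, and each lookup is a single array access keyed by the document ID.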
This byte-per-document layout is both very efficient and storage-intensive: Lucene can access norm values quickly, but documents that have no value for the field still use one byte of storage each. As a result, if an index has x documents, the norms require x bytes of storage per field. This not only affects the sparsity requirements, but also the indexing...