Available similarity models
As already mentioned, TF-IDF was the original and default similarity model before Apache Lucene 6.0. Lucene 6.0 changed the default to BM25, which we discussed in detail in the section The changed default text scoring in Lucene: BM25 in Chapter 2, The Improved Query DSL.
Apart from BM25, we can use the following similarity models:
TF-IDF (classic): This similarity model is based on the TF-IDF model and was the default similarity model before Elasticsearch 5.0. To use this similarity in Elasticsearch, you need to use the classic name.
Divergence from randomness (DFR): This similarity model is based on the probabilistic model of the same name. To use this similarity in Elasticsearch, you need to use the DFR name. The divergence from randomness similarity model is said to perform well on text similar to natural language text.
Divergence from independence (DFI): This similarity model is based on the probabilistic...