Introduction
In this chapter, you will learn two closely related recipes. The first demonstrates how to index your data, and the second demonstrates how to search through the data you have indexed.
For both indexing and searching, we will be using Apache Lucene. Apache Lucene is a free, open source Java library used heavily for information retrieval. It is supported by the Apache Software Foundation and is released under the Apache License.
Many modern search platforms, such as Apache Solr and Elasticsearch, and crawling platforms, such as Apache Nutch, use Apache Lucene behind the scenes for data indexing and searching. Therefore, any data scientist who works with those platforms will benefit from the two basic recipes in this chapter.