Loading data in Apache Solr
Once Apache Solr is configured, the next step is to load data into it and run queries. There are different ways to load data into Apache Solr. The following diagram depicts the most commonly used ones:
We have already seen the simple post tool earlier while setting up Apache Solr. In this section, we will look at the Extracting Request Handler.
Extracting request handler – Solr Cell
Solr Cell is one of the most powerful handlers for uploading data in almost any format. It is particularly useful if you wish to run Solr over a set of files or unstructured data in different formats, such as office documents, PDFs, eBooks, emails, and plain text. Solr Cell relies on Apache Tika to parse these files. In Apache Tika, text extraction is based purely on file type and content. So, if you have a PDF of scanned images containing text, Apache Tika won't be able to extract any of the text from it. In such cases, you need to use OCR-based software to bring that functionality to Solr. You can simply try this by downloading the curl utility and then...
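For readers who prefer a script to the command line, the kind of request Solr Cell expects can be sketched in Python using only the standard library. This is a minimal sketch, not the book's own procedure. It assumes the extracting request handler is enabled at the /update/extract path, that Solr is reachable at http://localhost:8983/solr, and that a collection named mycore exists. The collection name, the document ID, and both helper function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_request(base_url, collection, doc_id, payload,
                          content_type="application/pdf", commit=True):
    """Build (but do not send) a POST request for the /update/extract handler.

    base_url and collection are assumptions about the local setup,
    for example "http://localhost:8983/solr" and "mycore".
    """
    # literal.id sets the unique key of the document created from the file.
    params = {"literal.id": doc_id}
    if commit:
        # Ask Solr to commit so the document becomes searchable right away.
        params["commit"] = "true"
    url = f"{base_url.rstrip('/')}/{collection}/update/extract?{urlencode(params)}"
    # The raw file bytes form the request body; Tika parses them on the server.
    return Request(url, data=payload, method="POST",
                   headers={"Content-Type": content_type})


def index_file(base_url, collection, doc_id, path,
               content_type="application/pdf"):
    """Read a local file and post it to Solr; returns the HTTP status code."""
    with open(path, "rb") as fh:
        request = build_extract_request(base_url, collection, doc_id,
                                        fh.read(), content_type)
    with urlopen(request) as response:
        return response.status
```

With a running Solr instance, a call such as index_file("http://localhost:8983/solr", "mycore", "doc1", "sample.pdf") would submit the file for extraction and indexing. Building the request separately from sending it makes the URL and parameters easy to inspect before anything reaches the server.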