Chapter 1. Understanding Apache Solr
The world of information technology revolves around transforming data into information that we can understand. This data is generated continuously, from various sources and in various forms. To analyze such data, engineers must observe its characteristics: the velocity with which the data is generated, the volume of data, the veracity of data, and the variety of data. These four dimensions are widely used to recognize whether the data falls into the category of Big Data. In an enterprise, the data may come from its operations network, which would involve plant assets, or it may come from an employee updating their information on the employee portal. The sources of such data can be unlimited, and so can its formats. To address the need for storage and retrieval of non-relational data, mechanisms such as NoSQL (Not only SQL) are widely used, and they are gaining popularity.
Unlike SQL in the case of relational databases, NoSQL does not provide any standardized way of accessing the data. This is because of the unstructured form of the data that exists within NoSQL storage. Some NoSQL implementations provide SQL-like querying, whereas others provide key-value-based storage and data access. Neither approach, by itself, really addresses the problem of data retrieval. Apache Solr uses an inverted index, which can be seen as a key-value mapping from terms to the documents that contain them, as a means to search through text in a more scalable and efficient way. Apache Solr enables enterprise applications to store and access this data in an effective and efficient manner.
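To make the idea of searching text through a key-value structure more concrete, the following is a minimal sketch of an inverted index, in which each term is a key and the set of documents containing that term is the value. It is a toy illustration only and not how Apache Solr is implemented internally; the function names `build_inverted_index` and `search`, as well as the sample documents, are assumptions made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive whitespace tokenization; a real engine does far more analysis.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the documents matching the first term, then intersect.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "plant asset maintenance report",
    2: "employee portal profile update",
    3: "plant employee safety report",
}
index = build_inverted_index(docs)
print(sorted(search(index, "plant report")))  # [1, 3]
print(sorted(search(index, "employee")))      # [2, 3]
```

Because a lookup goes straight from a term to the matching documents, a query does not need to scan every document, which is one reason this kind of structure tends to scale well for text search.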
In this chapter, we will get to know Apache Solr by going through the following topics:
- Challenges in enterprise search
- Understanding Apache Solr
- Features of Apache Solr
- Apache Solr architecture
- Apache Solr case studies