Information Retrieval (IR) deals with finding relevant information in unstructured data. Unstructured data is any data that has no specific or generalized structure, such as a fixed schema, and this lack of structure makes it difficult for machines to process. Examples of unstructured data include text files, doc files, and XML files stored on a local PC or on the web. Processing such large amounts of unstructured data and finding the relevant information in it is therefore a challenging task.
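To make the problem concrete, here is a minimal sketch of the most naive way to find information in unstructured text: scanning every document for a query term. It assumes a tiny in-memory collection of hypothetical documents and a simple tokenizer that lowercases the text and keeps only alphanumeric runs. The names `documents`, `tokenize`, and `linear_scan` are illustrative only, not part of any library.

```python
import re

# Hypothetical toy collection: each "document" is a raw string with no schema.
documents = {
    "doc1": "Information retrieval finds relevant documents in large collections.",
    "doc2": "Unstructured text has no fixed schema, unlike a database table.",
    "doc3": "Search engines retrieve web pages that match a user's query.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def linear_scan(docs, term):
    """Return the IDs of documents containing `term`, reading every document in full."""
    term = term.lower()
    return [doc_id for doc_id, text in docs.items() if term in tokenize(text)]


print(linear_scan(documents, "retrieve"))  # → ['doc3']
```

Because every query rereads and retokenizes the whole collection, this brute-force approach quickly becomes impractical as the collection grows, which is why IR systems build index structures over the data instead.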
We will cover the following topics in this chapter:
- Boolean retrieval
- Dictionaries and tolerant retrieval
- Vector space model
- Scoring and term weighting
- Inverse document frequency
- TF-IDF weighting
- Evaluation of information retrieval systems