Boolean retrieval
Boolean retrieval refers to a retrieval system or algorithm in which the information retrieval (IR) query is a Boolean expression of terms combined with the operators AND, OR, and NOT. A Boolean retrieval model treats each document as a set of words and matches it against a query written as a Boolean expression. A standard example uses Shakespeare's collected works: the query asks which plays contain the words "Brutus" and "Caesar" but not "Calpurnia." Such a query can be answered with the grep command, which is available on Unix-based systems.
This approach is effective when the amount of text is limited, but it cannot quickly process a large document or the amount of data available on the web, nor can it rank the results on the basis of an occurrence count.
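The grep-style approach described above amounts to reading every text in full for each query. A minimal Python sketch of that idea follows. The function names `contains_word` and `linear_scan`, the placeholder titles, and the toy texts are all assumptions made for illustration; they are not the actual plays or any particular tool's implementation.

```python
import re


def contains_word(text, word):
    """Return True if `word` occurs in `text` as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def linear_scan(plays, required, excluded):
    """Scan every text in full and keep the titles that satisfy the query.

    A title matches when its text contains all `required` words (AND)
    and none of the `excluded` words (NOT).
    """
    matches = []
    for title, text in plays.items():
        has_all = all(contains_word(text, w) for w in required)
        has_none = not any(contains_word(text, w) for w in excluded)
        if has_all and has_none:
            matches.append(title)
    return matches


# Toy stand-ins for the play texts (hypothetical data, not real excerpts).
plays = {
    "Play A": "Brutus speaks to Caesar.",
    "Play B": "Caesar greets Calpurnia while Brutus waits.",
    "Play C": "No Romans appear here.",
}

# Brutus AND Caesar AND NOT Calpurnia
result = linear_scan(plays, ["Brutus", "Caesar"], ["Calpurnia"])
print(result)
```

Every query rereads every text, so the cost grows with the total size of the collection, which is why this approach does not scale.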
The alternative is to index the documents in advance by term. One approach is to build a term-document incidence matrix, a binary matrix whose entries record whether each term is present in a given play (1) or not (0):
Antony and Cleopatra... |