Answering questions from a document corpus in an extractive manner
When a document corpus contains a large number of documents, it is not feasible to load the document content at runtime to answer each question. Such an approach would lead to long query times and would not be suitable for production-grade systems.
In this recipe, we will learn how to preprocess the documents and index them ahead of time in a form that supports fast retrieval, so that the system can extract the answer to a given question with short query times.
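To make the idea of indexing ahead of time concrete, the following is a minimal sketch in plain Python. It is not how Haystack implements indexing or retrieval; it only illustrates why a prebuilt index avoids rereading every document for each question. The function names (tokenize, build_index, retrieve) and the three sample sentences are assumptions made for illustration and are not part of the recipe's dataset.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index once, offline: token -> ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def retrieve(index, question, top_k=2):
    """Rank documents by how many distinct question tokens they contain."""
    scores = defaultdict(int)
    for token in set(tokenize(question)):
        # Only the index is consulted here; the raw documents are never rescanned.
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by the lower document id.
    ranked = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ranked[:top_k]


# Hypothetical sample corpus, used only for this illustration.
documents = [
    "Arya Stark is the younger daughter of Eddard Stark.",
    "Jon Snow joins the Night's Watch at the Wall.",
    "Daenerys Targaryen hatches three dragons.",
]

index = build_index(documents)          # done once, before any question arrives
top_ids = retrieve(index, "Who hatches dragons?")
print([documents[i] for i in top_ids])
```

The cost of reading and tokenizing the corpus is paid once in build_index. Each question then touches only the index entries for its own tokens, which is what keeps query times short as the corpus grows.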
Getting ready
As part of this recipe, we will use the Haystack (https://haystack.deepset.ai/) framework to build a QA system that can answer questions from a document corpus. We will download a dataset based on Game of Thrones and index it beforehand, because the QA system is only performant when it queries an existing index. Once the documents are indexed, answering a question follows a two-step process:
...