Question answering using token classification
A question answering (QA) problem is generally defined as an NLP problem in which a model is given a text and a question and is expected to return an answer. Usually, the answer can be found in the original text, although there are different approaches to this problem. In Visual Question Answering (VQA), the question concerns a visual entity or visual concept rather than text, but the question itself is still posed as text.
Some examples of VQA are as follows:
Figure 6.11 – VQA examples
Most models intended for VQA are multimodal: they must understand the visual context along with the question in order to generate a proper answer. In contrast, unimodal, fully textual QA (or simply QA) relies on a textual context and textual questions with corresponding textual answers.
SQuAD (the Stanford Question Answering Dataset) is one of the most well-known datasets in the field of QA. To load SQuAD and examine some of its examples, you can use the following code:
...
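To make the structure of such an example concrete without downloading anything, the following sketch builds a single hand-written record that mirrors the field layout of SQuAD v1.1 (id, title, context, question, and answers with text and answer_start). The record content, its id, and the helper char_span_to_token_span are hypothetical and only illustrative. The helper shows how a character-level answer offset can be turned into token-level start and end positions, which is the form a token classification model needs; whitespace splitting stands in for a real subword tokenizer:

```python
import re
from typing import Dict, List, Tuple

CONTEXT = "Marie Curie won the Nobel Prize in Physics in 1903."
ANSWER = "1903"

# Hand-written record in the SQuAD v1.1 field layout (not taken from the dataset).
record: Dict = {
    "id": "toy-0001",  # hypothetical identifier
    "title": "Marie_Curie",
    "context": CONTEXT,
    "question": "When did Marie Curie win the Nobel Prize in Physics?",
    # SQuAD stores the answer text together with its character offset in the
    # context; here the offset is computed so it cannot drift from the text.
    "answers": {"text": [ANSWER], "answer_start": [CONTEXT.index(ANSWER)]},
}


def char_span_to_token_span(context: str,
                            answer_start: int,
                            answer_text: str) -> Tuple[int, int]:
    """Map a character-level answer span to (first, last) token indices.

    Tokens are whitespace-separated chunks; assumes a non-empty answer
    that actually occurs at answer_start.
    """
    answer_end = answer_start + len(answer_text)
    offsets: List[Tuple[int, int]] = [
        (m.start(), m.end()) for m in re.finditer(r"\S+", context)
    ]
    # Keep every token whose character range overlaps the answer range.
    overlapping = [
        i for i, (s, e) in enumerate(offsets)
        if s < answer_end and e > answer_start
    ]
    return overlapping[0], overlapping[-1]


for key, value in record.items():
    print(f"{key}: {value}")

answer_start = record["answers"]["answer_start"][0]
answer_text = record["answers"]["text"][0]
token_start, token_end = char_span_to_token_span(
    record["context"], answer_start, answer_text
)
print(token_start, token_end)
```

Note that with whitespace tokens the matched token is "1903." including the trailing period; a subword tokenizer with offset mappings would usually align more tightly with the answer text.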