Document layout analysis
Imagine you are tasked with extracting the values for the various keys present in a passport (such as name, date of birth, issue date, and expiry date). In some passports, a value appears below its key; in others, it appears to the right of the key, and in still others to the left. How do we build a single model that can assign the corresponding value to each piece of text within the document image? LayoutLM comes in handy in such a scenario.
Understanding LayoutLM
LayoutLM is a model pre-trained on a large corpus of document images. The architecture of LayoutLM is as follows:
Figure 15.14: LayoutLM architecture (source: https://arxiv.org/pdf/1912.13318)
As shown in the preceding diagram, the workflow consists of the following steps:
- We take an image of the document and extract the various words and their bounding-box coordinates (x0, x1, y0, and y1) – this...