Understanding the challenges in legacy document extraction
Many organizations, across industries and regardless of size, handle a large number of documents in their everyday transactions. In Chapter 2, Document Capture and Categorization, we discussed the diversity of this data: its sources and the various layouts and formats these documents come in. At scale, this diversity makes it difficult to extract elements from the documents. Consider a company's back-office tasks. They are not mission-critical, but they still need to be completed on schedule. For example, the back office receives invoices at scale and needs to extract information from them and enter it in a structured form into its enterprise resource planning (ERP) system, such as SAP (Systems, Applications, and Products), so that payments are accurate. Once we convert the data from unstructured documents to a structured format, a machine can...