Intelligent Document Processing with AWS AI and ML
It was a Wednesday evening, and I was busy collecting all my receipts and filling out my insurance claim form. I wanted my health insurance provider to reimburse me for the COVID-19 test kits I had purchased. The next day, I went to the post office to mail the documents to my insurance provider. It made me think about how we are still working with physical documents in the 21st century. By my rough math, if 2% of the entire US population buys a test kit and applies for reimbursement using a paper-based application, this one use case alone would produce roughly 6.6 million documents this month. That is a ton of documents for a single scenario. Beyond physical copies, many more documents exist only as scans, and they, too, are processed manually. Can we do better in the 21st century and automate the processing of these documents?
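The back-of-envelope estimate above is simple enough to reproduce directly. The sketch below is only illustrative: the US population figure of about 330 million and the assumption of one claim document per applicant are my assumptions, not figures from the text; only the 2% claim rate comes from the scenario described.

```python
# Back-of-envelope estimate of paper reimbursement claims filed in one month.

# ASSUMPTION: US population of roughly 330 million (not stated in the text).
US_POPULATION = 330_000_000
# Share of the population that buys a test kit and files a paper claim (from the text).
CLAIM_RATE = 0.02
# ASSUMPTION: each applicant submits exactly one claim document.
DOCS_PER_CLAIM = 1


def monthly_claim_documents(population: int, rate: float, docs_per_claim: int) -> int:
    """Return the estimated number of paper documents filed in one month."""
    # round() absorbs floating-point noise from multiplying by 0.02
    return round(population * rate * docs_per_claim)


estimate = monthly_claim_documents(US_POPULATION, CLAIM_RATE, DOCS_PER_CLAIM)
print(f"{estimate:,} documents per month")
```

Changing `DOCS_PER_CLAIM` (for example, to count attached receipts as separate pages) shows how quickly the volume grows once each claim carries supporting documents.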
Beyond this particular case, documents drive many other use cases across industries, such as claims processing in insurance, loan and mortgage documents in financial services, and legal and contract documents. If you have bought or refinanced a house, you already know how many documents loan processing requires. The volume of data is huge: IDC predicts that worldwide data will exceed 175 zettabytes by 2025, growing by about 23% every year. On top of this volume, the data comes in many formats and is largely unstructured; some documents are forms, as with insurance claims, while others are dense text, as with legal contracts. This combination of volume and varied formats makes manual processing time-consuming, error-prone, and expensive.
Moreover, legacy document extraction technologies can work well for pristine documents, but when document quality varies, these early-generation systems frequently fail to meet customer needs. Manual extraction by a human workforce introduces variability into the process, since people make mistakes and double-checking all of their work is not cost-effective. What matters most is getting the key information out of documents and into your decision-making systems, so that you can make high-quality decisions more quickly and based on accurate information. Hence, we are all looking for efficient, faster, and cost-effective ways to process our documents for better insights.
In this introductory chapter, we establish the basic context to familiarize you with the underlying concepts of document processing, the challenges it involves, and how AWS Artificial Intelligence (AI) and Machine Learning (ML) services can help solve these problems.
We will be covering the following topics in this chapter:
- Understanding common document processing use cases across industries
- Understanding the AWS ML and AI stack
- Introducing the Intelligent Document Processing pipeline