Summary
In this chapter, we discussed how to build the data capture stage of the IDP pipeline. Data is your gold mine, so you need a secure, scalable, and reliable data store for it. We introduced Amazon S3 and showed how you can use an object store to aggregate and store data in a scalable and highly available manner. We also described the data capture stage, which must accommodate documents of varying layouts, formats, and types.
We then reviewed the need for document classification and categorization, with examples including mortgage processing and insurance claims processing. We discussed Amazon Comprehend and its custom classification feature, and you gained hands-on experience classifying documents as invoices or receipts. We also looked at Amazon Rekognition and how its Custom Labels feature can classify documents based on their structural format. Here, too, you gained hands-on experience, classifying documents by whether they contain the AWS logo or a non-AWS logo.
...