Introducing the PDF batch processing use case
To determine what the architecture will look like, you talk to your accounting department to understand the process for registering companies with the SEC. Under this process, the accounting department generates PDF documents using the SEC's registration template, also known as Form S-20 (https://www.sec.gov/files/forms-20.pdf). The process also involves creating the supporting documentation, which is sent to the SEC together with the registration through an API call. LiveRight's Partner Integration team already has the handshake with the SEC in place, and they need the form data to be available in an Amazon DynamoDB (https://aws.amazon.com/dynamodb/) table, which they will read to build the message for the SEC API call.
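To make the DynamoDB requirement concrete, here is a minimal sketch of how extracted form data might be shaped into a table item. The attribute names (`document_id`, `ingested_at`, `fields`), the helper functions, and the sample field names are all hypothetical; the actual schema would be agreed upon with the Partner Integration team.

```python
from datetime import datetime, timezone


def build_registration_item(document_id, form_fields):
    """Shape extracted form fields into one DynamoDB-style item.

    Assumption: `document_id` is the table's partition key, and the form
    fields are stored together in a nested map attribute named `fields`.
    """
    if not document_id:
        raise ValueError("document_id is required as the partition key")

    fields = {}
    for key, value in form_fields.items():
        # Normalise labels such as "Name of Registrant " to "name_of_registrant".
        attr = "_".join(key.strip().lower().split())
        # Skip blank labels and empty values to keep the item compact.
        if attr and value not in (None, ""):
            fields[attr] = str(value).strip()

    return {
        "document_id": document_id,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "fields": fields,
    }


def save_registration(table, document_id, form_fields):
    """Write the item through any object exposing put_item(Item=...)."""
    item = build_registration_item(document_id, form_fields)
    table.put_item(Item=item)
    return item
```

With boto3, the `table` argument could be a table resource such as `boto3.resource("dynamodb").Table("<your-table-name>")`, which exposes `put_item(Item=...)`. Passing the table in, rather than creating it inside the function, keeps the item-shaping logic testable without AWS credentials.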
However, before making the data available to the Partner Integration team, the accounting team mentioned that they need to review a collection of text lines that have been detected in the PDF document...