While we may dream of the end game of AI-driven digitization in finance, the reality is that much of the data is still trapped. Very often, expense records arrive on paper rather than through API feeds. If we are to move to a fully digital world in which all our information is stored in JSON files or SQL databases, dealing with paper is inevitable: we cannot avoid handling existing paper-based information. Using a dataset of paper-based documents as an example, we will demonstrate how to build the engine for an invoice entity extraction model.
In this example, we will assume you are developing your own engine to scan invoices and transform them into a structured data format. However, due to a lack of data, you will instead need to parse the Patent images dataset, which is available at http://machinelearning.inginf.units.it/data-and-tools/ghega-dataset. Within the dataset, there are images...