The Document Understanding offering from UiPath can be used to extract data from a wide range of document types.
The major phases of this solution are as follows:
- Load Taxonomy: Loads the taxonomy, the hierarchy of document types and their fields, which is used for classification and extraction.
- Digitize: Uses different OCR engines to digitize the document into a machine-readable format.
- Classify: Identifies the type of each document; for example, claims, invoices, and receipts.
- Extract: Extracts data from forms, such as name and date of birth.
- Validate: Checks the extracted data against the source document and corrects it where needed.
- Export: Exports the data as an output file; for example, Excel.
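To make the flow between these phases concrete, here is a minimal, self-contained sketch in plain Python. It is not UiPath code: UiPath implements these phases as workflow activities, and every name below (`TAXONOMY`, `Document`, `classify`, `extract`, `validate`, `export_csv`) is hypothetical. The sketch assumes the Digitize step has already produced plain text, uses a toy keyword classifier and a toy "key: value" extractor, and writes CSV (which Excel can open) as a stand-in for the Excel export.

```python
import csv
import io
from dataclasses import dataclass, field

# Hypothetical taxonomy: each document type maps to the fields to extract.
TAXONOMY = {
    "claim": ["name", "date_of_birth"],
    "invoice": ["invoice_number", "total"],
}


@dataclass
class Document:
    text: str                                   # assumed output of the Digitize (OCR) step
    doc_type: str = ""                          # filled in by classify()
    fields: dict = field(default_factory=dict)  # filled in by extract()


def classify(doc: Document, taxonomy: dict) -> Document:
    # Toy keyword classifier: the first type name found in the text wins.
    lowered = doc.text.lower()
    for doc_type in taxonomy:
        if doc_type in lowered:
            doc.doc_type = doc_type
            return doc
    doc.doc_type = "unknown"
    return doc


def extract(doc: Document, taxonomy: dict) -> Document:
    # Toy "Key: value" extractor, limited to the fields the taxonomy
    # defines for the classified document type.
    wanted = taxonomy.get(doc.doc_type, [])
    for line in doc.text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower().replace(" ", "_")
        if sep and key in wanted:
            doc.fields[key] = value.strip()
    return doc


def validate(doc: Document, taxonomy: dict) -> list:
    # List fields the taxonomy expects but extraction did not find,
    # so that a person can review and correct them.
    return [f for f in taxonomy.get(doc.doc_type, []) if f not in doc.fields]


def export_csv(doc: Document) -> str:
    # Stand-in for the Excel export: one header row plus one data row.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["document_type", *doc.fields])
    writer.writeheader()
    writer.writerow({"document_type": doc.doc_type, **doc.fields})
    return buf.getvalue()
```

For example, passing the text "Insurance claim form\nName: Jane Doe\nDate of birth: 1990-01-01" through `classify` and then `extract` yields a `claim` document with both fields filled in, and `validate` returns an empty list. Keeping the taxonomy as a separate input mirrors the idea that it drives both classification and extraction.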
The package ships with several OCR engines that can digitize documents in formats such as PDF, TIFF, and JPEG. Classification and extraction of the content are done with position-based form extractors.
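Position-based extraction relies on a fixed form printing each field in the same place on every copy, so a value can be read from a known region of the page. The following is a minimal sketch of that idea, assuming the OCR step returns each word with the coordinates of its top-left corner. The `Word`, `Region`, and `TEMPLATE` names and all coordinates are hypothetical illustrations, not the API or configuration format of UiPath's form extractor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # left edge of the word, in page units (assumed OCR output)
    y: float  # top edge of the word


@dataclass(frozen=True)
class Region:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, w: Word) -> bool:
        # Simplification: a word belongs to the region if its top-left corner does.
        return self.x0 <= w.x <= self.x1 and self.y0 <= w.y <= self.y1


# Hypothetical template: field name -> region where its value is printed on the form.
TEMPLATE = {
    "name": Region(100, 50, 300, 60),
    "date_of_birth": Region(100, 70, 300, 80),
}


def extract_by_position(words: list, template: dict) -> dict:
    result = {}
    for field_name, region in template.items():
        inside = [w for w in words if region.contains(w)]
        # Read the captured words top to bottom, then left to right.
        inside.sort(key=lambda w: (w.y, w.x))
        result[field_name] = " ".join(w.text for w in inside)
    return result
```

Labels such as "Name:" fall outside the value regions and are ignored, which is why this approach works well for fixed-layout forms but breaks down when the layout varies between documents.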
ML extraction is also available...