OCR integration
Most of the OCR (Optical Character Recognition) utilities available in the market convert scanned archives into PDF format, including both image and text in the same standard container. Alfresco provides a content transformation framework into which you can plug a third-party content transformation engine to convert a document from one format to another.
This gives you great flexibility when converting your image document, such as a TIFF file, to a machine-readable format such as PDF, RTF, or TXT.
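As a rough illustration of what plugging in a third-party engine can involve, the following Java sketch derives the target file name for a conversion and assembles the command line for an external OCR program. It is only a sketch under stated assumptions: the executable name `ocr-tool`, its argument order, and the class `OcrTransformSketch` are hypothetical and are not part of Alfresco or of any particular OCR product.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class OcrTransformSketch {
    // Hypothetical executable name; substitute the OCR utility you actually use.
    static final String OCR_COMMAND = "ocr-tool";

    /** Derives the output path by swapping the file extension; the source image is left untouched. */
    static Path targetFor(Path source, String targetExtension) {
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return source.resolveSibling(base + "." + targetExtension);
    }

    /** Builds the argument list in the assumed form: <command> <source> <target>. */
    static List<String> buildCommand(Path source, Path target) {
        return Arrays.asList(OCR_COMMAND, source.toString(), target.toString());
    }

    public static void main(String[] args) {
        Path source = Paths.get("invoice.tiff");
        Path target = targetFor(source, "pdf");
        // The resulting list could be handed to java.lang.ProcessBuilder to run the tool.
        System.out.println(buildCommand(source, target));
    }
}
```

Because the target is written next to the source under a new extension, the original image remains available after the conversion.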
The following figure illustrates the process of scanning a paper document with a network scanner and transferring it, in an image format, into the Alfresco repository. Once the image document is in the Alfresco repository, you can trigger a business rule that converts it to a PDF document. You can still keep the image document in the repository for future reference. The quality and accuracy of the output PDF document will depend on the OCR utility...