Optimizing a scanned document
Optimizing scanned pages means finding a balance between image quality and file size. Text-heavy pages contain much less bitmap information than color pages with photos and charts.
Important information
The process of creating PDFs discussed here should be used only when no digital version of the publication exists in a file format that can be edited in an authoring application, such as Microsoft Word or InDesign. We assume you are working from paper copies, with no trace of the electronic files that produced them.
Acrobat performs multiple tasks at once when we scan a page. The selections made before scanning, discussed earlier, can deliver great results. This is a good time to look at the file size of the finished, optimized PDF and compare it with the size of our initial file, before any enhancement or optimization was done.
Use the menu to open the File | Properties… options, then click on the Description tab at the top of the dialog box.
The bottom-left area of the dialog box displays the file size. Our sample file was initially 1.33 MB. All those color bitmaps added up to quite a size, and this was only one page. Imagine the size of a 200-page file. You can do the math…
Figure 2.13 – File size before optimization
On the other hand, the completed, functional, and optimized file was 61.60 KB, just a fraction of the initial file size, yet much easier to read on screen and much more functional, with a layer of live text that can be searched. If you emailed me a copy, I would rather open the optimized file.
Figure 2.14 – File size after optimization
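To put numbers on the comparison, here is a minimal Python sketch of the math using the two file sizes reported above. It assumes that 1 MB equals 1,024 KB and that every page of a hypothetical 200-page document behaves like our one-page sample; neither assumption comes from Acrobat, and the function name is only illustrative.

```python
KB_PER_MB = 1024  # assumption: binary units; the dialog's exact rounding is not stated


def compare_sizes(before_mb, after_kb, pages=1):
    """Return (reduction ratio, total MB before, total MB after) for `pages` pages."""
    before_kb = before_mb * KB_PER_MB
    ratio = before_kb / after_kb
    # Assumption: every page is the same size as the single sample page.
    before_total_mb = before_mb * pages
    after_total_mb = after_kb * pages / KB_PER_MB
    return ratio, before_total_mb, after_total_mb


# Sizes taken from the sample file: 1.33 MB before, 61.60 KB after.
ratio, before_total, after_total = compare_sizes(1.33, 61.60, pages=200)
print(f"The optimized page is about {ratio:.0f} times smaller")
print(f"200 pages: about {before_total:.0f} MB before, {after_total:.0f} MB after")
```

Real documents will vary from page to page, so treat the 200-page totals as a rough estimate rather than a prediction.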
To review, how did we get here? We selected the Scan & OCR | Enhance | Camera Image | Whiteboard options, and Acrobat converted the background to white and the text to black.
In documents that mix text and photos, it is helpful to know that the default setting under Enhance Scanned PDF | Optimization Option is Apply Adaptive Compression. This algorithm divides each page into black-and-white, grayscale, and color regions and chooses a representation for each that preserves its appearance while compressing that type of content heavily. Scanning at 300 dpi for color and grayscale content, and at 600 dpi for black-and-white content or for pages with very small font sizes, provides the best balance between image quality and file size.
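One way to see why black-and-white content can afford the higher resolution is to estimate the raw, uncompressed bitmap size of a page at each setting. The sketch below is only an illustration: it assumes a US Letter page (8.5 x 11 inches), 24 bits per pixel for color, 8 for grayscale, and 1 for black-and-white, and it ignores row padding. It says nothing about the compressed sizes Acrobat actually writes.

```python
def raw_bitmap_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed bitmap size in bytes for a page scanned at `dpi`."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


PAGE = (8.5, 11)  # assumption: US Letter page, in inches

color_300 = raw_bitmap_bytes(*PAGE, dpi=300, bits_per_pixel=24)
gray_300 = raw_bitmap_bytes(*PAGE, dpi=300, bits_per_pixel=8)
bw_600 = raw_bitmap_bytes(*PAGE, dpi=600, bits_per_pixel=1)

for label, size in [("Color, 300 dpi", color_300),
                    ("Grayscale, 300 dpi", gray_300),
                    ("Black-and-white, 600 dpi", bw_600)]:
    print(f"{label}: {size / 1_000_000:.1f} MB uncompressed")
```

Under these assumptions, a 600 dpi black-and-white page holds four times as many pixels as a 300 dpi page, yet its raw bitmap is half the size of the grayscale version and one-sixth the size of the color version, because each pixel needs only a single bit.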