Adobe Acrobat Gotcha – Searchable Image versus Searchable Image (Exact)

Skewing Problem When I Run OCR with Acrobat XI

Here’s an Adobe Acrobat XI “gotcha” for my attorney friends out there.

As a solo practitioner, I use a Fujitsu ScanSnap iX500 scanner to scan paper documents to PDF format. I then use Adobe Acrobat XI to run “optical character recognition” (OCR) on these PDF files so that they are word-searchable. I like the hardware and the software I use. Both seem quite adequate for my purposes.

I discovered an odd quirk of the Acrobat XI OCR feature that could trip you up if you are not careful.

To run OCR on a PDF file you have opened in Acrobat XI, you click on Tools > Text Recognition > In This File, like so:

 

text recognition in this file menu

 

Then you get the “Recognize Text” screen, which looks like this:

 

recognize text screen

 

Then you can click on “Edit…” to see the output options:

 

output options for ocr

 

As you can see, when you run Acrobat XI’s text recognition (OCR) feature on a PDF document, there are three choices for the PDF “output style”:

  • Searchable Image
  • Searchable Image (Exact)
  • ClearScan

I noticed that when I run the OCR with the default setting, “Searchable Image”, the software tries to “deskew” the image by rotating it so that the text is right side up and squared up with the page orientation. When it does this, however, text and images near the edges and corners of the scanned piece of paper are cut off. Here is an image showing that effect:

 

Scanned image deskewed

 

In this case, the postmark date, the most important information on this sheet of paper, is what is being cut off.
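
To get a feel for why even a small deskew rotation can clip a corner, here is a rough Python sketch. It rests on an assumption that I have not confirmed with Adobe: that the deskewed image is rotated about the center of the page while the page size stays the same. The function name `corner_overflow`, the letter-size page, and the 2-degree angle are all hypothetical and chosen only for illustration.

```python
import math


def corner_overflow(width, height, angle_degrees):
    """Estimate how far a rotated page overhangs its original boundary.

    Assumes the page is rotated about its center and the canvas keeps
    its original size (an assumption, not documented Acrobat behavior).
    Valid for rotations between -90 and 90 degrees.
    Returns (overflow_x, overflow_y) in the same units as width/height:
    the overhang of the rotated page's bounding box on each side.
    """
    theta = math.radians(abs(angle_degrees))
    # Bounding box of a width x height rectangle rotated by theta.
    rotated_width = width * math.cos(theta) + height * math.sin(theta)
    rotated_height = width * math.sin(theta) + height * math.cos(theta)
    # Half of the extra size sticks out past each edge of the canvas.
    overflow_x = max(0.0, (rotated_width - width) / 2)
    overflow_y = max(0.0, (rotated_height - height) / 2)
    return overflow_x, overflow_y


# Hypothetical example: a letter-size page (inches) deskewed by 2 degrees.
overflow_x, overflow_y = corner_overflow(8.5, 11.0, 2.0)
print(f"Overhang per side: {overflow_x:.2f} in wide, {overflow_y:.2f} in tall")
```

Under these assumptions, anything printed or stamped within that margin of a corner, such as a postmark, is at risk of landing outside the page after the rotation.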

This quirk is more than annoying. If you do not notice it when scanning, you could very well try to use this document at a later date, thinking that all of the text and images on the sheet of paper are visible in the PDF. Gotcha! When would you discover the problem? Generally when you are assembling trial notebooks and the original paper document has long since been shredded. Double gotcha!

I noticed that if I ran the OCR using the PDF output style of “Searchable Image (Exact)”, there was no deskewing of the image.  Acrobat just rotated the image to a landscape orientation, ran the OCR, and left the page exactly as it appears in the original hard copy document.  Here is what the scan looks like with the “Searchable Image (Exact)” setting selected:

 

Searchable Image exact selected

 

Very nice.  The page is word-searchable but the text and images at the top and upper right corner are not cut off.

I also noticed that there is a third setting available called “ClearScan”. Wondering what that did, I also ran the OCR using the ClearScan setting. The deskewing problem reappeared, as shown in this image:

 

ocr with clearscan adobe acrobat

 

A quick web search reveals that “ClearScan” replaces the fonts in the OCR’ed image with custom fonts based on what Acrobat sees in the scanned image. I am not sure how that helps me. I would rather be able to see all of the text and images in the un-deskewed original.

At any rate, the version of Adobe Acrobat I am using is Acrobat Standard XI. It is no longer the current release: the newest version is Acrobat “DC”, which came out in 2015. I do not know what OCR features and settings are available in Acrobat DC. I guess I will find out, as Adobe will cease supporting Acrobat XI on October 15, 2017.

Our Location

Law Office of Brock R. Wood, LLC

3570 E. 12th Avenue Denver, CO 80206 Telephone: (303) 618-4569 Fax: (720) 240-0728