Free Resources

Abbyy Fine Reader

Of the many advantages of using Optical Character Recognition (OCR) on these scanned pages, the size benefit is the most obvious. Storing information as text rather than as individual pixels greatly reduces the storage required. For example, the finished HTM file for page 11, which is entirely text, is 13 K, while page 108, which is a line drawing, is 1,138 K, roughly 88 times larger. Without OCR, we could not have fit the cyclopedia, with its high-resolution line drawings and photographs, on a single CD. Converting the text also makes the pages searchable and lets readers copy and paste passages for reference.
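The size comparison can be checked with a little arithmetic. The sketch below uses the two file sizes quoted above and the 1,150-page count of the first cyclopedia. The 700 MB CD capacity and the idea of storing every page one way or the other are assumptions made only for illustration; the real disc mixes text pages and image pages.

```python
TEXT_PAGE_KB = 13       # page 11 as HTML text (figure from the text)
IMAGE_PAGE_KB = 1138    # page 108 as a line drawing (figure from the text)
PAGE_COUNT = 1150       # pages in the first cyclopedia (figure from the text)
CD_CAPACITY_MB = 700    # assumption: a typical CD-R, not stated in the source


def total_mb(pages: int, kb_per_page: float) -> float:
    """Total size in megabytes for `pages` pages of `kb_per_page` kilobytes each."""
    return pages * kb_per_page / 1024


ratio = IMAGE_PAGE_KB / TEXT_PAGE_KB
all_text_mb = total_mb(PAGE_COUNT, TEXT_PAGE_KB)
all_image_mb = total_mb(PAGE_COUNT, IMAGE_PAGE_KB)

print(f"Image page is about {ratio:.0f}x the size of the text page")
print(f"All pages as text:   {all_text_mb:.1f} MB")
print(f"All pages as images: {all_image_mb:.0f} MB (a CD holds about {CD_CAPACITY_MB} MB)")
```

Under these assumptions, an all-image version would need well over one CD, while an all-text version would use only a small fraction of one.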

Cyclopedia page in Abbyy Fine Reader

While most OCR programs are designed to work with multi-page documents, breaking the 1,150 pages of the first cyclopedia into 50-page chunks made sense to us for practical reasons. Each 50-page section takes between two and a half and three hours to process, so our operator can save a completed batch before starting on the next, and can take a break when his eyes start to lose focus.
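As a rough illustration of the batching described above, the following sketch splits 1,150 pages into 50-page ranges and estimates the total processing time from the per-batch figures. The function name `split_into_batches` is hypothetical, and the sketch assumes pages are numbered continuously from 1, which the source does not state.

```python
def split_into_batches(total_pages: int, batch_size: int) -> list[tuple[int, int]]:
    """Return (first_page, last_page) pairs, 1-indexed and inclusive."""
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]


batches = split_into_batches(1150, 50)

# Per-batch times from the text: between 2.5 and 3 hours for 50 pages.
low_hours = len(batches) * 2.5
high_hours = len(batches) * 3.0

print(f"{len(batches)} batches, first {batches[0]}, last {batches[-1]}")
print(f"Estimated OCR time: {low_hours} to {high_hours} hours")
```

With these figures the job comes to 23 batches, or somewhere between 57.5 and 69 hours of operator time.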

Abbyy Fine Reader separates contiguous areas of the page into text and image boxes.

The operator compares the structure of each page to the original from the book.

Image box in Abbyy Fine Reader

Image boxes are realigned to include their captions.
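Geometrically, stretching an image box to take in its caption amounts to finding the smallest rectangle that covers both regions. The sketch below shows that idea only. It is not Abbyy Fine Reader's interface; the `Box` type, the `union` function, and the pixel coordinates are all hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """A rectangle in page pixels, with the origin at the top-left corner."""
    left: int
    top: int
    right: int
    bottom: int


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both a and b."""
    return Box(
        min(a.left, b.left),
        min(a.top, b.top),
        max(a.right, b.right),
        max(a.bottom, b.bottom),
    )


# Hypothetical coordinates: a drawing with its caption just below it.
image_box = Box(300, 400, 2100, 1900)
caption_box = Box(500, 1950, 1900, 2050)

merged = union(image_box, caption_box)
print(merged)
```

Here the merged box keeps the drawing's left, top, and right edges and extends its bottom edge down to the bottom of the caption.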

The page is converted to HTML and each image is saved as an individual 600 dpi JPEG.
