QUESTION: What is OCR?
ANSWER: OCR stands for
"Optical Character Recognition."
It is a computerized process that enables you to convert a paper
document into a
computer file that you can search and manipulate using a
word processor.
An OCR system reads text from paper,
translates the images of letters, numbers, punctuation marks, etc.
into a text-based form, and creates a computer file that contains
the translated information. The resulting file stores the text
as character codes (such as ASCII), along with font information
when the output format supports it.
Most OCR systems rely on a machine called a "scanner."
This is a device with a clear glass surface on it and
a camera inside it. You put a document face-down on
the glass and the camera inside the scanner takes
a picture of the document and stores that picture in the
form of a bitmap file
(also known as an "image file").
Then, the OCR software in your computer analyzes
the patterns of dots in the image file, matches them to letters,
numbers, and punctuation marks, and creates a file
that contains text that is represented as fonts and ASCII codes.
With most OCR systems, the image file that
is created by the scanner is discarded after the final file
(the file containing the fonts and ASCII codes) has been created.
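To make the step from "patterns of dots" to ASCII codes concrete, here is a minimal sketch of one simple recognition approach, template matching. It is a toy illustration only, not how any particular OCR product works: the 3x5 glyph templates, the fixed-width layout, and the function names are all assumptions made up for this example.

```python
# Toy glyph templates: each character is a 3x5 grid of dots ("#" = ink).
# These shapes are invented for illustration.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}

def similarity(glyph, template):
    """Count the cells where the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )

def recognize_glyph(glyph):
    """Return the template character whose dot pattern is the closest match."""
    return max(TEMPLATES, key=lambda ch: similarity(glyph, TEMPLATES[ch]))

def recognize_line(bitmap, glyph_width=3, gap=1):
    """Cut a row of fixed-width glyphs out of the bitmap and recognize each."""
    step = glyph_width + gap
    text = []
    for x in range(0, len(bitmap[0]), step):
        glyph = tuple(row[x:x + glyph_width] for row in bitmap)
        text.append(recognize_glyph(glyph))
    return "".join(text)

# A stand-in for a scanned image of the word "TIL".
# The middle letter has one stray dot, as a real scan might.
bitmap = (
    "###.###.#..",
    ".#...#..#..",
    ".#..##..#..",
    ".#...#..#..",
    ".#..###.###",
)

text = recognize_line(bitmap)
codes = [ord(ch) for ch in text]  # the ASCII code of each recognized character
print(text, codes)
```

Because the closest template wins, the stray dot in the middle letter does not change the result. Real OCR software has to cope with far more variation (different fonts, sizes, skew, and touching characters), so it uses much more elaborate methods than this.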
OCR software
Citation Software offers OCR software called
"ExpressRecognition Server." It is server-based OCR software that
automatically converts images, paper documents, and faxes
into editable and searchable electronic files such as
PDF files or text files.
ExpressRecognition Server can perform OCR on specific regions
of a page (a process known as "Zonal OCR") and extract
the recognized data into XML files for further analysis.
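The general idea behind Zonal OCR can be sketched as follows. This is a hedged illustration and does not show ExpressRecognition Server's actual interface or XML layout: the zone names, the sample page, the element names, and the placeholder recognize_zone function are all hypothetical. The "page" here is a grid of characters standing in for an image, so the example runs without a real OCR engine.

```python
import xml.etree.ElementTree as ET

# Stand-in for a scanned page: a grid of characters instead of pixels.
# The contents are made-up sample data.
PAGE = [
    "INVOICE NO: 10452   ",
    "DATE: 2024-03-01    ",
    "TOTAL: 199.00       ",
]

# Hypothetical zones: name -> (row, first column, end column (exclusive)).
ZONES = {
    "invoice_number": (0, 12, 20),
    "date": (1, 6, 20),
    "total": (2, 7, 20),
}

def recognize_zone(page, row, col_start, col_end):
    """Placeholder recognizer: crop the zone and return its text.

    A real system would run OCR on the cropped image region instead.
    """
    return page[row][col_start:col_end].strip()

def zones_to_xml(page, zones):
    """Recognize each zone and collect the results in one XML document."""
    root = ET.Element("document")
    for name, (row, col_start, col_end) in zones.items():
        zone = ET.SubElement(root, "zone", name=name)
        zone.text = recognize_zone(page, row, col_start, col_end)
    return ET.tostring(root, encoding="unicode")

xml_output = zones_to_xml(PAGE, ZONES)
print(xml_output)
```

Restricting recognition to named regions means each extracted value arrives already labeled, which makes the XML easier for another program to process.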