How a Computer Works (2015)
A scanner is a device that lets you copy a document and view or save the copy on a computer. Many scanners are available, and their price depends on the size of document they can scan and the resolution they can scan it at.
More expensive scanners offer a higher resolution (more pixels), which produces a higher-quality scanned image.
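To illustrate how resolution translates into pixels, the sketch below works out the pixel dimensions of a scan. It assumes resolution is quoted in dots per inch (dpi), a unit the text does not name, and the function name and page size are illustrative only.

```python
def scan_dimensions(width_in, height_in, dpi):
    """Return (pixels wide, pixels high, total pixels) for a scanned page.

    width_in and height_in are the page size in inches; dpi is the
    assumed scan resolution in dots per inch.
    """
    pixels_wide = round(width_in * dpi)
    pixels_high = round(height_in * dpi)
    return pixels_wide, pixels_high, pixels_wide * pixels_high


# An 8.5 x 11 inch page scanned at 300 dpi
print(scan_dimensions(8.5, 11, 300))  # (2550, 3300, 8415000)
```

Doubling the resolution quadruples the number of pixels, which is one reason higher-resolution scans take more storage.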
The scanner incorporates a charge-coupled device (CCD). Inside the CCD is an array of photodiodes. When light falls on them, they produce a low voltage that is sent to an analogue-to-digital converter (ADC).
If the image being scanned is purely black and white, i.e. just two colours, then the photodiodes produce only two voltages.
With a flatbed scanner, the document is placed face down on a glass panel. The scanning process involves shining a bright light over the document's surface.
The reflections the CCD picks up differ depending on what the light hits. More light is reflected back off the white spaces on the document than off the dark printing.
If a colour document is being scanned, red, green and blue filters are positioned in front of the diodes.
The document to be scanned is placed face down onto a glass sheet. The scanning mechanism sits below the glass sheet. As the document is scanned a motor moves the scan head beneath the page.
A light source illuminates the document in the area where the scan head is positioned and this light is reflected onto the scan head through a series of mirrors.
The scan head lens focuses the light beams onto arrays of light sensitive photodiodes.
The photodiodes convert the amount of light into an electrical current. The electrical current varies with the amount of light picked up by the photodiodes.
The photodiodes connect to an analogue-to-digital converter (ADC), which converts the electrical current into data bits that represent a pixel.
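The conversion step can be sketched as a simple quantisation. This is a minimal illustration, not a model of any real scanner: the 8-bit depth, the reference voltage `v_ref` and the function name are all assumptions made for the example.

```python
def adc_convert(voltage, v_ref=1.0, bits=8):
    """Quantise an analogue voltage in the range 0..v_ref to an integer code.

    v_ref and bits are assumed values; real converters differ.
    """
    levels = (1 << bits) - 1                   # 255 for an 8-bit converter
    clamped = min(max(voltage, 0.0), v_ref)    # keep the input inside the range
    return round(clamped / v_ref * levels)


print(adc_convert(0.0))    # 0   (no signal)
print(adc_convert(0.25))   # 64
print(adc_convert(1.0))    # 255 (full-scale signal)
```

Each code is one pixel value: the stronger the signal from the photodiode, the larger the number stored.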
The data bits are sent to the computer where the data is stored ready to be opened up in a graphics program.
Optical character recognition (OCR) is a technology that allows text-based documents to be converted into text files. This allows the document to be edited in a word processor.
When the document is scanned, the scanner converts the dark printing, i.e. the text, into a bitmap. The bitmap is a grid of pixels, each of which is either black or white.
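One simple way to picture how a black and white bitmap arises is thresholding: every grey level darker than a cut-off becomes black, everything else white. The sketch below assumes 8-bit grey levels (0 is black, 255 is white) and a cut-off of 128; both are illustrative choices, not values given in the text.

```python
def to_bitmap(grey_rows, threshold=128):
    """Turn rows of 0-255 grey levels into rows of 1 (black) and 0 (white)."""
    return [[1 if value < threshold else 0 for value in row]
            for row in grey_rows]


grey = [
    [250, 30, 240],
    [245, 20, 235],
]
print(to_bitmap(grey))  # [[0, 1, 0], [0, 1, 0]]
```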
The black pixels represent a copy of the character from the scanned document. Because each pixel is larger than the finest details of the text, the edges of the scanned characters blur.
This causes most of the problems for OCR systems.
As the document is scanned in, the OCR software interprets the bitmap produced by the scanner and averages out the zones of black and white pixels. This is to map the white space on the document.
This allows the software to block off paragraphs, columns, headlines and any graphics that may be in the document.
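A much simplified version of mapping white space is to look for rows that contain no black pixels at all and treat the runs of rows between them as bands of text. The sketch below does only that; it is an assumed illustration rather than the method any particular OCR package uses, and it ignores columns and graphics.

```python
def find_text_bands(bitmap):
    """Return inclusive (first_row, last_row) pairs for runs of rows with ink.

    bitmap is a list of rows of 1 (black) and 0 (white).
    """
    bands = []
    start = None
    for y, row in enumerate(bitmap):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a band of text begins
        elif not has_ink and start is not None:
            bands.append((start, y - 1))   # white row closes the band
            start = None
    if start is not None:                  # band runs to the bottom edge
        bands.append((start, len(bitmap) - 1))
    return bands


page = [
    [0, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
]
print(find_text_bands(page))  # [(1, 2), (4, 4)]
```

Running the same idea down the columns of each band would separate individual characters.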
The white space between lines is the baseline for recognising text characters. On its first pass, the software tries to match each bitmap character, pixel by pixel, against character templates that the program holds in memory.
These templates cover each font's letters, numbers and punctuation, plus the extended characters of some fonts.
To obtain a perfect match the document needs to use a font which is common to the OCR software. The document itself needs to be of high quality and blur free for accurate results.
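The pixel-by-pixel comparison can be sketched as follows. The tiny 3 x 3 templates, the 0.9 acceptance score and the function names are all invented for illustration; real template sets are far larger and finer.

```python
# Hypothetical templates: "1" is a black pixel, "0" is white.
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def match_score(glyph, template):
    """Fraction of pixels that agree between a glyph and a template."""
    total = sum(len(row) for row in template)
    same = sum(g == t
               for glyph_row, template_row in zip(glyph, template)
               for g, t in zip(glyph_row, template_row))
    return same / total


def recognise(glyph, templates=TEMPLATES, min_score=0.9):
    """Return the best-matching character, or None if nothing is close enough."""
    best_char, best_score = None, 0.0
    for char, template in templates.items():
        score = match_score(glyph, template)
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= min_score else None


print(recognise(["100", "100", "111"]))  # L
print(recognise(["111", "111", "111"]))  # None
```

A glyph that returns None here is the kind of character that would be passed on to a second stage.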
Any characters that remain unrecognised go through a further process called feature extraction.
This time-consuming process calculates the font's x-height, which is the height of its lowercase x.
From there it analyses each character's combination of straight lines, curves and bowls, the hollow areas within loops such as the one in q.
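One feature of this kind, the number of enclosed hollow areas in a character, can be counted with a flood fill: any white region that never reaches the edge of the character's bitmap must be surrounded by black. The sketch below is an assumed illustration of that single feature, not a full feature extractor.

```python
def count_enclosed_areas(glyph):
    """Count white regions completely surrounded by black pixels.

    glyph is a non-empty list of rows of 1 (black) and 0 (white).
    Regions are joined through up, down, left and right neighbours.
    """
    height, width = len(glyph), len(glyph[0])
    seen = [[False] * width for _ in range(height)]

    def region_touches_edge(y, x):
        """Flood-fill one white region; report whether it reaches the border."""
        stack = [(y, x)]
        seen[y][x] = True
        touches_edge = False
        while stack:
            cy, cx = stack.pop()
            if cy in (0, height - 1) or cx in (0, width - 1):
                touches_edge = True
            for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                if (0 <= ny < height and 0 <= nx < width
                        and not seen[ny][nx] and glyph[ny][nx] == 0):
                    seen[ny][nx] = True
                    stack.append((ny, nx))
        return touches_edge

    enclosed = 0
    for y in range(height):
        for x in range(width):
            if glyph[y][x] == 0 and not seen[y][x]:
                if not region_touches_edge(y, x):
                    enclosed += 1
    return enclosed


letter_o = [[1, 1, 1],
            [1, 0, 1],
            [1, 1, 1]]
letter_u = [[1, 0, 1],
            [1, 0, 1],
            [1, 1, 1]]
print(count_enclosed_areas(letter_o))  # 1
print(count_enclosed_areas(letter_u))  # 0
```

A count like this helps tell apart shapes that look similar pixel for pixel, such as an o and a c.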
As it works, the software builds up a new alphabet for the font being used, which increases its recognition speed.
If neither of these two processes recognises the character, it is replaced by a distinctive character such as # or @. You must then change these symbols back to the correct characters manually. Other OCR software instead displays the bitmap and asks you to identify the character yourself.
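The substitution step might look like the sketch below, which assumes the recogniser reports an unrecognised character as None. The function name and the choice of # as the marker are illustrative.

```python
def build_text(recognised, placeholder="#"):
    """Join recognised characters, marking failures with a placeholder.

    Returns the text and the positions the user still has to correct.
    """
    chars = []
    unresolved = []
    for index, char in enumerate(recognised):
        if char is None:
            chars.append(placeholder)
            unresolved.append(index)
        else:
            chars.append(char)
    return "".join(chars), unresolved


text, to_fix = build_text(["s", "c", None, "n"])
print(text)    # sc#n
print(to_fix)  # [2]
```

Keeping the list of positions lets a program take the user straight to each character that needs correcting.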
The OCR software will give you the option of saving the document as an ASCII file. With this you can open the document in a word processor.
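Saving as ASCII means every character must fit the 128-character ASCII set. As a rough sketch, the function below writes recognised text to a file and swaps anything outside ASCII for a question mark; the function name and file name are hypothetical, and real OCR packages may handle such characters differently.

```python
from pathlib import Path


def save_as_ascii(text, path):
    """Write text to path as ASCII bytes, replacing other characters with '?'."""
    data = text.encode("ascii", errors="replace")
    Path(path).write_bytes(data)
    return data


# Example with a hypothetical file name:
# save_as_ascii("Scanned text", "scan.txt")
```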