A Python script used to pre-process images of individual handwritten characters to increase OCR/ICR accuracy. Part of the Open ICR Project - http://opensource.newmediaist.com/open-source-icr.html
The purpose of this image pre-processor is to "sanitize and standardize" the input image as much as possible to prepare it for the recognition engine. The image pre-processor performs the following steps:
- Borders around the character are removed (e.g. borders left over from imperfect character extraction)
- Median filtering is applied to remove salt-and-pepper noise
- Character image is cropped down to the borders of the written character
- Character image is scaled to a standard set of dimensions
- Character image is thinned using the Zhang-Suen algorithm
- White space padding is added around the image to prepare for the next stage
- Erosion is applied to the character image to join small gaps
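
Some of the steps above can be illustrated with a minimal, dependency-free sketch. This is not the code in preprocessor.py; it only shows the general techniques named in the list (median filtering, cropping, padding and Zhang-Suen thinning). It assumes the image is a list of rows of 0/1 integers where 1 means ink, and all function names are hypothetical. Border removal, scaling and erosion are left out.

```python
from statistics import median

def median_filter(img):
    """3x3 median filter; pixels outside the image reuse the nearest edge value."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out

def crop_to_ink(img):
    """Crop to the bounding box of all ink pixels (assumption: 1 = ink)."""
    rows = [y for y, row in enumerate(img) if any(row)]
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    if not rows:
        return img  # blank image: nothing to crop
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]

def pad(img, margin):
    """Add `margin` background pixels on every side."""
    width = len(img[0]) + 2 * margin
    top = [[0] * width for _ in range(margin)]
    bottom = [[0] * width for _ in range(margin)]
    middle = [[0] * margin + list(row) + [0] * margin for row in img]
    return top + middle + bottom

def zhang_suen_thin(img):
    """Zhang-Suen thinning of a binary image; returns an image of the same size."""
    img = pad(img, 1)  # 1-pixel border so every ink pixel has 8 neighbours
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for y in range(1, len(img) - 1):
                for x in range(1, len(img[0]) - 1):
                    if not img[y][x]:
                        continue
                    # Neighbours P2..P9, clockwise starting from north.
                    p2, p3, p4 = img[y - 1][x], img[y - 1][x + 1], img[y][x + 1]
                    p5, p6, p7 = img[y + 1][x + 1], img[y + 1][x], img[y + 1][x - 1]
                    p8, p9 = img[y][x - 1], img[y - 1][x - 1]
                    ring = [p2, p3, p4, p5, p6, p7, p8, p9]
                    b = sum(ring)  # number of ink neighbours
                    # a = number of 0 -> 1 transitions around the ring
                    a = sum(ring[i] == 0 and ring[(i + 1) % 8] == 1 for i in range(8))
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((y, x))
            # Pixels are cleared only after the whole sub-iteration has been scanned.
            for y, x in to_clear:
                img[y][x] = 0
            changed = changed or bool(to_clear)
    return [row[1:-1] for row in img[1:-1]]  # strip the temporary border
```

A possible way to chain them, again only as an illustration: `pad(zhang_suen_thin(crop_to_ink(median_filter(img))), 2)`. The margin of 2 is an arbitrary value chosen for the example.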
python preprocessor.py -o original.png -d ~path_for_output\filename.png
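
For readers who want to see how a command line of this shape can be handled, here is a small argparse sketch. It is not taken from preprocessor.py. It assumes that -o names the original input image and -d names the destination file, which is an interpretation of the example command rather than documented behaviour; the function name and attribute names are hypothetical.

```python
import argparse

def parse_args(argv=None):
    """Parse -o/-d in the style of the example command (meanings are assumed)."""
    parser = argparse.ArgumentParser(
        description="Pre-process an image of a single handwritten character.")
    # Assumption: -o is the path of the original (input) image.
    parser.add_argument("-o", dest="original", required=True,
                        help="path to the original character image")
    # Assumption: -d is the path where the processed image is written.
    parser.add_argument("-d", dest="destination", required=True,
                        help="path for the processed output image")
    return parser.parse_args(argv)
```

Calling `parse_args(["-o", "original.png", "-d", "out.png"])` returns a namespace whose `original` and `destination` attributes hold the two paths.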
Code licensed under Apache License v2.0