How to Implement Optical Character Recognition in Python
Any human can recognize and understand contents in any image that he sees through his eyes by reading it very easily. But computers aren’t developed in such a way to perceive the image as text, so they need a more organized and concrete way to make them understand as we learn.
Optical character recognition (OCR) is one of the major ways to make computers educate about reading the text out of images which has very wide applications in real-world like Number plates recognition for traffic control, scanning of documents and copying important information from it and etc. Python offers many great libraries to implement this OCR. in this blog, we would go through the implementation of OCR with python. As part of this, we will be discussing below topics
- What Exactly is OCR
- Optical character recognition applications
- Building a simple OCR model with python
- Pros and cons of OCR engines
What Exactly is OCR?
OCR is a process of detecting the text content from the images and translating them to understandable encoded text to the computer. Images first would be scanned and then analyzed for identification of the characters in it and later would be converted to machine-understandable encoded text.
Process of OCR:
- Bitmap conversion: Images are scanned, then all the text, graphics and other elements are converted to bitmaps
- Pre-processing: Contrast and brightness of images are adjusted in accordance to enchase the accuracy
- Splits: Images are split according to the zones of interest so that zones with text can be concentrated for the extraction
- Extraction: Further split the text image part into lines, words, and characters and feed those to the software so that it can match the characters applying various algorithms.
- Result: output would be the text read from the images. This output can also be converted into various mediums like PDF, word documents and as well as audios using text-to-speech tech.
This process of OCR mentioned would surely not fetch us cent percent accurate results, it would also need some corrections from the human end when input elements would not have scanned properly. Correction in errors can also be achieved using dictionaries and also with NLP too.
Applications of OCR:
With invent and advancement of OCR technologies manual effort of the document, digitalization is almost reduced by 80%.
Scenario 1: copying text from Scanned ID cards.
We would have many times faced this problem of doing manual effort in seeing and typing the required information like DL number, PAN number …etc while filling the documents but OCR would help us to get out of this manual effort with document scanner mobile like CAM SCANNER, Abode scan… etc.
Scenario 2: Passport recognition in airports
Many Airports using OCR technologies have reduced their manual efforts for the passport recognition process and as well as extracting much-needed information from them.
There are many more applications of OCR like Recognition of Number plates for traffic management, Automation of data entry or collection …etc.
Building a simple OCR model with python:
As part of this blog, we will build a simple OCR model to recognize and print the text from the image from our system, there are many other libraries like Textract for extracting data from pdf’s, pyocr for detection of sentences, and digits and also most popular OpenCV. But for this blog, we would use a library called PyTesseract (python Tesseract) which is from Google’s Tesseract-OCR Engine, which is an open-source tool and completely maintained by Google. Along with this, we would also use a library called pillow which is forked from PIL (Python Imaging Library) handles images of various formats and help us in opening and manipulating them.
Note: All the coding is done using Jupiter notebooks
Before moving on to coding first we are installing required packages (if already installed ignore above code). Now let us create a small function that takes an image that needs to be decoded to the text as input and return text as output.
In the first 5 lines of code, we are importing the classes that we need and then we are reading an image from the disk and then using pillow and tesseract function for OCR.
We have given below image as input:
And the output that we get is
“WELCOME TO THE WORLD OF OCR!! BRAVO OuR output is correct@#”
Pros and cons of Optical Character Recognition:
- Recognition of hand-written contents in Pdf
- Character recognition can be also restricted to particular parts of PDF
- Helps to store the recognized into Python objects which can be further used for doing any pre-processing for analysis of that text
- All the images for the recognition use the disk storage
- Any OCR cannot guarantee cent percent of accuracy, but still better performs on computer typed PDF documents
- OCR’s performance on the accuracy of hand-written texts would depend on various factors of it like page colors, handwriting and etc.