This blog is part one of a comprehensive guide to Optical Character Recognition (OCR). We discuss popular open-source tools, Tesseract & EasyOCR, with hands-on tutorials on how to use the tools effectively.
Optical character recognition, or OCR, is not a new topic in the field of document understanding. OCR is a technique (both electronic and mechanical) for converting un-editable text in images into machine-encoded, editable text (i.e., a "string" data type). We usually associate OCR with software; in other words, methods that convert images of text into machine-readable strings.
A skilled practitioner's flow of OCR recognition
The task is to convert image text data to machine-readable text using OCR engines. However, ever since the 1960s, when image interpretation and computer vision were first developed, researchers have struggled to build generalized OCR systems that work across broad and varied use cases.
For example, if I had to show the following image to my OCR engine, I would expect it to detect the text, recognize the text, and then encode the text as editable string data.
Output => CODITATION
However, despite its simplicity, OCR is exceptionally hard. Although the discipline of computer vision has been around for more than 50 years (with mechanical OCR machines dating back over 100 years), we have yet to "solve" OCR and create an off-the-shelf OCR system that works in almost any situation.
There are too many factors to account for, such as noise, writing style, image quality, etc. We're still a long way from solving OCR. There are so many complexities in how humans share information through writing. As a result, it is fair to say that computer vision systems will never be able to read image text with 100% reliability.
This blog would not exist if OCR were already a solved problem. Your first Google search would have directed you to the code you needed to apply OCR convincingly and correctly to your task. However, that is not the world we live in. While we're getting better at tackling OCR challenges, knowing how to apply a present-day OCR engine still requires a skilled practitioner.
Tesseract, which was created by Hewlett-Packard in the 1980s, was made open-source in 2005. Google took over sponsorship of the project in 2006 and has supported it ever since. Tesseract supports a wide range of natural languages, from English (initially) to Punjabi to Yiddish. Since the updates in 2015, it supports over 100 written languages and has code in place so that it can easily be trained on other languages as well. Originally a C program, it was ported to C++ in 1998. The software is headless and can only be run from the command line. It does not include a graphical user interface (GUI), but various other software packages wrap Tesseract to offer one.
Tesseract is particularly well suited to document-processing pipelines in which images are scanned and pre-processed before Optical Character Recognition is applied.
EasyOCR, as the name implies, is a Python package that enables computer vision programmers to accomplish Optical Character Recognition with ease.
The EasyOCR package is created and maintained by Jaided AI, a company that specializes in Optical Character Recognition services. EasyOCR is implemented in Python on top of the PyTorch library. When you have a CUDA-capable GPU, the underlying PyTorch deep learning library can drastically improve text detection and OCR speed. EasyOCR can currently OCR text in 58 languages, including English, German, Hindi, Russian, and others, and the developers intend to add more languages in the coming years. EasyOCR currently only supports OCR of typed text; the team also intends to release a handwriting recognition system later in 2020!
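As a quick illustration of how little code EasyOCR needs, here is a minimal sketch. The helper name and file path are hypothetical; it assumes `pip install easyocr`, and the first run downloads the detection and recognition models.

```python
def read_image_text(image_path, langs=("en",), use_gpu=False):
    """Return the text strings EasyOCR finds in an image."""
    import easyocr  # imported lazily; requires `pip install easyocr`

    # A Reader loads the models once and can be reused across many images.
    # gpu=True enables CUDA acceleration when a compatible GPU is present.
    reader = easyocr.Reader(list(langs), gpu=use_gpu)

    # readtext returns a list of (bounding_box, text, confidence) tuples;
    # here we keep only the recognized strings.
    return [text for _, text, _ in reader.readtext(image_path)]
```

Calling `read_image_text("aboutcoditation.png")` would return a list of detected strings; pass `use_gpu=True` to take advantage of a CUDA-capable GPU.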
You can improve OCR accuracy by preprocessing your images with computer vision and image processing libraries like OpenCV and scikit-image. However, the question is: which algorithms and techniques do you employ? Deep learning has delivered near state-of-the-art accuracy in almost every field of computer science, but for OCR, which deep learning models, layer types, and loss functions do you use?
Utilizing Tesseract options and configurations to improve OCR accuracy

We can also use machine learning to denoise our images and improve OCR accuracy. Tesseract performs several image processing operations internally (via the Leptonica library) before performing OCR. It usually does a fine job of this, but there will undoubtedly be cases where it falls short, resulting in a significant decrease in accuracy. Image pre-processing techniques such as rescaling, binarisation, noise removal, dilation or erosion, rotation or deskewing, border handling, and removing transparency or the alpha channel all improve the final OCR inferences. On complex images that yield no results, Tesseract tries to OCR the text but fails miserably, returning illogical results.

I was annoyed when I couldn't get the correct OCR result. I had no idea when and how to use the various options, and because the documentation was so thin and lacked actual examples, I didn't understand how half of the options behaved!
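To make the pre-processing list above concrete, here is a hedged sketch of a typical pass using OpenCV. The function and file names are illustrative, not a fixed recipe, and it assumes `pip install opencv-python`.

```python
def preprocess_for_ocr(image_path, out_path="preprocessed.png"):
    """Rescale, denoise, and binarise an image before handing it to Tesseract."""
    import cv2  # lazy import; requires `pip install opencv-python`

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Rescaling: Tesseract tends to do better when the text is larger.
    gray = cv2.resize(gray, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    # Noise removal: a small median blur smooths salt-and-pepper noise.
    gray = cv2.medianBlur(gray, 3)
    # Binarisation: Otsu's method picks the black/white threshold automatically.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    cv2.imwrite(out_path, binary)
    return out_path
```

The cleaned file can then be passed to `tesseract preprocessed.png stdout` instead of the raw scan; steps like deskewing or dilation can be added to the same function when the input calls for them.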
The lesson I learned, and perhaps one of the most common mistakes I see newcomers to OCR making, is failing to fully understand how Tesseract's page segmentation modes can strongly impact the correctness of your OCR output.
When working with the Tesseract OCR engine, you must become acquainted with Tesseract's PSMs; without them, you will quickly become frustrated and will be unable to achieve high OCR accuracy.
Simply supply the --help-psm argument to tesseract to get a list of the 14 PSMs. Skilled practitioners can then experiment with the page segmentation option that best fits their input data. To see the details of the PSM options: $ tesseract --help-psm
Figure 1: PSM option detail descriptions
Let's play with the input type and the PSM options.
CASE 1: We just want to verify the orientation of the text present in the input image below.
Figure 2: Just need orientation of text
It is pretty simple using Tesseract's PSM option 0; the command is $ tesseract <image path> stdout --psm 0
Figure 3: Output of the PSM 0 option
You can see the orientation of the input is 0 degrees [it may be 90, 180, or 270 depending on the input]. The output also reports the detected script (i.e., writing system), such as Latin, Han, or Cyrillic, along with its confidence.
Figure 4: Just need the orientation of the text
$ tesseract aboutcoditation_rotated.png stdout --psm 0
Figure 5: Just need the orientation of the text
You can see in the output window of Figure 5 that the orientation of Figure 4 is 270 degrees; to correct the visibility, just rotate the image 90 degrees in the reverse direction, which is also given in the output as the "Rotate" value. You may be wondering where the OCR text is: --psm 0 does not perform OCR at all, it only performs orientation and script detection (OSD). In short, if you only need information about the text, --psm 0 is the mode to use. Now let's move toward the title of the blog.
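The OSD block that `--psm 0` prints is plain `key: value` text, so it is easy to consume programmatically. Below is an illustrative parser (not part of Tesseract); the sample string mirrors the fields Tesseract prints, with hypothetical confidence values.

```python
def parse_osd(osd_text):
    """Turn Tesseract's --psm 0 (OSD) output into a dictionary."""
    info = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip any line without a "key: value" shape
            info[key.strip()] = value.strip()
    return info

# Sample OSD output for a page rotated 270 degrees, as in Figure 4.
sample = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 2.28
Script: Latin
Script confidence: 1.10"""

osd = parse_osd(sample)
print(osd["Rotate"])  # degrees to rotate so the text is upright -> 90
print(osd["Script"])  # detected writing system -> Latin
```

With the "Rotate" value in hand, a pipeline can deskew the image automatically before running a text-extracting PSM on it.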
CASE 2: What we want is the text in the image from Figure 2, and that is not possible with PSM 0. Is there another choice? Yes: the next number is 1 - $ tesseract aboutcoditation_rotated.png stdout --psm 1
Figure 6: OCR text of Figure 2
Awesome, you have taken your first baby steps with the OCR engine. However, if you look at the output, there is no OSD information. Now let's take another step.
CASE 3: Tesseract's default PSM is 3, so if I use that one for Figure 2, will it give me some improvement? The answer is yes. Skilled practitioners are therefore supposed to start with PSM 3. Now let's take the simplest case.
CASE 4: A single-digit number, depicted in Figure 7. As we said, start with the default option --psm 3; unfortunately, the result is empty! So we need to experiment with other options, and testing with PSM 6, 7, 8, 9, 10, and 13 each gives the expected text. However, you are better off going with PSM 10, as per its description: treat the image as a single character.
Figure 7: One-digit number
$ tesseract 4.png stdout --psm 6
$ tesseract 4.png stdout --psm 7
$ tesseract 4.png stdout --psm 8
$ tesseract 4.png stdout --psm 9
$ tesseract 4.png stdout --psm 10
$ tesseract 4.png stdout --psm 13
$ tesseract 4.png stdout --psm 3
Figure 8: Result for Figure 7
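The trial-and-error above can be scripted rather than typed by hand. Here is a minimal sketch that builds and runs the same commands; the helper names are hypothetical, the image name `4.png` comes from the example above, and running `try_psms` requires the `tesseract` binary on your PATH.

```python
import subprocess

def tesseract_cmd(image_path, psm):
    """Build the tesseract command line for a given page segmentation mode."""
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]

def try_psms(image_path, modes=(3, 6, 7, 8, 9, 10, 13)):
    """Run tesseract once per PSM and collect whatever text each mode returns."""
    results = {}
    for psm in modes:
        out = subprocess.run(tesseract_cmd(image_path, psm),
                             capture_output=True, text=True)
        results[psm] = out.stdout.strip()
    return results

# Example: try_psms("4.png") would show PSM 3 returning an empty string
# while PSMs 6-10 and 13 return the digit, matching the case study above.
```

A loop like this makes it quick to compare modes on a new kind of input before committing to one PSM in a pipeline.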
These use cases will be discussed in more detail in my next blog. Stay Tuned!