How to operate OCR engines - II

In our previous blog, we covered the basics of OCR and two popular open-source tools, Tesseract and EasyOCR, with hands-on tutorials on how to use them effectively. In this blog, we will talk about some advanced use cases that we may encounter with OCR.

CASE 1:

Take an input written in a different, italic style (Figure 1). This is a little more challenging: unfortunately, none of the Tesseract options recognizes it fully. For a few PSM options, however, the result is 80-90% accurate, as depicted in Figure 2. For example, “i” is read as “t” and “t” is read as “k”. For now, Tesseract has no option that handles this scenario, so you can either try to re-train the Tesseract model for this kind of input or use a commercial OCR engine.

Figure 1: Italic style

$ tesseract italic.png stdout --psm 8

Figure 2: Figure 1 output
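
One possible way to put a number on a claim such as "80-90% accurate" is to compare the OCR output against the text you know is in the image. The sketch below is an assumption for illustration, not necessarily how the figure above was measured: it uses Python's standard difflib module, and the function name character_accuracy is hypothetical.

```python
from difflib import SequenceMatcher


def character_accuracy(expected: str, recognized: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0.

    1.0 means the OCR output matches the expected text exactly;
    lower values mean more characters were misread, dropped, or added.
    """
    return SequenceMatcher(None, expected, recognized).ratio()


# Illustrative strings only: the second one mimics an "i" misread as "t".
ground_truth = "italic text"
ocr_output = "ttalic text"
print(f"character accuracy: {character_accuracy(ground_truth, ocr_output):.2f}")
```

Running the same comparison for each PSM option makes it easier to see which mode gets closest on a difficult font.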

CASE 2: Consider Figure 3, which is a receipt from the grocery store. Let’s try to OCR this image using the default (--psm 3) mode:

Figure 3: Whole Foods Market receipt we will OCR.

$ tesseract receipt.png stdout --psm 3

Figure 4: OCR result for Figure 3 with PSM 3

$ tesseract receipt.png stdout --psm 4

Figure 5: OCR result for Figure 3 with PSM 4

The default mode did not go so well. With --psm 3, Tesseract cannot infer that we are looking at column data and that text within the same row must be associated together.
To address this issue, we can use the --psm 4 mode. As you can see, the results are far superior: Tesseract understands that text should be grouped row-by-row, enabling us to OCR the receipt's items.
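
Once PSM 4 keeps each item and its price on the same line, the output can be turned into structured rows. The following is a minimal sketch under stated assumptions: the sample text is invented for illustration (it is not the actual output for Figure 3), the row layout "description followed by a price" is assumed, and parse_receipt_rows is a hypothetical helper name.

```python
import re

# Assumed row layout: an item description, whitespace, then a price such as 4.99
# (optionally preceded by a dollar sign) at the end of the line.
ROW_PATTERN = re.compile(r"^(?P<item>.+?)\s+\$?(?P<price>\d+\.\d{2})$")


def parse_receipt_rows(ocr_text: str) -> list[tuple[str, float]]:
    """Extract (item, price) pairs from row-wise OCR text; skip other lines."""
    rows = []
    for line in ocr_text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match:
            rows.append((match.group("item"), float(match.group("price"))))
    return rows


# Invented sample text standing in for row-wise OCR output.
sample_output = """WHOLE FOODS MARKET
BACON LS          4.99
BROTH CHIC        2.19
TOTAL             7.18
"""

for item, price in parse_receipt_rows(sample_output):
    print(f"{item}: {price:.2f}")
```

Lines that do not end in a price, such as the store name, are simply skipped, so the same helper tolerates headers and footers on a real receipt.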

Note that PSM 12 is essentially identical to PSM 11, the sparse-text mode used in CASE 4 (Figures 9 and 10), but it additionally includes orientation and script detection (OSD).

CASE 3: Now we will try an interesting and challenging input: an automatic license/number plate recognition (ANPR) system.

The input is shown in Figure 6. Unfortunately, PSM 3 does not work for this input, whereas PSM 7, which treats the image as a single text line, gives the correct result. Testing with PSM 8 gives the same result. The difference between PSM 7 and PSM 8 is that the former assumes a single line and the latter a single word, so you can select either of them based on your input type.

Figure 6: A license plate we will OCR

$ tesseract numberplate1.png stdout --psm 3

$ tesseract numberplate1.png stdout --psm 7

Figure 7: Result on Figure 6.
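
When it is not obvious which PSM suits an image, it can help to run the same command with several modes and compare the outputs side by side. The sketch below wraps the tesseract commands shown above using Python's subprocess module. It assumes the tesseract executable is installed and on your PATH; build_tesseract_command and ocr_with_psms are hypothetical helper names, not part of Tesseract.

```python
import subprocess


def build_tesseract_command(image_path: str, psm: int) -> list[str]:
    """Build the same command line as `tesseract <image> stdout --psm <n>`."""
    if not 0 <= psm <= 13:
        # Tesseract defines fourteen page segmentation modes, numbered 0 to 13.
        raise ValueError(f"PSM must be between 0 and 13, got {psm}")
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


def ocr_with_psms(image_path: str, psms=(3, 7, 8)) -> dict[int, str]:
    """Run Tesseract once per PSM and collect the recognized text for each."""
    results = {}
    for psm in psms:
        completed = subprocess.run(
            build_tesseract_command(image_path, psm),
            capture_output=True,
            text=True,
            check=False,  # a failing mode should not stop the comparison
        )
        results[psm] = completed.stdout.strip()
    return results
```

With Tesseract installed, calling ocr_with_psms("numberplate1.png") returns one entry per mode, so an empty or garbled result for PSM 3 can be compared directly with the PSM 7 and PSM 8 results.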

CASE 4: Text presented in the form of rows and columns, i.e. sparse text, as depicted in Figure 8. For this kind of input we can again start with the default PSM 3, but PSM 11 is best suited because it is specially designed for sparse text recognition. For the experiments, refer to Figures 9 and 10.

Figure 8: Sparse text

$ tesseract sadhgurubook_chapter.png stdout --psm 3

Figure 9: Figure 8 OCR using PSM 3

$ tesseract sadhgurubook_chapter.png stdout --psm 11

Figure 10: Figure 8 OCR using PSM 11

Now let's try some bigger hurdles:

“CASE 5: Handwritten text” and “CASE 10: Image in table form”, shown in Figures 11 and 13 respectively. For case 5, our experimentation shows that Tesseract's PSM 9 option works well; however, handwriting that is a little harder to read is not recognized even with PSM 9. That is why full handwritten OCR is still a research topic.

Figure 11: Handwritten text image

$ tesseract handwriten.png stdout --psm 3

$ tesseract handwriten.png stdout --psm 9

Figure 12: Result of Figure 11

Moving on to the table image: Figure 13 presents the top 10 cricket highest score teams in ODI in table image format. When the input is a table, we expect the output to be in table format as well, but unfortunately neither PSM 3 nor PSM 11 produces such output; the result is depicted in Figure 14. To handle inputs like those of CASE 9 and 10, some image pre-processing is necessary. I will address this in an additional blog post in the near future.

Figure 13: Top 10 cricket highest score teams in ODI in table image format

$ tesseract tabel.png stdout --psm 3

$ tesseract tabel.png stdout --psm 11

Figure 14: Result of Figure 13

Summary

Tesseract offers many PSM options. Each of Tesseract's fourteen PSMs assumes certain things about your source image: for example, a block of content (a scanned book), a single line of text (a single statement from an article), or a single word (a license plate). Our skill lies in selecting the correct option for the desired output, and here I have presented various cases to illustrate the right choice of PSM.

OCR is often used in traffic monitoring and video surveillance applications for number plate recognition. If we want an open-source engine such as Tesseract or EasyOCR for this, Tesseract with PSM 7 or 8 is currently the best choice. In the billing receipt digitization process, if we need an invoice in Excel or Word format for further accounting, we can use Tesseract with PSM 4. A few non-ASCII characters present in an invoice may be misrecognized or missing; you can ignore them by applying a filter in your script. Likewise, before applying any PSM option, consult tesseract --help-psm, start with the default PSM 3, and then try the rest according to the PSM descriptions. The more experience you gain with PSMs, the easier it will be to apply OCR to your own tasks.
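
The non-ASCII filter mentioned above could look like the following minimal sketch. It assumes that dropping every non-ASCII character is acceptable for your invoices, which would also remove legitimate symbols such as currency signs; strip_non_ascii is a hypothetical helper name.

```python
def strip_non_ascii(text: str) -> str:
    """Drop every character outside the ASCII range from OCR output."""
    return text.encode("ascii", errors="ignore").decode("ascii")


# Illustrative string only, not real OCR output.
noisy_line = "Total: €12.50 ✓"
print(strip_non_ascii(noisy_line))
```

If some non-ASCII symbols matter for accounting, a gentler variant would replace only the specific characters you have seen misrecognized rather than stripping them all.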

Hi, my name is Kiran Kamble. When I am done analyzing data, I play badminton and cricket and weekends are meant for hiking.
