Tesseract does not recognize monospaced font

Elmue · 2017-09-25 21:22:33

Hello

I have the word "CONFIGURATION" in an image.
But Tesseract recognizes it as "CONFI GURATION".

Whenever a word contains the letter "I" or the digit "1", Tesseract recognizes it as two words.

This happens with Tesseract 3.03 and with 3.05.
I use page segmentation mode 6 (-psm 6).

I trained the traineddata myself.
I told Tesseract that the font is monospaced, but it does not work.

My font_properties contains:
FontName@M 0 0 1 0 0
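For readers unfamiliar with this file: in the Tesseract 3.x training workflow, each font_properties line is documented as `<fontname> <italic> <bold> <fixed> <serif> <fraktur>`, so the third flag (`1` above) is the one that marks the font as fixed pitch (monospaced). The helper below, `parse_font_properties_line`, is a hypothetical sketch for checking which flag a line sets; it only reads the file format and says nothing about how Tesseract actually uses the flag during word segmentation.

```python
from typing import NamedTuple


class FontProperties(NamedTuple):
    name: str
    italic: bool
    bold: bool
    fixed: bool    # 1 = fixed pitch (monospaced)
    serif: bool
    fraktur: bool


def parse_font_properties_line(line):
    """Hypothetical helper: split one font_properties line into its fields.

    Assumes the documented order: fontname italic bold fixed serif fraktur.
    """
    parts = line.split()
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    name, *flags = parts
    if any(f not in ("0", "1") for f in flags):
        raise ValueError(f"flags must be 0 or 1: {line!r}")
    italic, bold, fixed, serif, fraktur = (f == "1" for f in flags)
    return FontProperties(name, italic, bold, fixed, serif, fraktur)


props = parse_font_properties_line("FontName@M 0 0 1 0 0")
print(props.fixed)  # True: the third flag marks the font as fixed pitch
```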

The Tesseract documentation says that Tesseract recognizes monospaced fonts.
But it does NOT.
The spaces around the "I" are wider than the spaces between the other characters.
And Tesseract misinterprets this as the separation between two words.
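The explanation above can be illustrated with a toy model. This is not Tesseract's real segmentation algorithm; the cell width, ink widths and threshold are made-up assumptions. In a monospaced font every glyph gets the same cell, so a narrow "I" leaves extra white space on both sides. A rule that decides word breaks from ink gaps alone (hypothetical `segment_by_gap`) reads that white space as a word break, while a rule that reasons in fixed-pitch cells (hypothetical `segment_by_pitch`) only breaks where a whole cell is empty. The toy splits on both sides of each "I", so it does not reproduce Tesseract's exact output; it only shows the mechanism.

```python
PITCH = 10                 # assumed fixed cell width in pixels
NARROW = {"I": 2, "1": 2}  # assumed ink widths of narrow glyphs
DEFAULT_INK = 8            # assumed ink width of every other glyph


def render(text):
    """Return (left, right) ink extents of each non-space glyph, centred in its cell."""
    boxes = []
    for cell, ch in enumerate(text):
        if ch == " ":
            continue  # a space occupies a cell but leaves no ink
        w = NARROW.get(ch, DEFAULT_INK)
        left = cell * PITCH + (PITCH - w) // 2
        boxes.append((left, left + w))
    return boxes


def segment_by_gap(text, threshold):
    """Gap rule: any ink gap wider than threshold is treated as a word break."""
    glyphs = text.replace(" ", "")
    boxes = render(text)
    out = glyphs[0]
    for ch, prev, cur in zip(glyphs[1:], boxes, boxes[1:]):
        out += (" " if cur[0] - prev[1] > threshold else "") + ch
    return out


def segment_by_pitch(text, pitch=PITCH):
    """Pitch rule: a word break needs at least one whole empty cell."""
    glyphs = text.replace(" ", "")
    cells = [(left + right) // 2 // pitch for left, right in render(text)]
    out = glyphs[0]
    for ch, prev, cur in zip(glyphs[1:], cells, cells[1:]):
        out += (" " if cur - prev > 1 else "") + ch
    return out


# Normal glyphs are 2 px apart; the gaps around "I" are 5 px.
print(segment_by_gap("CONFIGURATION", 3))  # CONF I GURAT I ON
print(segment_by_pitch("CONFIGURATION"))   # CONFIGURATION
```

A threshold tuned to the usual 2 px gap flags the 5 px gaps around "I", whereas the cell-based rule still finds real spaces, e.g. `segment_by_pitch("NEW CONFIG")` keeps the break between the two words.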

Can anybody please point me in the right direction?

Is there a recognition setting that I must change?
Or is there something I must change when building the traineddata?

What is this problem called, so that I can search for further discussion of the topic?

And why does this forum not allow uploading images?

chasester1 · 2017-09-26 05:27:12

This is not the correct forum.

You are looking for the Tesseract that converts pictures to text. This forum is about something different. You may want to add more keywords to your Google search.

chasester

PS: Images can be uploaded elsewhere and embedded here with the image tag.

PPS: Tesseract OCR is the name of the project you are looking for, but I don't think they have forums. They do have a Git page, so there may be more information there.

Last edited by chasester1 (2017-09-26 05:31:12)

Tesseract - Forum