#1 2019-01-22 11:51:34


Not able to lift all the data from the image

Hi Team,

I'm looking for help with the following scenario.

I am currently using Tesseract OCR (version: ) together with ImageMagick for image processing.

I am developing a process that reads a scanned PDF (i.e. an image file) and extracts some specific fields of data from it. My code is already set up for this, but sometimes I run into a problem: text that is perfectly clear to the human eye is not picked up by OCR.
In short, I am not able to extract all the words from the images as required, and sometimes the letters in a recognized word are not the same as they are in the image.

In particular, most of the time I am not able to extract these words:
1) Excise
2) Tax Due
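Since the recognized letters sometimes differ from the image, an exact string comparison against "Excise" can fail even when Tesseract did find the word. A minimal sketch of a more tolerant comparison, assuming the words are compared as plain strings after recognition, is shown below. It does not improve the OCR itself; it is a post-processing workaround using Levenshtein edit distance, so a word read as "Exc1se" (one wrong letter) still matches "Excise". The `FuzzyLabelMatcher` class and its `maxEdits` threshold are hypothetical names for illustration, not part of Tesseract or its .NET wrapper.

```csharp
using System;

// Hypothetical helper (not part of Tesseract): tolerant matching of OCR words against expected labels.
public static class FuzzyLabelMatcher
{
    // Levenshtein edit distance: insertions, deletions and substitutions each cost 1.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j; // distance from "" to b[0..j)

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // distance from a[0..i) to ""
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev); // after the swap, prev holds row i
        }
        return prev[b.Length];
    }

    // True if the OCR word is within maxEdits of the label, ignoring case,
    // surrounding whitespace and trailing punctuation such as ':' '.' ','.
    public static bool IsCloseMatch(string ocrWord, string label, int maxEdits = 1)
    {
        string normalized = ocrWord.Trim().Trim(':', '.', ',').ToLowerInvariant();
        return Levenshtein(normalized, label.ToLowerInvariant()) <= maxEdits;
    }
}
```

For example, `OCRlist.Where(kv => FuzzyLabelMatcher.IsCloseMatch(kv.Key, "Excise"))` (with `System.Linq`) would also return entries read as "Exc1se" or "Excise:". Keeping `maxEdits` at 1 or 2 is deliberate: on short labels a larger tolerance starts accepting unrelated words.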

Below is the method I am using to read all the words and store them in a list:

using (var ocr = Engine)
using (var page = ocr.Process(img[0], PageSegMode.Auto))
using (var iterator = page.GetIterator())
{
    iterator.Begin();
    do
    {
        // Text and bounding box of the word the iterator is currently on
        var currentWord = iterator.GetText(PageIteratorLevel.Word);
        iterator.TryGetBoundingBox(PageIteratorLevel.Word, out Rect bounds);

        // GetText may return null or whitespace, so check before storing the word
        if (!string.IsNullOrWhiteSpace(currentWord))
            OCRlist.Add(new KeyValuePair<string, Rect>(currentWord, bounds));
    } while (iterator.Next(PageIteratorLevel.Word));
}
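Because the loop above stores one entry per word, a two-word label such as "Tax Due" never appears as a single entry in `OCRlist`; it comes back as "Tax" and "Due" with separate bounding boxes. A minimal sketch of one way to handle this: group words whose vertical centers are close into the same line, sort each line left to right, and search the joined text. The `OcrWord` and `LineGrouper` types are hypothetical stand-ins for the `KeyValuePair<string, Rect>` entries (box given as left/top/right/bottom pixels), and the `lineTolerance` of 10 pixels is an assumption to tune for your scan resolution. Iterating at `PageIteratorLevel.TextLine` instead of `PageIteratorLevel.Word` is another option.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one OCRlist entry: the word plus its box (pixels, origin top-left).
public sealed class OcrWord
{
    public string Text { get; }
    public int Left { get; }
    public int Top { get; }
    public int Right { get; }
    public int Bottom { get; }

    public OcrWord(string text, int left, int top, int right, int bottom)
    {
        Text = text; Left = left; Top = top; Right = right; Bottom = bottom;
    }

    public double CenterY => (Top + Bottom) / 2.0;
}

public static class LineGrouper
{
    // Groups words into lines: a word joins the current line if its vertical center
    // is within lineTolerance pixels of the center of that line's first word.
    public static List<List<OcrWord>> GroupIntoLines(IEnumerable<OcrWord> words, double lineTolerance = 10)
    {
        var lines = new List<List<OcrWord>>();
        foreach (var word in words.OrderBy(w => w.CenterY))
        {
            var current = lines.Count > 0 ? lines[lines.Count - 1] : null;
            if (current != null && Math.Abs(word.CenterY - current[0].CenterY) <= lineTolerance)
                current.Add(word);
            else
                lines.Add(new List<OcrWord> { word });
        }
        // Within each line, order the words left to right.
        return lines.Select(line => line.OrderBy(w => w.Left).ToList()).ToList();
    }

    // Returns the text of every line that contains the phrase (case-insensitive).
    public static List<string> FindPhrase(IEnumerable<OcrWord> words, string phrase, double lineTolerance = 10)
    {
        return GroupIntoLines(words, lineTolerance)
            .Select(line => string.Join(" ", line.Select(w => w.Text.Trim())))
            .Where(text => text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```

A side effect of grouping by line is that the value printed next to a label (for example an amount after "Tax Due") ends up in the same line of text, which can make field extraction simpler. A fixed pixel tolerance is the simplest choice; skewed scans may need deskewing first for it to work.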

I hope the information provided is enough to find a solution. If you need more detail, please let me know and I will provide it as best I can.

OCR reading is the only challenge I am facing right now; once I can get all the words, I will be in a good position to move forward.



#2 2019-01-25 13:05:15


Re: Not able to lift all the data from the image

This is the forum for Tesseract, the graphics engine, which is a different project from Tesseract-OCR.
You can submit OCR issues here: https://github.com/tesseract-ocr/tesseract/issues

