Not able to lift all the data from the image

ShobhitKapil · 2019-01-22 11:51:34

Hi Team,

I am looking for help with the scenario below.

I am currently using Tesseract OCR version 3.0.2.0, along with ImageMagick for image processing.

I am developing a process that takes a scanned PDF (that is, an image file) and extracts some specific fields of data from it. My code is already set up, but I sometimes run into a problem: text that is perfectly clear to the human eye is not picked up by the OCR.
In short, I cannot extract all the required words from the images, and sometimes the recognized letters differ from those in the image.

In the section I am talking about, the words I most often fail to extract are:
1) Excise
2) Tax Due

Below is the method I use to read all the words and store them in a list:

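// Requires "using Tesseract;" at the top of the file.
// Engine, img, currentWord, and OCRlist are declared elsewhere in my code.
using (var ocr = Engine)
{
    using (var page = ocr.Process(img[0], PageSegMode.Auto))
    {
        using (var iterator = page.GetIterator())
        {
            iterator.Begin();
            do
            {
                // Read the current word and its bounding box on the page.
                currentWord = iterator.GetText(PageIteratorLevel.Word);
                bool hasBounds = iterator.TryGetBoundingBox(PageIteratorLevel.Word, out Rect bounds);

                // Skip null or blank results and words without a valid box.
                if (hasBounds && !string.IsNullOrWhiteSpace(currentWord))
                {
                    OCRlist.Add(new KeyValuePair<string, Rect>(currentWord, bounds));
                }
            } while (iterator.Next(PageIteratorLevel.Word));
        }
    }
}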

I hope this information is enough to suggest a solution. If more detail is needed, please let me know and I will do my best to provide it.

The OCR reading is the only thing blocking me; once I can read all the words, I will be in a good position to move forward.

Thanks,
Shobhit

ThaOneDon · 2019-01-25 13:05:15

This is the forum for Tesseract, the graphics engine, which is different from Tesseract-OCR.
You can submit OCR issues here: https://github.com/tesseract-ocr/tesseract/issues

Tesseract - Forum
