Hi Team,
I am looking for help with the scenario below.
I am currently using Tesseract OCR version 3.0.2.0, along with ImageMagick for image processing.
I am developing a process that reads a scanned PDF (i.e. an image) and extracts some specific fields of data. My code is already set up, but sometimes text that is perfectly clear to the human eye is not picked up by the OCR.
In short, I cannot extract all the words I need from the images, and sometimes the recognized letters differ from those in the image.
The words I most often fail to extract are:
1) Excise
2) Tax Due
Below is the code I use to read all the words and store them in a list:
// Engine (the TesseractEngine), img (the page images) and OCRlist are defined elsewhere.
using (var ocr = Engine)
{
    using (var page = ocr.Process(img[0], PageSegMode.Auto))
    {
        using (var iterator = page.GetIterator())
        {
            iterator.Begin();
            do
            {
                // Text of the current word; may be null or whitespace for empty regions.
                var currentWord = iterator.GetText(PageIteratorLevel.Word);
                if (!string.IsNullOrWhiteSpace(currentWord)
                    && iterator.TryGetBoundingBox(PageIteratorLevel.Word, out Rect bounds))
                {
                    // Store each word together with its position on the page.
                    OCRlist.Add(new KeyValuePair<string, Rect>(currentWord, bounds));
                }
            } while (iterator.Next(PageIteratorLevel.Word));
        }
    }
}
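Once the labels are recognized, they still have to be located in OCRlist. Because the iterator works at PageIteratorLevel.Word, "Tax Due" comes back as two separate entries ("Tax" and "Due"), so a lookup has to match consecutive words rather than a single entry. A rough sketch of that idea, assuming OCR may attach trailing punctuation such as ":" to a word; the FindPhrase helper and the sample words are hypothetical, not part of my current code:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the index of the word that starts the
// (possibly multi-word) phrase, or -1 if the phrase is not found.
static int FindPhrase(IReadOnlyList<string> words, string phrase)
{
    // Split the label into its words, e.g. "Tax Due" -> ["Tax", "Due"].
    string[] target = phrase.Split(' ', StringSplitOptions.RemoveEmptyEntries);
    for (int i = 0; i + target.Length <= words.Count; i++)
    {
        bool match = true;
        for (int j = 0; j < target.Length; j++)
        {
            // Assumption: strip punctuation OCR may attach (e.g. "Due:") and ignore case.
            string word = words[i + j].Trim().Trim(':', '.', ',');
            if (!string.Equals(word, target[j], StringComparison.OrdinalIgnoreCase))
            {
                match = false;
                break;
            }
        }
        if (match) return i;
    }
    return -1;
}

// Made-up words standing in for the keys of OCRlist.
var words = new List<string> { "Excise", "Duty", "Tax", "Due:", "1,200.00" };
Console.WriteLine(FindPhrase(words, "Tax Due")); // prints 2
Console.WriteLine(FindPhrase(words, "Excise"));  // prints 0
```

With the real list, the word texts could be passed as OCRlist.Select(p => p.Key).ToList() (requires System.Linq), and OCRlist[index].Value then gives the bounding box of the label's first word.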
I hope this information is enough to suggest a solution; if you need more detail, please let me know and I will provide whatever I can.
The OCR reading is where I am stuck at the moment. Once I have all the words available, I will be in a good position to move forward.
Thanks,
Shobhit
This is the forum for Tesseract the graphics engine, which is different from Tesseract-OCR.
You can submit issues for the OCR engine here: https://github.com/tesseract-ocr/tesseract/issues