OCR in X11?

pm149

Any tips for OCR in X11? I have been trying to get things going with Kooka. I can't scan from within the program, so I scan with XSane, save the result as a Windows BMP, and import that file into Kooka before OCR'ing it.

If I select gocr as my OCR engine, it starts up but eventually crashes before producing a result. I *can* select ocrad, which does give me a result, but with so many errors that I might as well retype the whole document! I have a trial version of ABBYY FineReader for Mac, which runs under Classic, and so far it does a far better job than ocrad. I've only got a dozen trial uses left. Is there any way of improving things and staying open source?

Mac OS X 10.3.8 on a 1.25 GHz eMac with 1 GB RAM, running XDarwin
 
I'm surprised ocrad doesn't work well. I've heard good things about it. Are you working with printed text, or with handwritten text?
 
Viro said:
I'm surprised ocrad doesn't work well. I've heard good things about it. Are you working with printed text, or with handwritten text?


I've used a few different typed text samples. It crashes regardless.
 
You can just invoke ocrad from the command line. I've just tested it myself after installing it via DarwinPorts. The results are very, very poor (!!). gocr seems to give much better results, though they aren't great either.

Perhaps someone else may know of a free OCR solution aside from these two programs.
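
For anyone who wants to repeat the comparison, here is a rough sketch that runs whichever of the two engines is installed on the same image and saves each result to its own file. It assumes both tools print the recognized text on standard output when given a PBM file; the function name and the output file names are made up for illustration.

```shell
#!/bin/sh
# Sketch: run each installed OCR engine on the same image so the results
# can be compared side by side.
# Assumptions: the image name has an extension (e.g. page.pbm), and the
# output names page.ocrad.txt / page.gocr.txt are purely illustrative.
compare_ocr() {
    image=$1
    base=${image%.*}            # strip the extension: page.pbm -> page
    for engine in ocrad gocr; do
        if command -v "$engine" >/dev/null 2>&1; then
            # Both tools write the recognized text to standard output.
            "$engine" "$image" > "$base.$engine.txt"
        else
            echo "skipping $engine: not found in PATH" >&2
        fi
    done
}

# Usage: compare_ocr page.pbm
#        diff page.ocrad.txt page.gocr.txt
```

Diffing the two text files against a hand-typed reference gives a quick feel for which engine copes better with a particular scan.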
 
Viro said:
You can just invoke ocrad from the command line. I've just tested it myself after installing it via DarwinPorts. The results are very, very poor (!!). gocr seems to give much better results, though they aren't great either.

Perhaps someone else may know of a free OCR solution aside from these two programs.


Thanks. What instructions do you use to invoke gocr from the command line?
 
If you have your image in PBM format, just run "gocr image.pbm" and it will print all the recognized text to the console. You can look at the man page to see whether any options help with recognition, but in my limited experiments the defaults seem to work well. If you want the output to go into a text file, just do "gocr image.pbm > text.txt".
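
The same thing wrapped in a small shell function, as a minimal sketch. It assumes gocr is on the PATH and that the scan is already in PBM format (a BMP saved from XSane would need converting first, for instance with ImageMagick's convert if that happens to be installed). The function name and the default output name are illustrative only.

```shell
#!/bin/sh
# Sketch: OCR one image with gocr and save the text to a file.
# Assumptions: gocr is on the PATH and the input is already a PBM file.
# ocr_to_text is a hypothetical helper name, not part of gocr.
ocr_to_text() {
    image=$1
    out=${2:-${image%.*}.txt}   # default output: page.pbm -> page.txt
    if [ ! -r "$image" ]; then
        echo "cannot read $image" >&2
        return 1
    fi
    # gocr prints the recognized text on standard output; redirect it.
    gocr "$image" > "$out" || return 1
    echo "wrote $out"
}

# Usage: ocr_to_text image.pbm            (writes image.txt)
#        ocr_to_text image.pbm text.txt   (writes text.txt)
```

Keeping the redirect inside one function makes it easy to loop over a whole directory of scans later.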
 
Viro said:
If you have your image in PBM format, just run "gocr image.pbm" and it will print all the recognized text to the console. You can look at the man page to see whether any options help with recognition, but in my limited experiments the defaults seem to work well. If you want the output to go into a text file, just do "gocr image.pbm > text.txt".

Thanks, Viro. I'm still trying to get this to work. In the meantime, I've also managed to compile Clara. Has anyone else had experience with this program? I've started by training it on a simple document. It looks promising, but do you have to train it with each new document?
 