Automator Help

macnitwork

I tried to use Automator to extract the text from some PDFs, but it only worked on some of them. The others just come out as blank RTF files. Any suggestions on how to pull the text out? Thanks.
 
Is the text in those PDFs actual, true text, or are they images of text?

If you can use the text selection tool in Acrobat or Preview to actually select text in the PDF, the extraction should work. But if the text is really an image (a scanned document, for example), then even though it looks like text, it isn't. To the computer, pulling text out of that "image text" is no different from trying to extract text from a picture of a sunset.
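If you have a whole folder of PDFs to sort, the "can you select the text?" check can be roughly approximated in a script. The sketch below is an assumption-laden heuristic, not anything from this thread or from Automator: PDFs with real text almost always declare fonts, while plain scans contain embedded images (`/Subtype /Image`) and no fonts. It also inflates Flate-compressed streams, because newer PDFs often hide font declarations inside them. The function names are hypothetical, and encrypted or unusually encoded PDFs will defeat it.

```python
import re
import zlib

# Body of each PDF stream object ("stream" ... "endstream").
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A font declaration ("/Type /Font" or a "/Font" resource entry).
FONT_RE = re.compile(rb"/Font\b")
# An embedded raster image, which is what a scanned page contains.
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def _searchable_chunks(pdf_bytes: bytes):
    """Yield the raw file, then every stream body that inflates with zlib.

    Newer PDFs often store object dictionaries (including font
    declarations) inside FlateDecode-compressed streams, so a plain
    byte search of the file can miss them.
    """
    yield pdf_bytes
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj() tolerates a slightly truncated tail,
            # unlike zlib.decompress().
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (e.g. JPEG image bytes); skip it
        yield inflated


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the PDF embeds images but declares no fonts."""
    has_image = False
    for chunk in _searchable_chunks(pdf_bytes):
        if FONT_RE.search(chunk):
            return False  # a declared font usually means selectable text
        if IMAGE_RE.search(chunk):
            has_image = True
    return has_image
```

You could call it as `looks_image_only(Path("scan.pdf").read_bytes())` (with `from pathlib import Path`) and set aside the files it flags for OCR. A scan that already carries an invisible OCR text layer declares a font, so it is reported as text, which matches what you would see when selecting in Preview.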
 
That makes sense, thanks. Can I use Automator to pull the image out of the PDF and put it in a Word document so I can highlight the text?
 
Moving the image into Word won't change anything: while you can see and read the text, the computer can't tell it apart from any other picture. You will need to run OCR (Optical Character Recognition) on the text image so the computer can convert it into real, selectable text.

Acrobat Professional can do this -- do you have Acrobat Professional?

You can also try using Google to OCR your image-based text documents:

http://unclutterer.com/2008/11/08/google-can-now-ocr-all-pdfs/

Although none of these solutions can be integrated with Automator, you can OCR the image-based documents first, THEN run your Automator workflow on the OCR'ed documents.
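If you have many PDFs, one way to script that "OCR first, then Automator" order is with the open-source `ocrmypdf` command-line tool. That tool is a swapped-in choice (this thread only mentions Acrobat Professional and Google), and the sketch assumes `ocrmypdf` and Tesseract are installed and on your PATH (for example via Homebrew) and that your PDFs sit in one folder. The function names and folder layout are hypothetical. The `--skip-text` option tells `ocrmypdf` to leave pages that already contain text alone, so text-based PDFs pass through while scanned pages get a text layer.

```python
import shutil
import subprocess
from pathlib import Path


def ocr_commands(src_dir: Path, out_dir: Path) -> list[list[str]]:
    """Build one ocrmypdf command per PDF in src_dir (hypothetical helper).

    Matches .pdf case-insensitively, and writes each result to out_dir
    under the same file name.
    """
    pdfs = sorted(p for p in src_dir.iterdir() if p.suffix.lower() == ".pdf")
    return [
        # --skip-text: pages that already have text are copied unchanged.
        ["ocrmypdf", "--skip-text", str(pdf), str(out_dir / pdf.name)]
        for pdf in pdfs
    ]


def run_ocr(src_dir: Path, out_dir: Path) -> None:
    """Run the commands one by one; stop on the first failure."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in ocr_commands(src_dir, out_dir):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on error
```

Calling something like `run_ocr(Path("~/scans").expanduser(), Path("~/scans-ocr").expanduser())` would write OCR'ed copies into the second folder. You could then point your Automator workflow (for instance one built around its Extract PDF Text action) at that folder instead of the originals.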
 