Copying from pdf

Lazzo

Greetings troops:

It's not overly important and I've finished the rotten job now, but just thought I'd ask - Has anyone else lost the letters 'fi' from the beginning of words when copying and pasting direct from a pdf? Always at the beginning of a paragraph, which I 'spose is fortunate in itself.

Or perhaps it IS important. If that'd slipped past me to the client, well...

Anybody know why?

(There was no point in saving it as an rtf as it's full of mathematical symbols that had to be typed in manually.)
 
Did you get bitten by a ligature? That's where the fi is replaced by a single glyph with the two letters run together, so the dot of the i is missing or merged with the top of the f. Other likely culprits are ff and ffi, since the fs are joined there as well.
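If the ligatures do survive the paste as single characters (rather than vanishing altogether), they can be expanded back to plain letters. Here is a minimal sketch, assuming the pasted text carries the Unicode presentation forms U+FB00 to U+FB04; the function name `expand_ligatures` is made up for illustration. It deliberately touches only those five characters, so any mathematical symbols in the text are left alone.

```python
# Map the Latin f-ligature presentation forms to their plain-letter spellings.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace f-ligature characters with ordinary letters (hypothetical helper)."""
    return text.translate(_TABLE)


print(expand_ligatures("\ufb01nd it di\ufb03cult"))  # prints "find it difficult"
```

This won't help if the PDF gives no text mapping for the ligature glyph at all; in that case the letters are simply dropped on copy and have to be caught by proofreading.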

I am not sure what the source of your PDF was, but there are all sorts of potential problems depending on how these things are implemented in the specific PDF document you are using.

As an aside, kerning can also cause problems for text searches, cut and paste, and the like when the PDF was generated by a high-quality typesetter. Once you go beyond the simple word-at-a-time formatting of Word and look at the output of something like TeX, where each letter is individually typeset, the hacks for pulling text out of PDF files really fall apart.
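To give a rough idea of why that is, here is a toy sketch of the kind of guesswork an extractor has to do when all it gets is individually placed glyphs. Everything in it is an assumption for illustration: the `(char, x, width)` tuples, the units (thousandths of the font size), and the `space_gap` threshold are invented, and real extractors are a good deal more involved.

```python
def join_glyphs(glyphs, space_gap=250):
    """Rebuild a string from positioned glyphs (hypothetical data layout).

    glyphs: list of (char, x, width) tuples in reading order.
    A space is guessed wherever the gap to the previous glyph exceeds space_gap.
    """
    out = []
    prev_end = None
    for ch, x, width in glyphs:
        if prev_end is not None and x - prev_end > space_gap:
            out.append(" ")  # gap looks wide enough to be a word break
        out.append(ch)
        prev_end = x + width
    return "".join(out)


# Normally set text: the only wide gap is the real word break.
normal = [("f", 0, 300), ("o", 300, 500), ("o", 800, 500),
          ("b", 1600, 500), ("a", 2100, 500), ("r", 2600, 400)]
print(join_glyphs(normal))  # prints "foo bar"

# Loosely set text: every gap crosses the threshold, so spaces appear mid-word.
loose = [("f", 0, 300), ("o", 600, 500), ("o", 1400, 500)]
print(join_glyphs(loose))  # prints "f o o"
```

Any fixed threshold will be wrong for some combination of font, kerning, and letter spacing, which is why searches and pastes go astray.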

-Eric
 
No, there weren't any ligatures. It happened with ff, though. Some words like 'difficult' and such were missing the ff, but not all of them!

The source of the pdf is unknown, looks like it was set in Baskerville or similar, on a PC no doubt.

Kerning - thanks, that hadn't occurred to me, good thinking! That brings up a few similar past problems. Even a font's built-in character kerning could naff it up.

I wish clients would just send .txt files instead of trying to be clever, they'd save a lot of money!
 