man command > filename.txt question

msents

Is there a way to redirect the output of man to a text file so that bold text doesn't show up as doubled letters when the file is read with TextEdit.app?

Example:

[Cube:~] msents% man ls > ls.txt
[Cube:~] msents% open ls.txt
[Cube:~] msents%

<TextEdit.app opens ls.txt>

LS(1) System General Commands Manual LS(1)

NNAAMMEE
llss - list directory contents

SSYYNNOOPPSSIISS
...snip...

But check this out: when I cat it on console it looks fine:

[Cube:~] msents% cat ls.txt
LS(1) System General Commands Manual LS(1)

NAME
ls - list directory contents

SYNOPSIS
...snip...

Any comments?
 
I believe the manpage output is formatted with nroff or troff. I don't know which. You might try looking in the manpages for man to find out the exact thing that's called (and with what options) to pipe it through that first.
 
Looks like troff makes a postscript file:

[Cube:~] msents% man -t man > man.txt
[Cube:~] msents% open man.txt

Contents of man.txt as displayed in textedit:
%!PS-Adobe-3.0
%%Creator: groff version 1.17.2
%%CreationDate: Mon Mar 24 00:55:47 2003
%%DocumentNeededResources: font Times-Roman
...snip...

Maybe I can use groff but I think an easier solution is to use grep.
 
man -t outputs valid postscript. To test this:
man -t bash > bash.ps
gv bash.ps

You could simply do ps2pdf on that and print the pdf.

ps2pdf bash.ps bash.pdf
 
It makes complete sense to me. The doubled characters are meant for line printers: each bold character is sent twice so that it gets printed twice. It's a lot easier than the regular expression I built in perl...
 
I can't remember whether perl is installed on MacOS X by default. This works for me; you need to know the path to the man page you're interested in:

zcat /usr/man/man1/getty.1.gz | /usr/bin/groff -Tascii8 -mandoc | perl -e 'while (<>) { $_ =~ s/.\010//g; print $_; }' > ! txt

groff formats the man page as ASCII for the screen. The perl deletes every sequence that matches "any character followed by a backspace".

The results will be stored in the file called "txt"
 
If zcat isn't installed on MacOS X by default then replace it with:

gunzip -c

Sorry, my MacOS X box is powered down and I'm at work.
 
Um...no it doesn't. The text file it output was exactly what I would have seen running the man command. Same exact tab stops, same exact line breaks - everything.

Here, I'll give a good example. I made a cgi for my webserver that displays man pages. All it does is run the man command, pipe it through col -b, and then uses pre tags in the html. What you see is exactly what comes out of col -b. If ya don't believe me, look at the source, you'll see there's no html formatting of the output, it's displayed exactly as col -b outputs it.

http://dreamstatic.dyndns.org/cgi-bin/man.cgi?man=man
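
The source of that cgi isn't shown in this thread, so the following is only a rough sketch of the idea and not the actual script. The function name wrap_pre is hypothetical, and the escaping of &, < and > is an added assumption (man pages often contain those characters, and the post above says the real script does no HTML formatting at all).

```shell
# Hypothetical helper: read plain text on stdin and emit a minimal CGI response
# that shows it inside <pre> tags.
wrap_pre() {
    # CGI header block, terminated by an empty line.
    printf 'Content-Type: text/html\r\n\r\n'
    printf '<pre>\n'
    # Assumption: escape HTML metacharacters so synopsis lines like <file> survive.
    # "&" must be escaped first, otherwise the other entities get double-escaped.
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
    printf '</pre>\n'
}

# Possible usage (not run here):
#   man ls | col -b | wrap_pre
```

A real script would also need to validate whatever page name comes in from the query string before handing it to man.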
 
Darkshadow, substrate, wyvern: You guys are great.

The regex that I built in perl was heavier than yours at first, substrate- but my missing piece was the invisible backspace character. As soon as I saw your post I was punching myself in the head for this one (not noticing the backspace). Here's what I did in the end:

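man ls | perl -p -e 's/.\010//g' > ls.txt

The -i switch (in-place editing) only applies to named files, so it isn't needed when perl reads from a pipe. -n is redundant alongside -p, and it can't go after -e, because -e takes the next argument as the code to run.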

Of course, Darkshadow's

man ls | col -b > ls.txt

is my favorite, because it is the most elegant. Small Wonder! <--(Darkshadow will understand)
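
The two approaches from this thread can be folded into one small filter. This is just a sketch: strip_overstrike is a made-up name, and the sed fallback is an assumption for systems where col isn't installed.

```shell
# Hypothetical filter: remove overstrike sequences from text on stdin.
strip_overstrike() {
    if command -v col >/dev/null 2>&1; then
        # col -b drops backspaces, keeping the last character written to each column.
        col -b
    else
        # Fallback: delete "any character followed by a backspace", as the perl one-liner does.
        bs=$(printf '\b')
        sed "s/.${bs}//g"
    fi
}

# Possible usage (not run here):
#   man ls | strip_overstrike > ls.txt
```

Note that col may turn runs of spaces into tabs; its -x option keeps them as spaces if that matters for the output file.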

Lessons learned:
Look for invisible characters or get a punch in the head.
Scripts are less elegant than system commands.
'perl -p -e' is shorter than writing out 'while (<>)...'
 