Stripping HTML tags from text files?

Gwailo

B.A. Economics (Hon)
I was wondering whether there is a simple utility, either graphical or for the Darwin command line, to strip the HTML tags from an HTML (i.e., plaintext) file?

A Perl utility would be best, since I do a lot of work remotely on the terminal.

TIA :p
 
I'm sure there is an HTML parser module for Perl. Try CPAN; you should be able to find at least one.
 
Originally posted by vertigo
if all you want to do is strip the tags, you could do something like

$html =~ s/<[^>]+>//g;

That's very kind of you, but I don't know how to turn this into a script. I know zero Perl... :rolleyes:
 
There's a utility called html2text on a NetBSD system I have an account on. I'm sure you could find it for Darwin.
 
While I'm still going to look for a Darwin alternative, I figured out that I could probably just open my web page in IE and select Save as Plain Text :) That'll have to suffice for now, but thanks for all the hints, guys! :cool:
 