Stripping HTML tags from text files?

Gwailo

B.A. Economics (Hon)
I was wondering if there was any simple utility, either graphical or Darwin, to strip HTML tags from an HTML (i.e., plaintext) file?

A perl utility would be best, since I do a lot of work remotely on the terminal.

TIA :p
 

fddi1

Registered
I'm sure there is an HTML parser module for perl. Try CPAN. You should be able to find at least one there, if not several.
 

vertigo

Swollen Member
if all you want to do is strip the tags, you could do something like

$html =~ s/<[^>]*>//g;
 

Gwailo

B.A. Economics (Hon)
Originally posted by vertigo
if all you want to do is strip the tags, you could do something like

$html =~ s/<[^>]*>//g;
That's very kind of you but I don't know how to implement this into a script, I know 0 perl... :rolleyes:
 

hazmat

Rusher of Din
There's a utility called html2text on a NetBSD system I have an account on. I'm sure you could find it for Darwin.
 

Gwailo

B.A. Economics (Hon)
While I'm still going to look for a Darwin alternative, I figured out that I could probably just open my web page in IE and select Save as Plain Text :) That'll have to suffice for now, but thanks for all the hints, guys! :cool:
 