# Stripping HTML tags from text files?



## Gwailo (Sep 22, 2002)

I was wondering if there was any simple utility, either graphical or Darwin, to strip HTML tags from an HTML (i.e., plaintext) file?

A perl utility would be best, since I do a lot of work remotely on the terminal.

TIA


----------



## fddi1 (Sep 22, 2002)

I'm sure there is an HTML parser module for Perl.  Try CPAN.  You should be able to find at least one, if not several.


----------



## vertigo (Sep 22, 2002)

If all you want to do is strip the tags, you could do something like:

$html =~ s/<[^>]+>//g;


----------



## Gwailo (Sep 23, 2002)

> _Originally posted by vertigo_
> *If all you want to do is strip the tags, you could do something like:*
> 
> $html =~ s/<[^>]+>//g;



That's very kind of you, but I don't know how to work this into a script. I know zero Perl...


----------



## hazmat (Sep 23, 2002)

There's a utility called html2text on a NetBSD system I have an account on.  I'm sure you could find it for Darwin.


----------



## Gwailo (Sep 23, 2002)

While I'm still going to look for a Darwin alternative, I figured out that I could probably just open my web page in IE and select Save as Plain Text. That'll have to suffice for now, but thanks for all the hints, guys!


----------



## hazmat (Sep 23, 2002)

Here: http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=html2text .  Looks like plenty of options.


----------



## Gwailo (Sep 23, 2002)

Perfect, thanks hazmat!


----------

