Stripping HTML tags from text files?

Gwailo · Sep 22, 2002

I was wondering if there was any simple utility, either graphical or Darwin, to strip HTML tags from an HTML (i.e., plaintext) file?

A perl utility would be best, since I do a lot of work remotely on the terminal.

TIA

fddi1 · Sep 22, 2002

I'm sure there's an HTML parser module for Perl. Try CPAN; you should be able to find at least one there.

vertigo · Sep 22, 2002

if all you want to do is strip the tags, you could do something like

$html =~ s/<[^>]*>//g;

Gwailo · Sep 23, 2002

Originally posted by vertigo
if all you want to do is strip the tags, you could do something like

$html =~ s/<[^>]*>//g;

That's very kind of you, but I don't know how to turn this into a script. I know zero Perl...

hazmat · Sep 23, 2002

There's a utility called html2text on a NetBSD system I have an account on. I'm sure you could find it for Darwin.

Gwailo · Sep 23, 2002

While I'm still going to look for a Darwin alternative, I figured out that I could probably just open my web page in IE and select Save as Plain Text.

That'll have to suffice for now, but thanks for all the hints guys!

hazmat · Sep 23, 2002

Here: http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=html2text . Looks like plenty of options.

Gwailo · Sep 23, 2002

Perfect, thanks hazmat.
