macosx.com > Mac Help Forums > Unix & X11

  #1  
September 22nd, 2002, 02:29 PM
Gwailo
B.A. Economics (Hon)
Join Date: Mar 2002
Location: Ottawa, Ontario
Posts: 330
Stripping HTML tags from text files?

Is there a simple utility, either graphical or for the Darwin command line, that strips the HTML tags from an HTML (i.e., plain-text) file?

A Perl utility would be best, since I do a lot of work remotely in the terminal.

TIA
__________________
//Gwailo//

iMac TFT 700MHz G4, 786 RAM, 40GB Internal
DVD-ROM/CD-RW 12x8x32
USB 64MB Flash Drive
Wacom Graphire2 Tablet
Epson 777i Colour Printer
Canon PowerShot S30 Digital Camera
JVC GR-DVF21 NTSC MiniDV Camera
Canon EOS Elan II (35mm)

"Like a beautiful flower full of colour and also fragrant, even so, fruitful are the fair words of one who practices them."
--54th Sutra, The Dhammapada

  #2  
September 22nd, 2002, 09:51 PM
fddi1
Registered User
Join Date: Apr 2001
Location: Virginia, U.S.
Posts: 40
I'm sure there is an HTML parser module for Perl. Try CPAN; you should be able to find at least one.
  #3  
September 22nd, 2002, 11:08 PM
vertigo
Swollen Member
Join Date: Oct 2000
Location: Baltimore, MD
Posts: 38
If all you want to do is strip the tags, you could do something like

$html =~ s/<[^>]*>//g;
  #4  
September 23rd, 2002, 08:54 AM
Gwailo
Quote:
Originally posted by vertigo
If all you want to do is strip the tags, you could do something like

$html =~ s/<[^>]*>//g;

That's very kind of you, but I don't know how to turn this into a script; I know zero Perl...
  #5  
September 23rd, 2002, 10:46 AM
hazmat
Rusher of Din
Join Date: Oct 2001
Location: Brooklyn, NY
Posts: 1,803
There's a utility called html2text on a NetBSD system I have an account on. I'm sure you could find it for Darwin.
  #6  
September 23rd, 2002, 02:53 PM
Gwailo
I had an idea...

While I'm still going to look for a Darwin alternative, I figured out that I could probably just open my web page in IE and select Save As Plain Text. That'll have to suffice for now, but thanks for all the hints, guys!
  #7  
September 23rd, 2002, 02:56 PM
hazmat
Here: http://www.google.com/search?hl=en&i...-8&q=html2text . Looks like plenty of options.
  #8  
September 23rd, 2002, 04:05 PM
Gwailo
Perfect, thanks Hazmat!


All times are GMT -5.

