  #1  
March 2nd, 2003, 11:00 AM
Darkshadow
wandering shadow
Join Date: Jul 2001
Location: DE, USA
Posts: 1,532
Stripping html code out of a source file

Is there an easy way to do this? I'm writing a small program for myself, and one of the steps requires me to strip the HTML code out of a source page.

I do need, however, to keep the alt text from the image tags: just the text that would be shown if the images weren't displayed.
__________________
I am but a lonely shadow,
Doomed forever to roam and wander.
But if you allow me to pause before I must go,
I'll spin you tales of mystery and wonder.


Site: Night Productions
  #2  
March 4th, 2003, 09:12 PM
symphonix
Scratch & Sniff Committee
Join Date: Jul 2001
Location: The Australian Jungles
Posts: 4,025
Just to clear this up a bit, what language are you using to write your small program, or haven't you decided yet?

You should be able to remove anything enclosed in < and > symbols, which will leave you with only the text. You will also need a way to pull out the text between ALT=" and the closing " of each image tag and carry it into the final output.

How you do this depends on the language you are using.
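A minimal Python sketch of the approach described above (Python is chosen here only for illustration; the thread hasn't settled on a language): each <img> tag is first replaced by its alt text, then everything from < to the next > is deleted. It assumes alt values are quoted with " or ', and like any regex-based approach it can be fooled by a > inside an attribute value, tags inside comments, or <script> contents; entities such as &amp; are left undecoded. The names IMG_ALT, TAG and strip_html are hypothetical.

```python
import re

# An <img ...> tag with a quoted alt attribute; group 2 is the alt text.
# Assumes the value is quoted with " or ' (group 1 records which).
IMG_ALT = re.compile(
    r"""<img\b[^>]*?\salt\s*=\s*(["'])(.*?)\1[^>]*>""",
    re.IGNORECASE | re.DOTALL,
)

# Any other tag: everything from "<" up to the next ">".
TAG = re.compile(r"<[^>]*>")


def strip_html(source):
    """Replace each <img> with its alt text, then delete all remaining tags."""
    text = IMG_ALT.sub(lambda m: m.group(2), source)
    return TAG.sub("", text)


print(strip_html('<p>Hello <img src="a.gif" alt="[logo]"> world</p>'))
# → Hello [logo] world
```

Images without an alt attribute are not matched by IMG_ALT, so they are simply removed by the second pass.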
__________________
- iMac G5 1.8GHZ 17" | SuperDrive | 160GB | 512MB | Airport Extreme | Bluetooth Keyboard & Mouse | Wacom Intuos II
- Pentax *ist DL - JVC MiniDV Camcorder - Airport Express - iPod Nano 1gb white
  #3  
March 4th, 2003, 09:59 PM
Darkshadow
Any language - whichever is the easiest.

Mostly I use shell scripts, but that only gives me sed, which is great for stripping text but really bites at stripping HTML.

I know some PHP and Perl, so if you give me the relevant commands, I'm sure I can adapt them.
  #4  
March 5th, 2003, 02:20 AM
symphonix
Perl would be the easiest for this sort of task; text processing is what it does best.
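Perl is a fine choice; for comparison, here is a hedged sketch in Python using the standard library's html.parser module instead of raw regexes. The parser decodes entities like &amp; (convert_charrefs=True), copes with > inside quoted attribute values, and lets the code skip <script> and <style> contents, which are not displayed text. The class and function names (TextExtractor, html_to_text) are illustrative, not from any existing tool. Whitespace is passed through as-is, so the output keeps the source's line breaks and indentation.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects displayed text, substituting each <img> with its alt text."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &amp; in text.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; a self-closing <img/>
        # also ends up here via the default handle_startendtag.
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:  # missing alt, or alt with no value (None), adds nothing
                self.parts.append(alt)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(source):
    parser = TextExtractor()
    parser.feed(source)
    parser.close()  # flush any text still buffered at end of input
    return "".join(parser.parts)


print(html_to_text('<p>Tom &amp; Jerry <img src="cat.gif" alt="[cat]"/></p>'))
# → Tom & Jerry [cat]
```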
  #5  
March 8th, 2003, 12:43 PM
TommyWillB
Registered User
Join Date: Mar 2001
Location: ol' Gay San Francisco
Posts: 2,020
There are text-only Web browsers (Lynx?) out there.

It seems to me that you could simply use one of them to get the content, including the ALT text.

Also, can't you simply do an IE Save As (Format = Plain Text) to get this?
__________________
TommyWillB
Intel iMac "early 2006" core duo
TommyWillB.com hosted on Mac OS X 10.5.x / Apache 2.2.x / PHP 5.x
  #6  
March 8th, 2003, 04:15 PM
Darkshadow
Err...probably. I haven't used IE in a while. But no, that's not really an option - I'm doing this as part of a program, so I wouldn't be able to do it that way.
  #7  
March 8th, 2003, 10:26 PM
TommyWillB
BBEdit has both AppleScript and command-line interfaces. It also has a "Remove Markup" command, though I'm not sure whether you can script that.
  #8  
March 8th, 2003, 11:27 PM
dafuser
Land of Confusion
Join Date: Nov 2002
Location: Texas
Posts: 62
Re: Stripping html code out of a source file

Quote:
Originally posted by Darkshadow
Is there an easy way to do this? I'm writing a small program for myself, and one of the steps requires me to strip the html code out of a source page.

I do need, however, to keep the alt part from the image tags - just the text that would be displayed if the images weren't displayed.

Take a look at this program. It's a Mac program that will convert your HTML files into text files.

http://www.printerport.com/klephacks/markdown.html
__________________
--
Dafuser

"I picked up a Magic 8-Ball the other day and it said 'Outlook not so good'. I said 'Sure, but Microsoft still ships it.'"
All times are GMT -5.