#1
Stripping html code out of a source file
Is there an easy way to do this? I'm writing a small program for myself, and one of the steps requires me to strip the HTML markup out of a source page. I do need, however, to keep the alt text from the image tags: just the text that would be shown if the images weren't displayed.
__________________ I am but a lonely shadow, Doomed forever to roam and wander. But if you allow me to pause before I must go, I'll spin you tales of mystery and wonder. Site: Night Productions |
#2
Just to clear this up a bit, what language are you using to write your small program, or haven't you decided yet? You should be able to remove anything enclosed in < and > symbols, which will leave you with only the text. You will also need a way to pull the text between ALT=" and the closing " out of each image tag and into the final output. How you do this depends on the language you are using.
__________________ - iMac G5 1.8GHZ 17" | SuperDrive | 160GB | 512MB | Airport Extreme | Bluetooth Keyboard & Mouse | Wacom Intuos II - Pentax *ist DL - JVC MiniDV Camcorder - Airport Express - iPod Nano 1gb white |
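For anyone following along, here is a minimal sketch of the approach described in post #2: swap each image tag for its ALT text, then delete everything else enclosed in < and >. It is written in Python rather than the Perl or PHP mentioned in the thread, and the function name `strip_tags_keep_alt` is made up for illustration. It assumes reasonably tidy markup; a literal < or > inside ALT text or other attribute values, comments, and script blocks can all trip up a regex like this.

```python
import re
from html import unescape

# An <img ...> tag with a quoted alt attribute; group 2 captures the alt text.
IMG_ALT = re.compile(
    r'<img\b[^>]*?\balt\s*=\s*(["\'])(.*?)\1[^>]*>',
    re.IGNORECASE | re.DOTALL,
)
# Any other tag: everything from a < up to the next >.
ANY_TAG = re.compile(r'<[^>]+>', re.DOTALL)


def strip_tags_keep_alt(html_text):
    """Hypothetical helper: return the text of html_text, keeping image alt text."""
    # Step 1: replace each image tag that has alt text with that text.
    text = IMG_ALT.sub(lambda m: m.group(2), html_text)
    # Step 2: drop everything else enclosed in < and >
    # (image tags without alt text disappear here too).
    text = ANY_TAG.sub('', text)
    # Step 3: turn entities such as &amp; back into plain characters.
    return unescape(text)


if __name__ == "__main__":
    sample = '<p>Hello <img src="a.png" alt="red ball"> world</p>'
    print(strip_tags_keep_alt(sample))
```

Because the patterns use `re.DOTALL` and run over the whole file contents at once, a tag that is split across several lines is still matched, which is the case that a line-by-line tool tends to miss.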
#3
Any language - whichever is the easiest. Mostly I use shell scripts, but that only gives me sed - which is a great stripping program, but really bites at stripping HTML. I know some PHP and Perl, so if you give me the relevant commands, I'm sure I can adapt them.
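Since any language is on the table, one option that avoids sed-style pattern matching altogether is to let an HTML parser find the tags. The following is only a sketch under that assumption, using Python's standard `html.parser` module; the class name `TextWithAlt` and the function `html_to_text` are made up for illustration. It keeps the text and the ALT text of images, and skips the contents of script and style blocks.

```python
from html.parser import HTMLParser


class TextWithAlt(HTMLParser):
    """Hypothetical parser: collects page text plus the alt text of images."""

    SKIP = {"script", "style"}  # elements whose contents are not page text

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from the parser.
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.parts.append(alt)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text unless we are inside a script or style block.
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html_text):
    parser = TextWithAlt()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    page = '<p>Hi <img\n src="a.png"\n alt="a cat"> there</p>'
    print(html_to_text(page))
```

The parser deals with tags that span several lines and with entities in both text and attribute values, which are the cases that are awkward to handle line by line.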
#4
Perl would be the easiest for this sort of task; text processing is what it does best.
#5
There are "text only" Web browsers (Lynx?) out there. It seems to me that you can simply use one of them to get the content, including the ALT text. Also, can't you simply do an IE Save As (Format = Plain Text) to get this?
__________________ TommyWillB Intel iMac "early 2006" core duo TommyWillB.com hosted on Mac OS X 10.5.x / Apache 2.2.x / PHP 5.x |
#6
Err...probably. I haven't used IE in a while. But no, that's not really an option - I'm doing this as part of a program, so I wouldn't be able to do it that way.
#7
BBEdit has both AppleScript and command-line interfaces... It also has a command to "Remove Markup"... I'm not sure if you can script that though...
#8
Re: Stripping html code out of a source file
http://www.printerport.com/klephacks/markdown.html
__________________ -- Dafuser "I picked up a Magic 8-Ball the other day and it said 'Outlook not so good'. I said 'Sure, but Microsoft still ships it.'" |