Batch HTML conversion

Profit

Hi there, new to the boards. I apologize if there is an obvious solution to my problem that hasn't occurred to me.

I'm writing a Perl script that extracts data from HTML files. I plan to post a few Perl questions in another topic.

The HTML files are already in a standard format, but not one conducive to my planned method of extraction.

OmniWeb's 'reformat' button does a fine job for my purposes.

For example, I'd like a tool to take the following:
<p><b>Area:</b>
<br><i>total:</i>
652,000 sq km
<br><i>land:</i>
652,000 sq km
<br><i>water:</i>
0 sq km

and turn it into:
<b>Area:</b> <br>
<i>total:</i> 652,000 sq km <br>
<i>land:</i> 652,000 sq km <br>
<i>water:</i> 0 sq km

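This way, I can build arrays of the strings that appear before and after each piece of desired information and run something along these lines:
if ($line =~ s/\Q$searchy[$itr]\E//) {   # strip the literal text before the value
    # drop the text after the value, plus 2 more characters (presumably the separating space and the newline)
    substr($line, -length($choppy[$itr]) - 2) = "";
    print "<$taggy[$itr]>$line</$taggy[$itr]>\n";   # wrap the value in its tag
}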
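
For illustration, here is a self-contained sketch of that idea run against the reformatted sample above. The array contents and the `extract_fields` name are made up for this example, and it swaps the s///-plus-substr approach for a single capturing regex, so treat it as one possible shape rather than the finished script.

```perl
use strict;
use warnings;

# Hypothetical lookup tables: text before the value, text after it, output tag.
my @searchy = ('<i>total:</i> ', '<i>land:</i> ', '<i>water:</i> ');
my @choppy  = (' <br>', ' <br>', '');
my @taggy   = ('total', 'land', 'water');

# Return one "<tag>value</tag>" string for every line that matches a table entry.
sub extract_fields {
    my (@lines) = @_;
    my @out;
    for my $line (@lines) {
        chomp(my $copy = $line);
        for my $itr (0 .. $#searchy) {
            # \Q...\E makes the stored strings match literally, not as regexes.
            if ($copy =~ /^\Q$searchy[$itr]\E(.*?)\Q$choppy[$itr]\E\s*$/) {
                push @out, "<$taggy[$itr]>$1</$taggy[$itr]>";
                last;   # first matching entry wins
            }
        }
    }
    return @out;
}

my @sample = (
    "<b>Area:</b> <br>\n",
    "<i>total:</i> 652,000 sq km <br>\n",
    "<i>land:</i> 652,000 sq km <br>\n",
    "<i>water:</i> 0 sq km\n",
);
print "$_\n" for extract_fields(@sample);
```

Lines that match no entry (such as the `<b>Area:</b>` heading) are simply skipped.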

If anyone knows of a CLI program (or any other kind, for that matter) that can batch-process HTML formatting, I'd greatly appreciate it.
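
Failing an existing tool, a few regex substitutions in Perl might be enough for files as regular as these. The following is a rough sketch only: the `reformat_html` name is invented here, it assumes every `<br>` is written exactly that way, and it assumes the `<p>` tags can simply be dropped, as in the example above. It is not a general HTML formatter.

```perl
use strict;
use warnings;

# Collapse raw HTML into one logical line per <br>-terminated field.
sub reformat_html {
    my ($html) = @_;
    $html =~ s/\s+/ /g;                # join wrapped lines, squeeze whitespace
    $html =~ s/<p>//gi;                # drop paragraph openers (assumed unwanted)
    $html =~ s/\s*<br>\s*/ <br>\n/gi;  # end each line right after its <br>
    $html =~ s/^\s+|\s+$//g;           # trim leading and trailing whitespace
    return "$html\n";
}

# Batch mode: reformat every file named on the command line, printing to STDOUT.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $html = do { local $/; <$fh> };   # slurp the whole file at once
    close $fh;
    print reformat_html($html // '');
}
```

Saved under a name such as reformat.pl (hypothetical), it could be run as `perl reformat.pl *.html > combined.txt`, which concatenates the reformatted output of every file.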

I apologize for being long-winded and appreciate your help.

kdavis@uvic.ca

Edit: First post, and I didn't realize this board renders HTML. Check the page source if you can help. Thanks.
 