Hi there, new to the boards. I apologize if there is an obvious solution to my problem that hasn't occurred to me.
I'm writing a Perl script that extracts data from HTML files. I plan to post a few Perl questions in another topic.
The HTML files are already in a standard format, but not one conducive to my planned method of extraction.
OmniWeb's 'reformat' button does a fine job for my purposes, but I need something that can do the same thing to many files.
E.g., I'd like a tool that takes the following:
<p><b>Area:</b>
<br><i>total:</i>
652,000 sq km
<br><i>land:</i>
652,000 sq km
<br><i>water:</i>
0 sq km
and turn it into:
<b>Area:</b> <br>
<i>total:</i> 652,000 sq km <br>
<i>land:</i> 652,000 sq km <br>
<i>water:</i> 0 sq km
This way, I can build arrays of the strings that come before and after each piece of data I want and run something along these lines:
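# \Q...\E makes the leading marker match as literal text rather than as a regex.
if ($line =~ s/\Q$searchy[$itr]\E//) {
    # Cut the last length($choppy[$itr]) + 2 characters: the trailing marker plus two more.
    substr($line, -length($choppy[$itr]) - 2) = "";
    # Wrap the remaining value in matching opening and closing tags.
    print "<$taggy[$itr]>$line</$taggy[$itr]>\n";
}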
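For comparison, here is a minimal, self-contained sketch of the same prefix/suffix idea that captures the value with a single regex instead of `s///` followed by `substr`. The contents of `@searchy`, `@choppy` and `@taggy` are hypothetical examples built from the sample above, and `extract_field` is a name made up for illustration. It assumes the lines are already in the reformatted, one-field-per-line layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical marker tables (illustration only): the text before the value,
# the text after it, and the tag to wrap the value in. Same index = same field.
my @searchy = ('<i>total:</i> ', '<i>land:</i> ', '<i>water:</i> ');
my @choppy  = (' <br>',          ' <br>',          '');
my @taggy   = ('total',          'land',           'water');

# Return "<tag>value</tag>" for the first marker pair that matches $line,
# or undef if none does. \Q...\E treats the markers as literal text.
sub extract_field {
    my ($line) = @_;
    for my $itr (0 .. $#searchy) {
        # Lazy (.*?) stops at the first occurrence of the trailing marker;
        # \s*$ tolerates trailing whitespace or a newline.
        if ($line =~ /^\Q$searchy[$itr]\E(.*?)\Q$choppy[$itr]\E\s*$/) {
            return "<$taggy[$itr]>$1</$taggy[$itr]>";
        }
    }
    return;
}

# Demo on the reformatted lines from the example above.
for my $line ('<b>Area:</b> <br>',
              '<i>total:</i> 652,000 sq km <br>',
              '<i>land:</i> 652,000 sq km <br>',
              '<i>water:</i> 0 sq km') {
    my $field = extract_field($line);
    print "$field\n" if defined $field;
}
```

Run as-is, it should print `<total>652,000 sq km</total>`, `<land>652,000 sq km</land>` and `<water>0 sq km</water>`, one per line. The Area header line matches no marker pair and is skipped. Note that the closing tag is built with `/`, since HTML closing tags use a forward slash.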
If anyone knows of a command-line (or any other) program that can batch-reformat HTML files like this, I'd greatly appreciate a pointer.
I apologize for being long-winded and appreciate your help.
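Failing a dedicated tool, the reflow for this particular layout can be sketched in a few lines of Perl. This is a minimal sketch, assuming the only line structure that matters is the `<br>` tag (written exactly as `<br>`, in any letter case) and that `<p>` tags can be thrown away; `reflow_html` is a hypothetical name, and other block elements such as tables or `<br />` are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reflow an HTML fragment so that each <br> ends a line and the value that
# follows a label ends up on the same line as that label.
sub reflow_html {
    my ($html) = @_;
    $html =~ s/\s*\n\s*/ /g;           # join all lines, collapsing each break to one space
    $html =~ s/<p>//gi;                # assumption: <p> tags are not needed for extraction
    $html =~ s/\s*<br>\s*/ <br>\n/gi;  # end a line after every <br>
    $html =~ s/\s+\z//;                # trim trailing whitespace
    return "$html\n";
}

# Batch mode: rewrite each file named on the command line in place,
# keeping the original as FILE.bak. Does nothing if no files are given.
for my $file (@ARGV) {
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $html = do { local $/; <$in> };   # slurp the whole file
    close $in;
    rename $file, "$file.bak" or die "Cannot back up $file: $!";
    open my $out, '>', $file or die "Cannot write $file: $!";
    print {$out} reflow_html($html);
    close $out;
}
```

Running `perl reflow.pl *.html` (the script name is arbitrary) would rewrite each file and keep a `.bak` copy. On the sample above, `reflow_html` produces the four-line layout shown earlier. Because it collapses every newline in the file, it only suits pages whose useful structure is carried by `<br>`.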
kdavis@uvic.ca
edit: First post, and I didn't realize the board renders HTML. Check the page source if you can help. Thanks.