Perl and UTF-8/Unicode RegEx

kainjow

I've got a Perl script that reads HTML from standard input, strips out the tags, and prints the result. However, for UTF-8/Unicode text (I'm still not 100% clear on the difference between the two...), the output comes out garbled. Does anyone have any ideas?

Here's the Perl code:
Code:
#!/usr/bin/perl
$str = "";
while ($line=<STDIN>)
{
	$str .= $line;
}
$str =~ s/<script[^>]*>(.*?)<\/script>//gsi; # remove <script>
$str =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gsi; # remove html (\1 closes the quote that was opened)
print $str;
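
One thing that might be worth trying, sketched here under the assumption that the input really is UTF-8 and that the garbling comes from the filehandles being treated as raw bytes: declare the encoding on STDIN and STDOUT with binmode so the regexes run on decoded characters. The helper name strip_html is hypothetical, introduced only to group the two substitutions, and the second pattern includes a \1 backreference so a quoted attribute value is matched through its closing quote. If the garbling happens in whatever displays the output, this alone may not help.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: input and output are both UTF-8.
# Decode bytes into characters on read, encode back to bytes on write.
binmode(STDIN,  ':encoding(UTF-8)');
binmode(STDOUT, ':encoding(UTF-8)');

# strip_html is a hypothetical helper name, not part of the original script.
sub strip_html {
    my ($str) = @_;
    # Remove <script> elements together with their contents.
    $str =~ s/<script[^>]*>.*?<\/script>//gsi;
    # Remove remaining tags; \1 matches the same quote character that
    # opened an attribute value, so a '>' inside quotes does not end the tag.
    $str =~ s/<(?:[^>'"]*|(['"]).*?\1)*>//gsi;
    return $str;
}

# Slurp all of standard input in one read.
my $str = do { local $/; <STDIN> };
print strip_html($str) if defined $str;
```

Regex-based tag stripping is still only an approximation of HTML parsing, so treat this as a quick filter rather than a robust solution.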
 