Getting warnings about UTF-8 characters.

I'm getting multiple warnings:

  utf8 "\xAB" does not map to Unicode at /usr/share/perl5/IkiWiki.pm line 774, <$in> chunk 1.

I assume this appears once per file, but even in verbose mode ikiwiki doesn't tell me which file is the problem. It first reads all the files and only prints the warning afterwards, while parsing/compiling them, so I can't deduce the offending files from the output order.

Is there a way to have ikiwiki output the position where it encounters the character?

Probably all of this has to do with locale settings and the use of mixed locales in a distributed setup ... I'd rather clean up the unexpected characters in some of the file(name)s. --jwalzer
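One way to find the offending files and positions without touching ikiwiki is to scan the source directory from outside. The following is only a minimal sketch in Python, not part of ikiwiki: the function name `find_invalid_utf8` and the `.mdwn` suffix filter are assumptions made for illustration. It reads each matching file as raw bytes and reports the offset of the first byte that strict UTF-8 decoding rejects.

```python
import os


def find_invalid_utf8(root, suffix=".mdwn"):
    """Yield (path, byte_offset, bad_byte) for files under root that are not valid UTF-8.

    Only the first undecodable byte of each file is reported.
    The suffix filter is an assumption; pass suffix="" to check every file.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as handle:
                data = handle.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError as err:
                # err.start is the byte offset of the first undecodable byte.
                yield path, err.start, data[err.start]


def report(root, suffix=".mdwn"):
    """Return one human-readable line per offending file."""
    return [
        f"{path}: invalid UTF-8 byte 0x{byte:02X} at offset {offset}"
        for path, offset, byte in find_invalid_utf8(root, suffix)
    ]
```

Calling `print("\n".join(report("/path/to/srcdir")))` would list each file that fails to decode, with `/path/to/srcdir` standing in for the wiki's source directory. This checks file contents only, not file names.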


Update: I took the chance to insert a debug call into IkiWiki.pm:

 root@novalis:/usr/share/perl5# diff -p /tmp/IkiWiki.orig.pm IkiWiki.pm 
 *** /tmp/IkiWiki.orig.pm        Sun Feb 14 15:16:08 2010
 --- IkiWiki.pm  Sun Feb 14 15:16:28 2010
 *************** sub readfile ($;$$) {
 *** 768,773 ****
 --- 768,774 ----
         }

         local $/=undef;
 +       debug("opening File: $file:");
         open (my $in, "<", $file) || error("failed to read $file: $!");
         binmode($in) if ($binary);
         return \*$in if $wantfd;

But what I see now is not very helpful: STDERR and the debug output seem to be asynchronous, so they get interleaved in a way that I can't really tell what the problem is ... For troubleshooting I'm probably better off inserting a print to STDERR, so that everything ends up in the same stream. --jwalzer


Update: The `print STDERR $file;` trick did it. I was able to find an mdwn file (generated by one of my scripts) that had the byte `\xAB` in it.

Nevertheless, I still wonder whether this should be a problem. The character happened to be in a [[meta title='$CHAR']] tag and in a [$CHAR](http://foo) link.

Should this throw a warning? Maybe the warning could be caught and reported together with the name of the containing file? Maybe even with an override, for when one knows that the content is correct as it is? --jwalzer
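As for why the byte triggers a warning at all: a lone `0xAB` is not a valid UTF-8 sequence, although it is a perfectly good character in ISO-8859-1. The short Python sketch below illustrates this. That the generating script wrote Latin-1 output is only an assumption here, and the helper name `is_valid_utf8` is made up for the example.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


raw = b"\xab"  # the byte reported in the warning

# Interpreted as ISO-8859-1 (Latin-1), 0xAB is the left-pointing guillemet.
as_latin1 = raw.decode("latin-1")

# The same character encoded as UTF-8 takes two bytes: 0xC2 0xAB.
as_utf8 = as_latin1.encode("utf-8")

# A lone 0xAB is a continuation byte without a lead byte, so it is rejected.
lone_byte_ok = is_valid_utf8(raw)
reencoded_ok = is_valid_utf8(as_utf8)
```

If that assumption holds, re-encoding the generated file as UTF-8 (or fixing the script's output encoding) should make the warning go away without any override.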