I converted an ikiwiki setup file to YAML as documented.
On my Debian Squeeze system, attempting to build the wiki using the YAML setup file triggers the following error message:

    YAML::XS::Load Error: The problem:
        Invalid trailing UTF-8 octet
    was found at document: 0
    usage: ikiwiki [options] source dest
           ikiwiki --setup configfile

Indeed, my setup file contains UTF-8 characters.
Uninstalling YAML::XS (libyaml-libyaml-perl) resolves this issue. According to YAML::Any's POD, YAML::Syck is then used instead of YAML::XS, since it's the best YAML implementation still available on my system.
No encoding-related setting is mentioned in YAML::XS' POD. We may consider there is a bug in there. I'll see if it's known / fixed somewhere as soon as I get online.
Joey, as a (hopefully) temporary workaround, what do you think of explicitly using YAML::Syck (or whatever other YAML implementation that does not expose this bug) rather than letting YAML::Any pick its preferred one?
libyaml-syck-perl's description mentions that the module is now deprecated. (I had to do some ugly workaround to make unicode work with Syck earlier.) So it appears the new YAML::XS is the way to go long term, and presumably YAML::Any will start depending on it in due course? --Joey
Right. Since this bug is fixed in current testing/sid, only Squeeze needs to be taken care of. As far as Debian Squeeze is concerned, I see two ways out of the current buggy situation:
- Add `Conflicts: libyaml-libyaml-perl (< 0.34-1~)` to the ikiwiki packages uploaded to stable and squeeze-backports. Additionally uploading the newer, fixed `libyaml-libyaml-perl` to squeeze-backports would make the resulting situation a bit easier to deal with from the Debian stable user point of view.
- Patch the ikiwiki packages uploaded to stable and squeeze-backports:
    - either to work around the bug by explicitly using YAML::Syck (yeah, it's deprecated, but it's Debian stable)
    - or to make the bug easier for the user to work around, e.g. by warning her of possible problems in case YAML::Any has chosen YAML::XS as its preferred implementation (the `YAML::Any->implementation` module method can come in handy in this case).
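
The warning idea in the last option could look roughly like the following. This is only a sketch, not a patch against ikiwiki: the wording of the warning and the variable name `$impl` are made up for illustration, and the only call it relies on is `YAML::Any->implementation`.

```perl
use strict;
use warnings;
use YAML::Any;

# Ask YAML::Any which backend it selected on this system.
my $impl = YAML::Any->implementation;

# Hypothetical check: warn only when the selected backend is YAML::XS.
if ($impl eq 'YAML::XS') {
    warn "YAML::Any selected YAML::XS; setup files containing UTF-8 "
       . "may fail to load with the libyaml-libyaml-perl shipped in Squeeze\n";
}

print "YAML implementation in use: $impl\n";
```

A real check would probably also want to look at the installed YAML::XS version, so that fixed versions do not trigger the warning.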
I tend to prefer the first aforementioned solution, but any of these will anyway be kinda ugly, so...
With the additional info and test cases I provided on the Debian bug (Message #22), I now very much doubt this is a YAML::XS bug. Also, the RT bug I linked to happens with `use utf8`, which is not the case in ikiwiki AFAIK => I think you should reconsider whether this bug really is YAML::XS' fault, or YAML::Any's fault, or Perl's fault, or... the way ikiwiki slurps and untaints UTF-8 YAML setup files. Sorry for providing information that may have been misguided. --intrigeri
`use utf8` is completely irrelevant; that only tells perl to support utf8 in its source code.
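
To illustrate the distinction, here is a minimal sketch using only core Perl. It decodes data at run time without the `use utf8` pragma being loaded at all; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Note: no `use utf8` here. The pragma only affects how literals and
# identifiers in this source file are read, not how data is handled.

# "\xC3\xA9" is the two-octet UTF-8 encoding of U+00E9.
my $octets = "\xC3\xA9";

# Decode a copy in place; utf8::decode is available without the pragma.
my $chars = $octets;
utf8::decode($chars);

printf "octets: %d, characters: %d\n", length($octets), length($chars);
```

Whether a string holds raw octets or decoded characters is therefore a property of the data and how it was read, independent of the pragma.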
I don't know what `Path::Class::File` is, but if it provides non-decoded bytes to the module then it would likely avoid this failure, while resulting in parsed YAML where every string was likewise not decoded unicode, which is not very useful. --Joey
You guessed right about the non-decoded bytes being passed to YAML::XS, except this is the way it should be done. YAML::XS POD reads: "YAML::XS only deals with streams of utf8 octets". Feed it with non-decoded UTF-8 bytes and it gives you properly decoded Perl character strings in exchange.
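
A small sketch of that behaviour, assuming a YAML::XS version without the Squeeze-era problem. The `wikiname` key and its value are made up for the example; in ikiwiki the octets would come from the setup file read without a decoding layer.

```perl
use strict;
use warnings;
use Encode qw(encode);
use YAML::XS qw(Load);

# A one-key YAML document whose value contains U+00E9,
# handed to YAML::XS as raw UTF-8 octets (not a decoded string).
my $octets = encode('UTF-8', "wikiname: caf\x{e9}\n");

my $config = Load($octets);
my $name   = $config->{wikiname};

# The value comes back as characters: length counts 4, not 5.
printf "wikiname has %d characters\n", length($name);
```

The design point is that decoding happens exactly once, inside YAML::XS; decoding the file contents beforehand as well is what would go wrong.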
Once this has been made clear, since 1. this module indeed seems to be the future of YAML in Perl, and 2. is depended on by other popular software such as dh-make-perl (on the 2nd degree), I suggest using it explicitly instead of the current "try to support every single YAML Perl module and end up conflicting with the now recommended one" nightmare. --intrigeri
Thanks a lot. --intrigeri