Why doesn't the search plugin index attachments? Are there any technical reasons for not including this feature/option (besides increased processing time and depending on external programs)?

One could check for all non-mdwn files, convert them to text where that is possible, and add them as documents; I guess needsbuild would be a good place for that.
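A rough sketch of that selection step, assuming a hypothetical table that maps file extensions to text extractors. Python is used here only to keep the example short; the names `EXTRACTORS` and `indexable_attachments` are made up for illustration and are not part of ikiwiki. The only real program referenced is `pdftotext` (poppler-utils), which writes to stdout when the output file is given as `-`.

```python
import os

# Hypothetical mapping from file extension to the extractor command to run.
# "pdftotext FILE -" writes the extracted text to stdout.
EXTRACTORS = {
    ".pdf": ["pdftotext"],
}

def indexable_attachments(srcdir, page_ext=".mdwn"):
    """Yield (relative_path, extractor_argv) for each non-page file we can convert."""
    for root, _dirs, files in os.walk(srcdir):
        for name in sorted(files):
            ext = os.path.splitext(name)[1].lower()
            # Skip wiki pages and any file type without a known extractor.
            if ext == page_ext or ext not in EXTRACTORS:
                continue
            path = os.path.join(root, name)
            yield os.path.relpath(path, srcdir), EXTRACTORS[ext] + [path, "-"]
```

Each yielded command is an argument list rather than a single string, so it can be run without a shell; supporting another file type would only mean adding an entry to the table.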


I don't think there are really any reasons, other than no one having done it.

Although it is worth noting that using additional libraries/programs to, e.g., pull EXIF data and comments out of image files and make them searchable does potentially increase ikiwiki's attack surface.

Comment by joey Fri Jan 13 13:46:49 2012
RE: comment 1

I've modified the plugin to add the option of indexing attachments. Only PDF attachments for now, but support for other filetypes should be real easy to add.

The changes to IkiWiki/Plugin/search.pm are available at http://git.devnull.li/ikiwiki.git, in the srchatt branch.

I have a small question about filenames and security: I'm using qx to execute the program that extracts the text from the PDF files. However, qx takes a whole command string and passes it to a shell rather than directly to the program I want to run, so it is possible (I think) to craft a filename that the shell expands into something nasty.
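To illustrate the concern, here is a small sketch in Python rather than Perl, assuming a POSIX system with `sh` and `echo` available. The filename is invented, and `echo` stands in for the text extractor. Running a command string through a shell is analogous to qx with an interpolated filename, while passing an argument list bypasses the shell entirely.

```python
import subprocess

# A hypothetical attachment name containing a shell metacharacter.
filename = "report.pdf; echo INJECTED"

# Command string handed to a shell: the shell splits it at ";" and runs
# the second part as a separate command.
via_shell = subprocess.run(
    "echo " + filename, shell=True, capture_output=True, text=True
).stdout

# Argument list, no shell: the whole filename reaches the program as one argument.
via_argv = subprocess.run(
    ["echo", filename], capture_output=True, text=True
).stdout
```

In the first call the shell runs two commands; in the second, `echo` simply receives the odd-looking name as a single argument.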

How do the Perl/IkiWiki experts suggest handling these potentially unsafe filenames? I've thought of the following options:

  • Running the text extractor program using Proc::Safe. I could not find a Debian package for it, and I'd rather avoid adding another dependency to IkiWiki.
  • Running the text extractor program as suggested in the perlipc document, using fork + exec.
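The fork + exec option can be sketched as follows, again in Python for brevity and assuming a POSIX system. The helper name `run_without_shell` is hypothetical. The child process replaces itself with the extractor, receiving the filename as a plain argument, and the parent reads the extractor's output from a pipe; no shell ever sees the filename.

```python
import os

def run_without_shell(argv):
    """Run argv via fork + exec and return the child's stdout as bytes."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send stdout into the pipe, then replace the process image.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)
            os.close(write_fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: read everything the child wrote, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        output = pipe.read()
    _, status = os.waitpid(pid, 0)
    if status != 0:
        raise RuntimeError("command failed: %r" % (argv,))
    return output
```

For a PDF attachment the call might look like `run_without_shell(["pdftotext", path, "-"])`. In Perl, the list forms of `system` and `exec`, and the list form of a piped `open`, avoid the shell in the same way.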

I haven't done any of those because I'd like to check if there are any helpers in IkiWiki to do this. Perhaps the IkiWiki::possibly_foolish_untaint function does it? (I didn't really understand what it does...)

Comment by jerojasro Sun Jan 15 19:49:49 2012

Maybe it could be sufficient to run a command similar to

omindex --db /path/to/.ikiwiki/xapian/default --url http://webserver/ikiwiki /path/to/public_html

Comment by Michal Tue Jan 17 12:45:37 2012
RE: comment 1

Michal, that's not a bad idea IMO, but we would lose some search keywords and would also index structural elements (navigation text, and so on).

Comment by jerojasro Sat Jan 21 21:44:00 2012