Why doesn't the search plugin index attachments? Are there any technical reasons for not including this feature/option (besides the increased processing time and the dependency on external programs)?
One could check all non-mdwn files, convert them to text (if such a thing is possible), and add them as documents; I guess the needsbuild hook would be a good place for that.
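A minimal sketch of the selection step, under stated assumptions: attachments_to_index and %extractor_for are hypothetical names, and pdftotext as the converter is an assumption (nothing above names one). In a plugin, a needsbuild hook callback, which as far as I know receives a reference to the array of files about to be rebuilt, could pass that list through a filter like this one.

```perl
use strict;
use warnings;

# Hypothetical table: attachment extension => program that converts it to
# text. Only PDF is listed; pdftotext is an assumed choice of extractor.
my %extractor_for = (
    pdf => 'pdftotext',
);

# Given the files ikiwiki is about to rebuild, return the attachments
# (non-mdwn files) that we know how to convert to text for indexing.
sub attachments_to_index {
    my @files = @_;
    my @wanted;
    for my $file (@files) {
        next unless $file =~ /\.([^.\/]+)\z/;   # no extension: skip
        my $ext = lc $1;
        next if $ext eq 'mdwn';                 # pages are indexed already
        push @wanted, $file if exists $extractor_for{$ext};
    }
    return @wanted;
}
```

Keeping the extension table separate from the filter means adding another file type later is a one-line change to %extractor_for.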
I don't think there are really any reasons, other than no one having done it.
It is worth noting, though, that using additional libraries/programs to, e.g., pull EXIF data and comments out of image files and make them searchable does potentially increase ikiwiki's attack surface.
I've modified the plugin to add the option of indexing attachments. Only PDF attachments are supported for now, but support for other file types should be really easy to add.
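A minimal sketch of how such an extraction step could run without a shell, assuming poppler's pdftotext is the extractor (the branch may use something else) and that run_capture and pdf_text are hypothetical helper names. The list form of Perl's open passes the arguments directly to the program, so spaces, quotes, semicolons or $(...) in a filename are never interpreted.

```perl
use strict;
use warnings;

# Run a program WITHOUT a shell and return its standard output.
# The list form of open() hands the arguments straight to the program,
# so spaces, quotes, ';' or '$(...)' in a filename are never interpreted.
# It needs at least two elements: a lone string would be shell-parsed.
sub run_capture {
    my @argv = @_;
    die "run_capture: need a program and at least one argument\n"
        unless @argv >= 2;
    open(my $out, '-|', @argv) or die "cannot run $argv[0]: $!\n";
    my $text = do { local $/; <$out> };   # slurp all output
    close($out)
        or die "$argv[0] failed (exit status " . ($? >> 8) . ")\n";
    return defined $text ? $text : '';
}

# Hypothetical wrapper for one PDF attachment, assuming poppler's
# pdftotext: -q silences messages, and "-" as the output file means stdout.
# A leading "./" stops a file named e.g. "-x.pdf" from looking like an option.
sub pdf_text {
    my ($file) = @_;
    $file = "./$file" if $file =~ /^-/;
    return run_capture('pdftotext', '-q', $file, '-');
}
```

Unlike qx, this never builds a command string, so there is nothing to quote. The two-element minimum matters: with a single string, open falls back to shell processing.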
The changes to IkiWiki/Plugin/search.pm are available at http://git.devnull.li/ikiwiki.git, in the srchatt branch.

I have a small question about filenames and security: I'm using qx to execute the program that extracts the text from the PDF files, but qx passes the whole string not to the program I want to run but to a shell, so it is possible (I think) to craft a filename that expands to something nasty in the shell. How do the Perl/IkiWiki experts suggest handling such potentially unsafe filenames? I've thought of the following options:
- Proc::Safe. I could not find a Debian package for it, and I'd rather avoid adding another dependency to IkiWiki.
- Following the perlipc document, using fork + exec.

I haven't done either of those because I'd like to check whether IkiWiki already has any helpers for this. Perhaps the IkiWiki::possibly_foolish_untaint function does it? (I didn't really understand what it does...)

Maybe it could be sufficient to run a command similar to …?

Michal, that's not a bad idea IMO, but we would lose some search keywords and would also index structural elements (navigation text and so on).
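On the fork + exec option: a minimal sketch of the "safe pipe open" pattern that the perlipc document describes, using only core Perl; run_capture_forked is a hypothetical helper name. Separately, if I read IkiWiki.pm correctly, possibly_foolish_untaint just matches the whole string with /(.*)/s and returns the capture, i.e. it only clears Perl's taint flag and does nothing to make a filename safe for a shell.

```perl
use strict;
use warnings;
use POSIX ();

# perlipc's "safe pipe open": open() with "-|" and no command forks;
# the child's STDOUT becomes the read end of $from_kid in the parent.
sub run_capture_forked {
    my @argv = @_;
    my $pid = open(my $from_kid, '-|');
    die "cannot fork: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child: exec with an indirect object never involves /bin/sh,
        # even when @argv has a single element.
        exec { $argv[0] } @argv
            or do {
                warn "cannot exec $argv[0]: $!\n";
                POSIX::_exit(127);   # don't fall back into the parent's code
            };
    }
    # Parent: read everything the child wrote, then reap it.
    my $text = do { local $/; <$from_kid> };
    close($from_kid)
        or die "$argv[0] exited with status " . ($? >> 8) . "\n";
    return defined $text ? $text : '';
}
```

Compared with the list form of open, this version also keeps the shell out of the picture when the program is called with no arguments, because exec is given the program name separately. POSIX::_exit is used in the child so a failed exec cannot continue running the parent's code (for example inside an enclosing eval).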