A dear wishlist which would resolve this question: ikiwiki should support git-annex repositories.

I am not sure how this would work, but from my POV, it should do a git annex get when new commits are pushed to its bare repo. This would assume, of course, that there's another repo somewhere that ikiwiki has access to, which works for HTTP-style remotes, but could be more problematic for SSH remotes that require a key.

Another solution would be to make ikiwiki a remote itself and allow users to push big files to it. The only problem I see with this is those files would end up in the bare repository and not necessarily show up in the web rendering. Ideally, a big file pushed would be hardlinked between the two repos, but it seems git-annex doesn't support that yet. --anarcat

One technical problem with this is that ikiwiki doesn't allow symlinks for security, but git-annex relies on symlinks (unless you're in direct mode, but I'm not sure that's really desirable here). I'd like to make symlinks possible without compromising security, but it'll be necessary to be quite careful. --smcv

First implementation

So as the discussion shows, it seems it's perfectly possible to actually do this! There's this gallery site which uses the album plugin and git-annex to manage its files.

The crucial steps are:

  1. set up a git annex remote in $srcdir

  2. configure direct mode because ikiwiki ignores symlinks for security reasons:

    cd $srcdir
    git annex init
    git annex direct
    
  3. configure files to be considered by git-annex (those will not be committed into git directly):

    git config annex.largefiles 'largerthan=100kb and not (include=*.mdwn or include=*.txt)'
    
  4. make the bare repository (the remote of $srcdir) ignored by git-annex:

    cd $srcdir
    git config remote.origin.annex-ignore true
    git config remote.origin.annex-sync false
    

    (!) This needs to be done on ANY clone of the repository, which is annoying, but it's important because we don't want to see git-annex stuff in the bare repo. (why?)

  5. deploy the following crappy plugin to make commits work again and make sure the right files are added in git-annex:

#!/usr/bin/perl
package IkiWiki::Plugin::gitannex;

use warnings;
use strict;
use IkiWiki 3.00;

sub import {
        hook(type => "getsetup", id => "gitannex", call => \&getsetup);
        # run the git-annex commands every time ikiwiki saves its state
        hook(type => "savestate", id => "gitannex", call => \&rcs_commit);
        # we need to handle all rcs commands maybe?
}

sub getsetup () {
        return
                plugin => {
                        safe => 1, # rcs plugin
                        rebuild => undef,
                        section => "misc",
                },
}

# XXX: we want to copy or reuse safe_git

sub rcs_commit (@) {
        chdir $config{srcdir}
                or error("cannot chdir to $config{srcdir}: $!");
        # backticks capture the commands' output instead of printing it
        `git annex add --auto`;
        `git annex sync`;
}

sub rcs_commit_staged (@) {
        # pass our own arguments along (@_, not the eval error variable $@)
        rcs_commit(@_);
}

1;

This assumes you know what srcdir, repository and so on mean. If you forgot (like me), see this reference: git.
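Since step 4 has to be repeated on every clone, it may be worth wrapping it in a small helper. This is only a sketch: the function name annex_ignore_origin is made up for this page, and it assumes the bare repository is the remote called origin, as in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (the name is not part of git or git-annex):
# tell git-annex to leave the "origin" remote alone in the given clone.
# Assumes the bare repository is the remote named "origin".
annex_ignore_origin() {
    clone="${1:-.}"   # path to the clone, defaults to the current directory
    git -C "$clone" config remote.origin.annex-ignore true &&
    git -C "$clone" config remote.origin.annex-sync false
}

# Usage: annex_ignore_origin /path/to/clone
```

It only automates the two git config calls; it does not remove the need to run it once per clone.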

What doesn't work

  • the above plugin is kind of flaky and ugly.
  • it's not an RCS plugin, but probably should be, replacing the git plugin, because really: git doesn't work at all anymore at this point

What remains to be clarified

  • how do files get pushed to the $srcdir? Only through the web interface?
  • why do we ignore the bare repository?

See the discussion for a followup on that. --anarcat

Alternative implementation

An alternative implementation, which remains to be detailed but is mentioned in ikiwiki and big files, is to use the underlay feature combined with the hardlink option to deploy the git-annex'd files. git-annex is then kept separate from the base ikiwiki git repo. See also Ikiwiki with git-annex, the album and the underlay plugins for an example.

Also note that ikiwiki-hosting has a patch waiting to allow pushes to work with git-annex. This could potentially be expanded to sync content to the final checkout properly, avoiding some of the problems above (especially with regard to non-annex bare repos).

Combined with the underlay feature, this could work very nicely indeed... --anarcat

Here's an attempt:

cd /home/user
git clone source.git source.annex
cd source.annex
git annex direct
cd ../source.git
git annex group . transfer
git remote add annex ../source.annex
git annex sync annex

Make sure the hardlink setting is enabled, and add the annex as an underlay, in ikiwiki.setup:

hardlink: 1
add_underlays:
- /home/w-anarcat/source.annex

Then moving files to the underlay is as simple as running this command in the bare repo:

#!/bin/sh

echo "moving big files to annex repository..."
git annex move --to annex

I have added this as a hook in $HOME/source.git/hooks/post-receive (don't forget to chmod +x).
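Installing the hook by hand is easy to get wrong (forgotten chmod, clobbered existing hook), so here is a rough sketch of a helper that does it. The name install_post_receive and its arguments are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: install a script as the post-receive hook of a bare
# repository, refusing to overwrite a hook that is already there.
install_post_receive() {
    repo="$1"    # path to the bare repository, e.g. $HOME/source.git
    script="$2"  # path to the hook script to install
    hook="$repo/hooks/post-receive"
    if [ -e "$hook" ]; then
        echo "refusing to overwrite existing $hook" >&2
        return 1
    fi
    mkdir -p "$repo/hooks" &&
    cp "$script" "$hook" &&
    chmod +x "$hook"
}

# Usage: install_post_receive "$HOME/source.git" ./move-to-annex.sh
```

The script name in the usage line is a placeholder for wherever the hook body above was saved.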

The problem with the above is that the underlay wouldn't work: for some reason it wouldn't copy those files in place properly. Maybe it's freaking out because it's a full copy of the repo... My solution was to make the source repository itself a direct repo, and then add it as a remote to the bare repo. --anarcat

Back from the top

Obviously, the final approach of making the source repository direct mode will fail: ikiwiki will try to commit files there from the web interface, which will at best fail and at worst add big files into git-annex (or vice versa, not sure which is worse actually).

Also, I don't know how others here made the underlay work, but it didn't work for me. I think it's because in the "source" repository, there are (dead) symlinks for the annexed files. These override the underlay, for security reasons, although I am unclear as to why the underlay file is discarded so early. So in order to make the original idea above (i.e. having a separate git-annex repo in direct mode) work properly, we must coerce ikiwiki into tolerating symlinks in the srcdir a little more:

diff --git a/IkiWiki.pm b/IkiWiki.pm
index 1043ef4..949273c 100644
--- a/IkiWiki.pm
+++ b/IkiWiki.pm
@@ -916,11 +916,10 @@ sub srcfile_stat {
        my $file=shift;
        my $nothrow=shift;

-       return "$config{srcdir}/$file", stat(_) if -e "$config{srcdir}/$file";
-       foreach my $dir (@{$config{underlaydirs}}, $config{underlaydir}) {
-               return "$dir/$file", stat(_) if -e "$dir/$file";
+       foreach my $dir ($config{srcdir}, @{$config{underlaydirs}}, $config{underlaydir}) {
+               return "$dir/$file", stat(_) if (-e "$dir/$file" && ! -l "$dir/$file");
        }
-       error("internal error: $file cannot be found in $config{srcdir} or underlay") unless $nothrow;
+       error("internal error: $file cannot be found in $config{srcdir} or underlays @{$config{underlaydirs}} $config{underlaydir}") unless $nothrow;
        return;
 }

diff --git a/IkiWiki/Render.pm b/IkiWiki/Render.pm
index 9d6f636..e0b4cf8 100644
--- a/IkiWiki/Render.pm
+++ b/IkiWiki/Render.pm
@@ -337,7 +337,7 @@ sub find_src_files (;$$$) {

                if ($underlay) {
                        # avoid underlaydir override attacks; see security.mdwn
-                       if (! -l "$abssrcdir/$f" && ! -e _) {
+                       if (1 || ! -l "$abssrcdir/$f" && ! -e _) {
                                if (! $pages{$page}) {
                                        push @files, $f;
                                        push @IkiWiki::underlayfiles, $f;

Now obviously this patch is incomplete: I am not sure we actually avoid the attack, i.e. I am not sure the check in srcfile_stat() is sufficient to completely remove the check in find_src_files().

After reviewing the code further, it seems that find_src_files is called in three places in ikiwiki:

../IkiWiki/Render.pm:421:   find_src_files(1, \@files, \%pages);
../IkiWiki/Render.pm:846:       ($files, $pages)=find_src_files();
../po/po2wiki:18:my ($files, $pages)=IkiWiki::find_src_files();

The first occurrence is in IkiWiki::Render::process_changed_files, where it is used mostly for populating @IkiWiki::underlayfiles, the only side effect of find_src_files. The second occurrence is in IkiWiki::Render::refresh. There things are a little more complicated (to say the least) and a lot of stuff happens. To put it in broad terms, first it does an IkiWiki::Render::scan and then an IkiWiki::Render::render. The last two call srcfile() appropriately (where I put an extra symlink check), except for will_render() in scan, which I can't figure out right now and that seems to have a lot of global side effects. It still looks fairly safe at first glance. The rcs_get_current_rev, refresh, scan and rendered hooks are also called in there, but I assume those to be safe, since they are called with sanitized values already.
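To make the intent of the srcfile_stat change easier to follow, here is a rough shell model of the patched lookup order. It is not ikiwiki code: find_srcfile is a name invented for this sketch, and the directories are passed as arguments instead of coming from %config.

```shell
#!/bin/sh
# Model of the patched lookup: walk the srcdir first, then each underlay,
# and return the first match that exists and is not a symlink.
# usage: find_srcfile FILE SRCDIR [UNDERLAY...]
find_srcfile() {
    file="$1"
    shift
    for dir in "$@"; do
        if [ -e "$dir/$file" ] && [ ! -L "$dir/$file" ]; then
            printf '%s\n' "$dir/$file"
            return 0
        fi
    done
    return 1  # not found in the srcdir or any underlay
}

# Usage: find_srcfile photo.jpg /path/to/srcdir /path/to/annex-underlay
```

With a symlink for the annexed file in the srcdir and the real file in the underlay, this model skips the symlink and returns the underlay copy, which is the behaviour the patch is after.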

The patch does work: the files get picked up from the underlay and properly hardlinked into the target public_html directory! So with the above patch applied, install the following hook in source.git/hooks/post-receive:

#!/bin/sh

OLD_GIT_DIR="$GIT_DIR"
unset GIT_DIR
echo "moving big files to annex repository..."
git annex copy --to annex
git annex sync annex

(I am not sure anymore why unsetting GIT_DIR is necessary, but I remember it destroyed all files in my repo because git-annex synced against the setup branch in the parent directory. Fun times.)
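Git exports GIT_DIR into the environment of the hooks it runs, which is presumably what confuses commands that touch a second repository. One possible variation, sketched here on the assumption that only the git-annex calls need a clean environment, is to unset the variable in a subshell so the rest of the hook still sees the original value. without_git_dir is a hypothetical name.

```shell
#!/bin/sh
# Run a command with GIT_DIR removed from its environment only.
# The subshell keeps the change from leaking into the rest of the hook.
without_git_dir() {
    (
        unset GIT_DIR
        "$@"
    )
}

# Usage inside the hook:
#   without_git_dir git annex copy --to annex
#   without_git_dir git annex sync annex
```

This also makes the OLD_GIT_DIR bookkeeping unnecessary, since nothing outside the subshell is modified.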

Then the annex repo is just a direct clone of the source.git:

cd /home/user
git clone --shared source.git annex
cd annex
git annex direct
cd ../source.git
git remote add annex ../annex

And we need the following config:

hardlink: 1
add_underlays:
- /home/w-anarcat/annex
add_plugins:
- underlay

... and the ikiwiki-hosting patch mentioned earlier to allow git-annex-shell to run at all. Also, the --shared option will make git-annex use hardlinks itself between the two repos, so the files will be available for download as well. --anarcat
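A quick way to check that hardlinking really happened (rather than a silent copy) is to compare inode numbers. The helper below is a sketch with an invented name, same_inode, and it assumes both paths are on the same filesystem, which hardlinks require anyway.

```shell
#!/bin/sh
# Succeeds if both paths have the same inode number, i.e. they are
# hardlinks to the same file (assuming they live on one filesystem).
same_inode() {
    a=$(ls -di -- "$1" 2>/dev/null | awk '{print $1}')
    b=$(ls -di -- "$2" 2>/dev/null | awk '{print $1}')
    # an empty value means ls could not stat the path
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage (paths are placeholders):
#   same_inode /path/to/annex/photo.jpg /path/to/public_html/photo.jpg \
#       && echo "hardlinked" || echo "separate copies"
```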

...aaaand this doesn't work anymore. :( I could have sworn this was working minutes ago, but for some reason the annexed files get skipped again now. :( Sorry for the noise, the annex repo wasn't in direct mode - the above works! --anarcat

This patch still applies - anything else I should be doing here to try to get this fixed? A summary maybe? --anarcat

Sorry, I don't have the mental bandwidth at the moment to work through the implications of this change. I know you want this feature, I know it's an attractive solution to several use cases, and git annex support is in the queue, but right now I'm still trying to deal with mitigating CVE-2016-3714, and the last thing I want to do is merge new security risks. --smcv

No problem at all, glad that you still have that in the queue, and I hope my work was somewhat useful in pushing this forward! Thanks for taking care of the Imagetragick situation... :/ --anarcat

Yet another option

If you happen to use rsync to deploy, instead of git, this is much easier.