So I have enabled the moderatedcomments plugin on my wiki. and good thing that! around 1000 spammy comments showed up in the last 3 months! Awful!

It's pretty hard to figure out the ham and the spam in there. One thing I was hoping was to use the power of the commandline to filter through all that stuff. Now, it seems there's only a "ikiwiki-comment" tool now, and nothing to examine the moderated comments.

It seems to me it would be great to have some tool to filter through that...

So it turns out it was over 3000 comments. The vast majority of those (every one but 42 comments) were from the IP 46.161.41.34 which i recommend null-routing everywhere. I used the following shell magic to figure that out:

#!/bin/sh

set -e

cd .ikiwiki/transient || {
    echo could not find comments, make sure you are in a ikiwiki source directory.
    exit 1
    }
# count the number of comments
echo found $(find . -name '*._comment_pending' | wc -l) pending comments
# number of comments per IP
echo IP distribution:
find . -name '*._comment_pending' | xargs grep -h ip= | sort | uniq -c | sort -n
# generate a banlist for insertion in `banusers`, assuming all the
# pending comments are spam
echo banlist would look like:
find . -name '*._comment_pending' | xargs grep -h ip= | sort -u| sed 's/ ip="//;s/"//;s/^/- ip(/;s/$/)/'

echo to remove comments from a specific IP, use one of those:
find . -name '*._comment_pending' | xargs grep -h ip= | sort -u \
    | sed 's/ ip="//;s/"//;' \
    | while read ip; do
          echo "find . -name '*._comment_pending' | xargs grep -l 'ip=\"$ip\"'| xargs rm"
      done
echo to flush all pending comments, use:
echo "find . -name '*._comment_pending' -delete"

The remaining 42 comments I reviewed throught the web interface, then flushed using the above command. My final addition to the banlist is:

- ip(159.224.160.225)
- ip(176.10.104.227)
- ip(176.10.104.234)
- ip(188.143.233.211)
- ip(193.201.227.41)
- ip(195.154.181.152)
- ip(213.238.175.29)
- ip(31.184.238.11)
- ip(37.57.231.112)
- ip(37.57.231.204)
- ip(46.161.41.34)
- ip(46.161.41.199)
- ip(95.130.13.111)
- ip(95.181.178.142)

--anarcat

Update: i made a script, above. And the banlist is much larger now so the above list is pretty much out of date now... --anarcat

Another update, 2020: I rewrote the script to support interactive batch approval and running from a cron job. I have published the script in my own repo since it's python (and not perl), but I would be happy to provide it as a patch here if that's acceptable.

The basic usage is as follows. First, you add the script in a cron job:

9 * * * * /home/anarcat/bin/ikiwiki-comment-moderation --source-dir=$HOME/source/

This will run every hour. When there are no comments to moderate, the script is silent and you will not get mail. Otherwise you will get something like this:

date                    ip               user            subject    content
2020-05-27T03:42:05Z    192.168.0.116    spammer name    subject    spammy comment
1 comments pending moderation

Then you can either go through the web interface to approve/deny the comments, or call the script by hand, interactively, for example:

w-anarcat@marcos:~/source$ ~anarcat/bin/ikiwiki-comment-moderation --source-dir=$HOME/source -i --verbose
Date : 2020-05-27T04:00:23Z
Ip : 192.168.0.116         
Claimedauthor : spammer name
Subject : subject          
Content : spammy comment   
approve, delete, ignore? [a/d/I] a
moving /home/w-anarcat/source/.ikiwiki/transient/blog/2020-04-27-drowning-camera/comment_1_07f43231a14d0ee6e78d1030aa6a7985._comment_pending to /home/w-anarcat/source/blog/2020-04-27-drowning-camera/comment_1_07f43231a14d0ee6e78d1030aa6a7985._comment
adding to git...           
[master 8f5cb10f] approve comment
 1 file changed, 9 insertions(+)
 create mode 100644 blog/2020-04-27-drowning-camera/comment_1_07f43231a14d0ee6e78d1030aa6a7985._comment
Énumération des objets: 8, fait.
Décompte des objets: 100% (8/8), fait.
Compression par delta en utilisant jusqu'à 12 fils d'exécution
Compression des objets: 100% (5/5), fait.
Écriture des objets: 100% (5/5), 566 bytes | 566.00 KiB/s, fait.
Total 5 (delta 3), réutilisés 0 (delta 0)
To /home/w-anarcat/source.git
   91669038..8f5cb10f  master -> master
approved comment

And you're done! In the above case, the test comment was actually approved (by pressing a), but you can also hit d to just delete the comment. The default (i) is to ignore the comment.

I find that this is generally faster than going through a web browser, although to be as fast as the CGI interface, there would need to be a final dialog that says "delete all ignored comments" like in the CGI. Exercise for the reader or, I guess, myself when I got too many junk comments to process...

Feedback, as usual, welcome. -- anarcat

great stuff, thanks! I've got a similar-ish script that sends me an email summary, but I don't have the CLI stuff myself. What I'd like to see on the web form is: display the IP (or user) responsible for each comment and enable banning them at the same time as moderating the comment. However I'd also like to expand the data attached to the banlist, so it's clear why an entry was added (e.g. blog spam), and when — because I would prefer all such bans to be time limited. (implementation wise, this might be easiest achieved if one of the comment plugins was responsible for maintaining a separate data structure with this info which was processed into a ban list to append to the main one.) Jon, 2020-11-02