Laird Breyer

Basic operation: Scripts

There is generally little point in running the commands above by hand, except to understand how dbacl(1) operates or to experiment with switches.

Note, however, that simple scripts often do not check for error and warning messages on STDERR. It is always worth rehearsing the operations you intend to script, as dbacl(1) will let you know on STDERR if it encounters problems during learning. If you ignore warnings, you will likely end up with suboptimal classifications, because the dbacl system prefers to do what it is told predictably, rather than stop when an error condition occurs.
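Because dbacl(1) reports problems on STDERR instead of stopping, a script can treat any STDERR output as a failure. The following is a minimal POSIX sh sketch of such a check. The helper name run_checked is hypothetical and not part of dbacl, and the sketch assumes mktemp(1) is available, as it is on most modern systems.

```shell
#!/bin/sh
# run_checked CMD [ARG...]  (hypothetical helper, not part of dbacl)
# Runs CMD, lets its STDOUT through unchanged, and returns non-zero if CMD
# either exited with an error or printed anything at all on STDERR.
run_checked() {
    _rc_err=$(mktemp) || return 2
    "$@" 2>"$_rc_err"
    _rc_status=$?
    if [ -s "$_rc_err" ]; then
        # Replay the captured messages so they still reach cron's mail
        # or your terminal.
        cat "$_rc_err" >&2
        # Treat warnings as failures even if CMD itself exited with 0.
        if [ "$_rc_status" -eq 0 ]; then
            _rc_status=1
        fi
    fi
    rm -f "$_rc_err"
    return "$_rc_status"
}
```

For example, a learning script could stop at the first warning instead of silently continuing:

run_checked dbacl -T email -H 18 -l $HOME/.dbacl/spam $HOME/mail/spam || exit 1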

Once you are ready for spam filtering, you need to handle two issues.

The first issue is when and how to learn.

You should relearn your categories whenever you have received an appreciable number of new emails, or simply whenever you like. Unlike many other spam filters, dbacl cannot learn new emails incrementally and update its category files. Instead, you must keep your messages organized in folders, and each time you relearn, dbacl(1) takes a snapshot of their contents.

This limitation is actually advantageous in the long run, because it forces you to keep usable archives of your mail and gives you control over every message that is learned. By contrast, with incremental learning you must remember which messages have already been learned, how many times, and whether to unlearn them if you change your mind.

A dbacl category model normally doesn't change dramatically if you add a single new email (provided the original model depends on more than a handful of emails). Over time, you can even stop learning altogether when your error rate is low enough. The simplest strategy for continual learning is a cron(1) job run once a day. First, save your current crontab to a file:

% crontab -l > existing_crontab.txt

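Edit the file existing_crontab.txt with your favourite editor and add the following two lines at the end. Each line learns one category from the mail folder of the same name. The paths are written out in full because cron does not expand variables such as $HOME inside its own variable assignments, so a helper variable like CATS=$HOME/.dbacl would not work as expected:

5 0 * * * dbacl -T email -H 18 -l $HOME/.dbacl/spam $HOME/mail/spam
10 0 * * * dbacl -T email -H 18 -l $HOME/.dbacl/notspam $HOME/mail/notspam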

Now install the new crontab file by typing:

% crontab existing_crontab.txt
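If you keep more than two categories, one crontab line per category becomes tedious. Below is a minimal POSIX sh sketch of a relearning helper. The function relearn_all and the variables DBACL, MAIL_ROOT, CAT_ROOT and CATEGORIES are all hypothetical. The sketch assumes the layout used in this tutorial, where the mail folder $HOME/mail/NAME is learned into the category file $HOME/.dbacl/NAME.

```shell
#!/bin/sh
# Hypothetical nightly relearning helper (all names here are illustrative).
# For each category NAME, learns the mail folder $MAIL_ROOT/NAME into the
# category file $CAT_ROOT/NAME, mirroring the crontab entries above.
# Set DBACL=echo to print the commands instead of running them.
: "${DBACL:=dbacl}"
: "${MAIL_ROOT:=$HOME/mail}"
: "${CAT_ROOT:=$HOME/.dbacl}"
: "${CATEGORIES:=spam notspam}"

relearn_all() {
    _failed=0
    # $CATEGORIES is unquoted on purpose: it is a space-separated list.
    for _cat in $CATEGORIES; do
        _mbox="$MAIL_ROOT/$_cat"
        if [ ! -r "$_mbox" ]; then
            # Report a missing folder instead of calling dbacl on it.
            echo "relearn: cannot read $_mbox, skipping $_cat" >&2
            _failed=1
            continue
        fi
        "$DBACL" -T email -H 18 -l "$CAT_ROOT/$_cat" "$_mbox" || _failed=1
    done
    return "$_failed"
}
```

If the file is saved as, say, $HOME/bin/relearn.sh, a single crontab entry could then replace the per-category lines:

5 0 * * * . $HOME/bin/relearn.sh && relearn_all

Setting DBACL=echo before calling relearn_all prints the dbacl commands instead of running them. This gives you a cheap way to rehearse the job.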

The second issue is how to invoke dbacl(1) on incoming mail, and what to do with the resulting classification.

Many UNIX systems offer procmail(1) for email filtering. procmail(1) can pipe a copy of each incoming email into dbacl(1), and use the resulting category name to write the message directly to the appropriate mailbox.

To use procmail, first verify that the file $HOME/.forward exists and contains the single line:

|/usr/bin/procmail

Next, create the file $HOME/.procmailrc and make sure it contains something like this:

PATH=/bin:/usr/bin:/usr/local/bin
SHELL=/bin/bash
MAILDIR=$HOME/mail
DEFAULT=$MAILDIR/inbox

#
# this line runs the spam classifier
#
:0 
YAY=| dbacl -vT email -c $HOME/.dbacl/spam -c $HOME/.dbacl/notspam

#
# this line writes the email to your mail directory
#
:0:
* ? test -n "$YAY"
# if you prefer to write the spam status in a header,
# comment out the first line and uncomment the second
$MAILDIR/$YAY
#| formail -A "X-DBACL-Says: $YAY" >>$DEFAULT

#
# last rule: put mail into mailbox
#
:0:
$DEFAULT

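The above script automatically files each incoming email into one of two folders, $HOME/mail/spam or $HOME/mail/notspam, according to the category name dbacl prints. If dbacl prints nothing, the message falls through to $DEFAULT. This only works for mail delivered to your UNIX account. If you have a POP account and your mail reader contacts your ISP directly, procmail never sees the mail. In that case, try using fetchmail(1) to download your mail for local delivery.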
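Before letting procmail act on live mail, it is worth rehearsing the classification on a folder you have already sorted by hand. The following is a minimal sketch. It assumes formail(1), which is distributed with procmail, is installed, and that your home directory path contains no spaces. formail -s splits an mbox folder into individual messages and pipes each one into the given command, so counting the category names dbacl prints shows how the folder would have been filed. The function name rehearse and the variable CLASSIFY are hypothetical.

```shell
#!/bin/sh
# rehearse MBOX  (hypothetical helper)
# Classifies every message in MBOX separately and prints how many
# messages are assigned to each category.
# CLASSIFY holds the classifier command line; override it to experiment.
: "${CLASSIFY:=dbacl -vT email -c $HOME/.dbacl/spam -c $HOME/.dbacl/notspam}"

rehearse() {
    # formail -s runs $CLASSIFY once per message, with that message on stdin.
    # $CLASSIFY is left unquoted on purpose so it splits into separate words.
    formail -s $CLASSIFY < "$1" | sort | uniq -c
}
```

Running, for example, rehearse $HOME/mail/notspam gives a rough count of how many messages in that folder would be misfiled. Because those same messages were used for learning, the error rate on genuinely new mail is likely to be somewhat higher.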
