Laird Breyer

An Example: Running the Tests

Before you cross validate, make sure you have ample disk space available. As a rough rule, expect the commands below to require up to 20 times the combined size of your $HOME/sample_*.mbox files.

% mailcross prepare 10
% mailcross add spam $HOME/sample_spam.mbox
% mailcross add notspam $HOME/sample_notspam.mbox
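
If you want to check the disk space rule of thumb before running these commands, a minimal sketch follows. It is not part of mailcross: the function names are hypothetical, and the factor of 20 is simply the rough upper bound quoted above.

```shell
#!/bin/sh
# Sketch: estimate the disk space needed for cross validation from the
# "up to 20 times" rule of thumb. All function names are hypothetical.

# Print the combined size, in kilobytes, of the files given as arguments.
combined_kb() {
    du -k "$@" | awk '{ total += $1 } END { print total + 0 }'
}

# Print the estimated requirement in kilobytes (20 x the combined size).
required_kb() {
    echo $(( $(combined_kb "$@") * 20 ))
}

# Print the free space, in kilobytes, on the filesystem holding directory $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Example:
#   need=$(required_kb "$HOME"/sample_*.mbox)
#   have=$(free_kb .)
#   [ "$have" -ge "$need" ] || echo "warning: possibly not enough disk space"
```

The estimate is deliberately pessimistic, since 20 times is an upper bound rather than an exact figure.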

If you have several mbox files containing spam, repeat the add spam command once for each file. All this command does is merge the contents of the mbox file into a specially created directory named mailcross.d. Once this is done, you no longer need the original *.mbox files, at least for cross validation purposes.
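
Repeating the add command for many files can be wrapped in a small loop. The sketch below assumes only the mailcross add syntax shown above; the add_all function and the MAILCROSS variable (which lets you substitute echo for a dry run) are hypothetical helpers, not features of mailcross.

```shell
#!/bin/sh
# Sketch: merge several mbox files into one category by repeating
# "mailcross add". MAILCROSS is a hypothetical override for dry runs.

add_all() {
    category=$1
    shift
    for mbox in "$@"; do
        # Skip unmatched glob patterns and unreadable files.
        [ -r "$mbox" ] || continue
        "${MAILCROSS:-mailcross}" add "$category" "$mbox" || return 1
    done
}

# Example (the directory name is hypothetical):
#   add_all spam "$HOME"/spam_archive/*.mbox
```

Setting MAILCROSS=echo first prints the commands that would be run, which is a cheap way to confirm the file list before anything is merged.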

You are now ready to select the classifiers you wish to compare. To list the available ones, type:

% mailcross testsuite list
The following classification wrappers are selectable:

annoyance-filter - Annoyance Filter 1.0b with prune
antispam - AntiSpam 1.1 with default options
bmf - bmf 0.9.4 with default options
bogofilterA - bogofilter 0.15.7 with Robinson-Fischer algorithm
bogofilterB - bogofilter 0.15.7 with Graham algorithm
bogofilterC - bogofilter 0.15.7 with Robinson algorithm
crm114A - crm114 20031129-RC11 with default settings
crm114B - crm114 20031129-RC11 with Jaakko Hyvatti's normalizemime
dbaclA - dbacl 1.6 with alpha tokens
dbaclB - dbacl 1.6 with cef,headers,alt,links
dbaclC - dbacl 1.6 with alpha tokens and risk matrix
ifile - ifile 1.3.3 with to,from,subject headers and default tokens
popfile - POPFile (unavailable?) with default options
qsf - qsf 0.9.4 with default options
spamassassin - SpamAssassin 2.60 (Bayes module) with default settings
spambayes - SpamBayes x with default settings
spamoracle - SpamOracle x with default settings
spamprobe - SpamProbe v0.9e with default options

The exact list you see depends on the classifiers installed on your system. If a classifier is marked unavailable, you must first download and install it somewhere in your path. Once this is done, select the classifiers you are going to test, for example:
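
To see quickly whether a program is somewhere in your path, you can ask the shell directly. This is a minimal sketch with a hypothetical function name. The executable names in the example are assumptions, since a wrapper may look for a differently named program than the label shown in the list.

```shell
#!/bin/sh
# Sketch: report which of the named programs can be found in the PATH.
# Returns nonzero if at least one of them is missing.

check_in_path() {
    missing=0
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found:   $prog"
        else
            echo "missing: $prog"
            missing=1
        fi
    done
    return "$missing"
}

# Example (the executable names are assumptions):
#   check_in_path dbacl bogofilter
```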

% mailcross testsuite select dbaclB bogofilterA annoyance-filter

Note that some of these classifiers work only with the two categories spam and notspam. You can see the state of the testsuite by typing:

% mailcross testsuite status
The following categories are to be cross validated:

notspam.mbox - counting...    2500 messages
spam.mbox - counting...     500 messages

Cross validation is performed on each of these classifiers:

annoyance-filter - Annoyance Filter 1.0b with prune
bogofilterA - bogofilter 0.15.7 with Robinson-Fischer algorithm
dbaclB - dbacl 1.6 with cef,headers,alt,links
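
If you want to compare the message counts reported above with your original files, you can count the messages in an mbox file yourself. The sketch below relies on the mbox convention that each message starts with a line beginning with "From " followed by a space. How mailcross itself counts messages is not stated here, so treat the result as an independent approximation. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: count the messages in an mbox file by counting "From " separator
# lines. Body lines that begin with an unescaped "From " would be counted
# too, so the result can be slightly too high for some files.

count_messages() {
    grep -c '^From ' "$1"
}

# Example:
#   count_messages "$HOME/sample_spam.mbox"
```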

Finally, to start the test, type:

% mailcross testsuite run

The cross validation may take a long time, depending on the classifiers and the number of messages. You can check progress by keeping an eye on the log files in the mailcross.d/log/ directory.
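
One way to keep an eye on the logs is to follow whichever file was written most recently. This is a sketch under assumptions: the names of the log files are not listed here, so the hypothetical latest_log function simply picks the newest entry in the directory, and it does not handle file names containing newlines.

```shell
#!/bin/sh
# Sketch: print the path of the most recently modified file in the log
# directory (default: mailcross.d/log). Returns nonzero if none is found.

latest_log() {
    dir=${1:-mailcross.d/log}
    newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    echo "$dir/$newest"
}

# Example: follow the newest log while the test suite runs.
#   tail -f "$(latest_log)"
```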
