An Example: Running the Tests
Before you cross validate, make sure you have plenty of free disk space.
As a rough rule, the commands below may need up to 20 times the combined
size of your $HOME/sample_*.mbox files.
% mailcross prepare 10
% mailcross add spam $HOME/sample_spam.mbox
% mailcross add notspam $HOME/sample_notspam.mbox
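To apply the 20x rule of thumb above before going further, the following POSIX shell sketch sums the sizes of the mbox files and compares 20 times that total with the free space reported by df. The function names (mbox_total_bytes, estimate_bytes, free_kb, check_space) are hypothetical helpers, not part of mailcross, and the factor of 20 is only the rough estimate stated above.

```shell
# Rough disk-space check before cross validation (a sketch, not part of mailcross).
# Assumption: the space needed is up to 20 times the combined mbox size.

# mbox_total_bytes FILE...: print the combined size of FILEs in bytes.
mbox_total_bytes() {
    cat -- "$@" | wc -c | tr -d ' '
}

# estimate_bytes N: apply the rough 20x rule of thumb to a size of N bytes.
estimate_bytes() {
    echo $(( $1 * 20 ))
}

# free_kb DIR: print the available space, in kilobytes, on DIR's filesystem.
# Uses the portable "df -P" output, where column 4 is the available space.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# check_space DIR FILE...: warn and return 1 if DIR's filesystem looks too small.
check_space() {
    dir=$1
    shift
    need=$(estimate_bytes "$(mbox_total_bytes "$@")")
    have=$(( $(free_kb "$dir") * 1024 ))
    if [ "$have" -lt "$need" ]; then
        echo "warning: need about $need bytes, only $have bytes free" >&2
        return 1
    fi
    echo "ok: need about $need bytes, $have bytes free"
}
```

For example, check_space . "$HOME"/sample_*.mbox, run from the directory where you intend to run mailcross, prints a warning and returns a non-zero status if the free space looks insufficient. Counting bytes with wc -c reads every file once, which is slower than du but gives the exact file sizes the rule of thumb refers to.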
If you have several mbox files containing spam, you can run the add spam
command once for each of them. All this command does is merge the contents
of the mbox file into a specially created directory named mailcross.d. Once
this is done, you no longer need the original *.mbox files, at least for
cross validation purposes.
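When the spam is spread over many mbox files, a small loop saves typing the add spam command by hand. This is a sketch assuming a POSIX shell; add_all is a hypothetical helper name, and the only mailcross invocation it makes is the mailcross add CATEGORY FILE form used above.

```shell
# add_all CATEGORY FILE...: run "mailcross add CATEGORY FILE" once per file.
# add_all is a hypothetical helper; it only wraps the documented command.
add_all() {
    category=$1
    shift
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            # Also catches an unmatched glob, which the shell passes through literally.
            echo "skipping $f: not a regular file" >&2
            continue
        fi
        # Stop at the first failure rather than continuing with a partial merge.
        mailcross add "$category" "$f" || return 1
    done
}
```

For example, add_all spam "$HOME"/spam-archive/*.mbox would add every mbox file in a (hypothetical) spam-archive directory to the spam category.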
You are now ready to select the classifiers you wish to compare. Type
% mailcross testsuite list
The following classification wrappers are selectable:
annoyance-filter - Annoyance Filter 1.0b with prune
antispam - AntiSpam 1.1 with default options
bmf - bmf 0.9.4 with default options
bogofilterA - bogofilter 0.15.7 with Robinson-Fischer algorithm
bogofilterB - bogofilter 0.15.7 with Graham algorithm
bogofilterC - bogofilter 0.15.7 with Robinson algorithm
crm114A - crm114 20031129-RC11 with default settings
crm114B - crm114 20031129-RC11 with Jaakko Hyvatti's normalizemime
dbaclA - dbacl 1.6 with alpha tokens
dbaclB - dbacl 1.6 with cef,headers,alt,links
dbaclC - dbacl 1.6 with alpha tokens and risk matrix
ifile - ifile 1.3.3 with to,from,subject headers and default tokens
popfile - POPFile (unavailable?) with default options
qsf - qsf 0.9.4 with default options
spamassassin - SpamAssassin 2.60 (Bayes module) with default settings
spambayes - SpamBayes x with default settings
spamoracle - SpamOracle x with default settings
spamprobe - SpamProbe v0.9e with default options
The exact list you see depends on the classifiers installed on your system.
If a classifier is marked unavailable, you must first download and install
it in a directory on your PATH. Once this is done, select the classifiers
you want to test, for example:
% mailcross testsuite select dbaclB bogofilterA annoyance-filter
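Because a classifier is only usable once its program is installed on your PATH, a quick check with the standard command -v builtin can confirm that the programs behind your selection are actually there. Note that wrapper names such as dbaclB or bogofilterA are not program names; the program names used in the example below (dbacl, bogofilter) are assumed to match the executables on your system, and check_installed is a hypothetical helper.

```shell
# check_installed PROGRAM...: report whether each PROGRAM is found on PATH.
# Returns 1 if any program is missing, 0 otherwise.
check_installed() {
    missing=0
    for prog in "$@"; do
        # command -v prints the resolved location and fails if not found.
        if loc=$(command -v "$prog"); then
            echo "found:   $prog ($loc)"
        else
            echo "missing: $prog"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_installed dbacl bogofilter lists each program with its location, or reports it as missing.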
Note that some of these classifiers only work with exactly two categories,
spam and notspam. You can see the state of the test suite by typing:
% mailcross testsuite status
The following categories are to be cross validated:
notspam.mbox - counting... 2500 messages
spam.mbox - counting... 500 messages
Cross validation is performed on each of these classifiers:
annoyance-filter - Annoyance Filter 1.0b with prune
bogofilterA - bogofilter 0.15.7 with Robinson-Fischer algorithm
dbaclB - dbacl 1.6 with cef,headers,alt,links
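If you still have the original mbox files, you can cross-check the message counts reported above. In the mbox format each message begins with a line starting with "From ", so counting such lines gives an approximate total. This is a sketch: it can overcount if an unescaped "From " line appears inside a message body, and count_messages is a hypothetical helper.

```shell
# count_messages FILE: approximate number of messages in an mbox file.
# Counts lines beginning with "From ", the mbox message separator.
# Body lines escaped as ">From " are not counted; unescaped ones would be.
count_messages() {
    grep -c -e '^From ' -- "$1"
}
```

For example, count_messages "$HOME"/sample_notspam.mbox should print a number close to the notspam count shown by the status command. Note that grep -c still prints 0 for a file with no matches, but exits with status 1.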
Finally, to start the test, type
% mailcross testsuite run
The cross validation may take a long time, depending on the classifiers and
the number of messages. You can check progress by keeping an eye on the log
files in the directory mailcross.d/log/.
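The naming of the files in mailcross.d/log/ is not described here, so rather than guessing a file name, the sketch below picks whichever file was modified most recently, which you can then follow with tail -f. latest_log is a hypothetical helper, and because it parses ls output it assumes file names without newlines.

```shell
# latest_log DIR: print the path of the most recently modified entry in DIR.
# Parses "ls -t" output, so it assumes file names contain no newlines.
# Prints nothing and returns 1 if DIR is empty.
latest_log() {
    newest=$(ls -t -- "$1" | head -n 1)
    [ -n "$newest" ] || return 1
    printf '%s/%s\n' "$1" "$newest"
}
```

For example, from the directory containing mailcross.d, tail -f "$(latest_log mailcross.d/log)" follows the newest log file as it grows; interrupting tail only stops the viewer.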