mailcross.testsuite is closely tied to
mailcross(1), which must be invoked directly for some
of the steps.
Before you can cross validate any of your email
classifiers, you will need two sets of email messages in
mbox format. One set should contain only junk mail (assumed
henceforth to reside in the file
$HOME/mail/junk.mbox), the other only
ordinary mail (assumed to reside in the file
$HOME/mail/good.mbox). It is normally important to
keep the two types of email messages entirely separate, as
mixing message types can degrade classifier performance.
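As a quick sanity check before starting, it can help to see how many messages each mbox file holds. The following is only a sketch: count_messages is a hypothetical helper, not part of mailcross, and it assumes that body lines beginning with "From " are escaped in the usual mbox fashion (otherwise the count is an overestimate).

```shell
#!/bin/sh
# Hypothetical helper (not part of mailcross): count the messages in an
# mbox file by counting its "From " separator lines.
count_messages() {
    grep -c '^From ' "$1"
}

# Report the message count for the two files assumed in the text.
for f in "$HOME/mail/junk.mbox" "$HOME/mail/good.mbox"; do
    if [ -r "$f" ]; then
        echo "$f: $(count_messages "$f") messages"
    else
        echo "$f: not readable" >&2
    fi
done
```

Two sets of very different sizes are not an error, but knowing the counts in advance makes the later summary easier to interpret.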
The first step in the cross validation is to create the
necessary infrastructure. To do so, make sure you have
plenty of disk space (about 10 times the combined size of
both mbox files), and type the following:
% mailcross prepare 10
% mailcross add spam $HOME/mail/junk.mbox
% mailcross add notspam $HOME/mail/good.mbox
This will create a directory named mailcross.d
which contains copies of your mail messages ready for use.
The original mbox files are no longer needed or referenced.
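The disk space rule of thumb mentioned above (about 10 times the combined size of both mbox files) can be checked before running the prepare step. This is a minimal sketch using only du and df; the function names required_kb and available_kb are made up for illustration, and the factor of 10 is simply the estimate given in the text.

```shell
#!/bin/sh
# Hypothetical helper: combined size of the given files in KB, times 10.
required_kb() {
    total=0
    for f in "$@"; do
        kb=$(du -k "$f" | cut -f1)
        total=$((total + kb))
    done
    echo $((total * 10))
}

# Hypothetical helper: free space in KB on the filesystem holding
# the given directory (POSIX df output, "Available" column).
available_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Only run the check when both mbox files assumed in the text exist.
if [ -r "$HOME/mail/junk.mbox" ] && [ -r "$HOME/mail/good.mbox" ]; then
    need=$(required_kb "$HOME/mail/junk.mbox" "$HOME/mail/good.mbox")
    have=$(available_kb .)
    if [ "$have" -lt "$need" ]; then
        echo "warning: about $need KB suggested, only $have KB free" >&2
    else
        echo "ok: about $need KB suggested, $have KB free"
    fi
fi
```

The check looks at the current directory because that is where mailcross.d is assumed to be created.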
Next, you must choose which email classifiers to test. Every
such classifier is called through a wrapper script by
mailcross.testsuite, and you can view a list of
available wrappers by typing:
% mailcross.testsuite list
Note that the wrapper scripts are NOT the actual email
classifiers, which must be installed separately by your
system administrator or otherwise. Once this is done, you
can select one or more wrappers for the cross validation by
typing, for example:
% mailcross.testsuite select dbacl ifile
If some of the selected classifiers cannot be found on
the system, they are not selected. Note also that some
wrappers can have hard-coded category names, e.g. if the
classifier only supports binary classification. Heed the
warning messages.
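Since wrappers for missing classifiers are not selected, it can be useful to check beforehand whether the classifier programs are installed. The sketch below assumes that each classifier's executable has the same name as its wrapper (dbacl, ifile) and is on the PATH; that is an assumption, not something mailcross.testsuite guarantees, and have_classifier is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a program of the given name is on PATH.
have_classifier() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: executable names match the wrapper names used in the text.
for c in dbacl ifile; do
    if have_classifier "$c"; then
        echo "$c: found"
    else
        echo "$c: not found, its wrapper will likely not be selected"
    fi
done
```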
All that remains is to run the cross validation and
summarize the results. Beware, this can take a long time
(several hours, depending on the classifier):
% mailcross.testsuite run
% mailcross.testsuite summarize
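Because the run can take hours, the steps described so far can be collected into a single script. The following is a sketch, not a tool shipped with mailcross: run_step, cross_validate and the DRY_RUN variable are invented for illustration, and only the commands already shown in this section are used. By default the script merely prints the commands; set DRY_RUN=0 to actually execute them, in which case it stops at the first failing step.

```shell
#!/bin/sh
# Hypothetical wrapper: print each step, and run it only when DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}

run_step() {
    echo "% $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || { echo "step failed: $*" >&2; exit 1; }
    fi
}

# The sequence of commands from this section, in order.
cross_validate() {
    run_step mailcross prepare 10
    run_step mailcross add spam "$HOME/mail/junk.mbox"
    run_step mailcross add notspam "$HOME/mail/good.mbox"
    run_step mailcross.testsuite select dbacl ifile
    run_step mailcross.testsuite run
    run_step mailcross.testsuite summarize
}

cross_validate
```

The dry-run default is a deliberate choice, so that the script can be reviewed before committing to a long run.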
Once you are all done cross validating, you can delete
the working files, log files, etc. by typing:
% mailcross clean
The progress of the cross validation is written silently
to various log files located in the
mailcross.d/log directory. Check these in case of
problems.
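To look at the logs while a run is in progress or after a failure, something like the following may help. It is a sketch only: the names of the individual log files are not documented here, so the script simply picks whichever file in the log directory was modified most recently. newest_log and LOGDIR are hypothetical names.

```shell
#!/bin/sh
# Log directory named in the text; override LOGDIR to look elsewhere.
LOGDIR=${LOGDIR:-mailcross.d/log}

# Hypothetical helper: name of the most recently modified file in a directory.
newest_log() {
    ls -t "$1" 2>/dev/null | head -n 1
}

if [ -d "$LOGDIR" ]; then
    latest=$(newest_log "$LOGDIR")
    if [ -n "$latest" ]; then
        echo "== $LOGDIR/$latest =="
        tail -n 20 "$LOGDIR/$latest"
    else
        echo "no log files in $LOGDIR yet"
    fi
else
    echo "no log directory at $LOGDIR" >&2
fi
```

Remember that the clean step deletes the log files, so inspect them before cleaning up.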