Comparing Email Filters
Provided all runs use the same test corpora, you can compare any number of
email classifiers with the mailcross command. This helps in choosing the
best combination of switches for learning, although it can't be stressed enough
that the results will depend strongly on your particular set of corpora.
It quickly gets tedious to run the cross validator multiple times, however.
With this in mind, mailcross(1) has a "testsuite" subcommand,
which runs the mailcross commands successively on any number of filters.
These filters can be different versions of dbacl run with different switches,
or other open source email classifiers altogether.
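
As a rough sketch of such a comparison: the script below assumes the
testsuite subcommands list, select, run and summarize described in
mailcross(1), so check them against your installed version. The mbox paths,
the helper functions mc and compare_filters, and the RUN variable are
hypothetical names introduced for this illustration only. The filter names
are whatever you pass on the command line, ideally names printed by
"mailcross testsuite list". By default the script only prints the commands
it would run.

```shell
#!/bin/sh
# Sketch: run the mailcross testsuite on several filters over the same corpora.
# The commands are only printed by default; set RUN=1 to execute them.

SPAM_MBOX="${SPAM_MBOX:-$HOME/mail/spam}"           # hypothetical mbox path
NOTSPAM_MBOX="${NOTSPAM_MBOX:-$HOME/mail/notspam}"  # hypothetical mbox path

# mc: run mailcross with the given arguments, or just print the command line.
mc() {
    if [ "${RUN:-0}" = 1 ]; then
        mailcross "$@"
    else
        echo "mailcross $*"
    fi
}

# compare_filters FILTER...: cross validate the named filters in one go.
compare_filters() {
    mc prepare 10 &&                  # split the corpora into 10 subsets
    mc add spam "$SPAM_MBOX" &&       # same corpora for every filter
    mc add notspam "$NOTSPAM_MBOX" &&
    mc testsuite list &&              # show the available wrappers
    mc testsuite select "$@" &&       # choose the filters to compare
    mc testsuite run &&               # run the cross validation for each
    mc testsuite summarize            # print the results side by side
}

# Only act when filter names were given on the command line.
if [ "$#" -gt 0 ]; then
    compare_filters "$@"
fi
```

Saved under a name of your choice and called with two filter names, the
script prints the seven mailcross commands in order. Calling it again with
RUN=1 in the environment executes them, which can take a long time on large
corpora.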
For every email classifier you wish to cross validate, you need a wrapper script
which performs the work of the scripts mylearner.sh and myfilter.sh
in the previous section. Since every open source email classifier has its own interface,
the wrapper must translate the mailcross instructions into something that the
classifier understands. At the time of writing, wrapper scripts exist for
dbacl, bogofilter, ifile and spambayes.
The interface requirements are described in the mailcross(1) manual page.
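
To give a feel for what such a wrapper does, here is a minimal sketch. It
assumes that the wrapper receives a command name as its first argument, with
"learn" reading an mbox on standard input and taking a category file name,
and "filter" reading one message on standard input and taking a list of
category files. These conventions, and the choice to print the bare category
name, are assumptions to be verified against mailcross(1), which describes
further commands not handled here. The word-overlap "classifier" is a toy
stand-in invented for this example, not a real filter; in a real wrapper the
learn and filter branches would call your classifier instead.

```shell
#!/bin/sh
# Hypothetical wrapper skeleton; command names are assumed from mailcross(1).
LC_ALL=C; export LC_ALL   # keep sort and comm collation consistent

# tokens: reduce standard input to a sorted list of unique lowercase words.
tokens() {
    tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' | grep . | sort -u
}

wrapper() {
    cmd="${1:-}"
    [ "$#" -gt 0 ] && shift
    case "$cmd" in
        describe)
            echo "toy word-overlap filter (illustration only)" ;;
        learn)
            # mbox on stdin, category file in $1: store its vocabulary
            tokens > "$1" ;;
        filter)
            # one message on stdin, category files in "$@"
            msg=$(mktemp) || return 1
            tokens > "$msg"
            best=""; best_score=-1
            for catfile in "$@"; do
                # score = number of words shared with the category
                score=$(comm -12 "$msg" "$catfile" | wc -l | tr -d ' ')
                if [ "$score" -gt "$best_score" ]; then
                    best="$catfile"; best_score="$score"
                fi
            done
            rm -f "$msg"
            # assumption: the bare category name is what should be printed
            basename "$best" ;;
        clean)
            : ;;   # the toy keeps no state besides the category files
        *)
            echo "unsupported command: $cmd" >&2
            return 1 ;;
    esac
}

# Dispatch only when invoked with a command, as a test harness would do.
if [ "$#" -gt 0 ]; then
    wrapper "$@"
fi
```

The single dispatch function keeps the translation between the harness and
the classifier in one place, so an interface change in the classifier only
touches the affected branch.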
Note that the supplied wrappers can sometimes be out of date for the most
popular Bayesian filters, because these projects can change their interfaces
frequently. Also, the wrappers may not use the most flattering combination
of switches and options, since only a filter's author knows the best way
to use that filter.
Besides cross validation, you can also test Train On Error and Full Online
Ordered Training schemes, via the mailtoe(1) and mailfoot(1) commands. Using
them is very similar to using mailcross(1).