SourceForge.net Logo
Summary
Forums
CVS
Download

Laird Breyer
Download
contents introduction tutorial spam fun man related
previous next

Comparing Email Filters

Provided all runs use the same test corpora, you can compare any number of email classifiers with the mailcross command. This helps in choosing the best combination of switches for learning, although it can't be stressed enough that the results will depend strongly on your particular set of corpora.

It quickly gets tedious to run the cross validator multiple times, however. With this in mind, mailcross(1) has a "testsuite" subcommand, which runs the mailcross commands successively on any number of filters. These filters can be various versions of dbacl with different switches, or indeed can be other open source email classifiers.

For every email classifier you wish to cross validate, you need a wrapper script which performs the work of the scripts mylearner.sh and myfilter.sh in the previous section. Since every open source email classifier has its own interface, the wrapper must translate the mailcross instructions into something that the classifier understands. At the time of writing, wrapper scripts exist for dbacl, bogofilter, ifile and spambayes. The interface requirements are described in the mailcross(1) manual page.

Note that the supplied wrappers can be sometimes out of date for the most popular Bayesian filters, because these projects can change their interfaces frequently. Also, the wrappers may not use the most flattering combinations of switches and options, as only each filter author knows the best way to use his own filter.

Besides cross validation, you can also test Train On Error and Full Online Ordered Training schemes, via the mailtoe(1) and mailfoot(1) commands. Using them is very similar to using mailcross(1).

previous next
contents introduction tutorial spam fun man related