Laird Breyer

An Example: Viewing The Results

Once the cross validation test has completed, you can see the results as follows:

% mailcross testsuite summarize

Each selected classifier is scored in two complementary ways.

The first question is "Where do misclassifications go?". For each true category, it shows what percentage of messages was assigned to each predicted category, which gives a roughly objective measure of how good the predictions are.

If spam is taken as the positive class, the percentage of notspam messages predicted as spam is sometimes called the false positive rate, and the percentage of spam messages predicted as notspam is sometimes called the false negative rate. This terminology is not standardized and can be confusing, since it depends on the purpose of the test, so it won't be used here.

The second question is "What is really in each category after prediction?", which is the dual of the first: for each predicted category, it shows the true composition of the messages placed there.

Normally, the purpose of mail classification is to separate your messages so that you save time. Here you can see how "clean" your mailboxes would be after classification.
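To make the relationship between the two questions concrete, here is a minimal Python sketch that derives both tables from one set of raw counts. The counts are purely hypothetical (they are not the emails used for the samples on this page), and the function names are made up for illustration; this is not how mailcross itself computes its summary.

```python
# Hypothetical raw counts, indexed as counts[true_label][predicted_label].
# These numbers are an assumption for illustration only.
counts = {
    "notspam": {"notspam": 2500, "spam": 0},
    "spam":    {"notspam": 47,   "spam": 453},
}
labels = ["notspam", "spam"]


def where_do_misclassifications_go(counts):
    """Row view: for each true label, percentage sent to each prediction."""
    table = {}
    for true in labels:
        total = sum(counts[true][pred] for pred in labels)
        table[true] = {
            pred: (100.0 * counts[true][pred] / total) if total else 0.0
            for pred in labels
        }
    return table


def what_is_in_each_category(counts):
    """Column view: for each predicted category, percentage of each true label."""
    table = {}
    for pred in labels:
        total = sum(counts[true][pred] for true in labels)
        table[pred] = {
            true: (100.0 * counts[true][pred] / total) if total else 0.0
            for true in labels
        }
    return table


def show(title, table):
    """Print a table in a layout loosely modelled on the summary output."""
    print(title)
    for row in labels:
        cells = "".join(f"{table[row][col]:>11.2f}%" for col in labels)
        print(f"{row:<10} |{cells}")


rows = where_do_misclassifications_go(counts)
cols = what_is_in_each_category(counts)
show("Where do misclassifications go?", rows)
show("What is really in each category after prediction?", cols)
```

The first table normalizes each row of the counts (by true label), the second normalizes each column (by predicted category). The second table therefore depends on how many messages of each kind you have, which is one more reason to run the test on your own emails.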

Here is a sample summary output by mailcross(1) testsuite. Remember that results such as these make no sense unless you try them out on your own emails. You have no idea what emails were used to obtain these results, and I am not going to tell you.

---------------
Annoyance Filter 1.0b with prune
Fri Nov 14 11:26:58 EST 2003
---------------
Where do misclassifications go?

  true     | but predicted as...
    *      |    notspam      spam
notspam    |    100.00%     0.00%
spam       |      9.40%    90.60%

What is really in each category after prediction?

category   | contains mixture of...
    *      |    notspam      spam
notspam    |     98.15%     1.85%
spam       |      0.00%   100.00%

---------------
bogofilter 0.15.7 with Robinson algorithm
Fri Nov 14 11:30:25 EST 2003
---------------
Where do misclassifications go?

  true     | but predicted as...
    *      |    notspam      spam
notspam    |    100.00%     0.00%
spam       |      8.40%    91.60%

What is really in each category after prediction?

category   | contains mixture of...
    *      |    notspam      spam
notspam    |     98.35%     1.65%
spam       |      0.00%   100.00%

---------------
dbacl 1.5 with cef,headers,alt,links
Fri Nov 14 11:33:33 EST 2003
---------------
Where do misclassifications go?

  true     | but predicted as...
    *      |    notspam      spam
notspam    |    100.00%     0.00%
spam       |      5.80%    94.20%

What is really in each category after prediction?

category   | contains mixture of...
    *      |    notspam      spam
notspam    |     98.85%     1.15%
spam       |      0.00%   100.00%
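
If you want to compare several classifiers side by side, the summary text can be post-processed. The following Python sketch parses the "Where do misclassifications go?" tables from text laid out like the sample above and ranks the classifiers by the percentage of spam they caught. The parsing rules are an assumption inferred from this one sample, not a documented output format, and `parse_first_tables` is a hypothetical helper name.

```python
import re

# A shortened excerpt in the layout of the sample summary. The parsing rules
# below are assumptions based on that sample, not a documented format.
SAMPLE = """\
---------------
bogofilter 0.15.7 with Robinson algorithm
Fri Nov 14 11:30:25 EST 2003
---------------
Where do misclassifications go?

  true     | but predicted as...
    *      |    notspam      spam
notspam    |    100.00%     0.00%
spam       |      8.40%    91.60%

---------------
dbacl 1.5 with cef,headers,alt,links
Fri Nov 14 11:33:33 EST 2003
---------------
Where do misclassifications go?

  true     | but predicted as...
    *      |    notspam      spam
notspam    |    100.00%     0.00%
spam       |      5.80%    94.20%
"""

ROW = re.compile(r"^(notspam|spam)\s*\|\s*([\d.]+)%\s+([\d.]+)%\s*$")
RULE = re.compile(r"^-{5,}\s*$")


def parse_first_tables(text):
    """Return {classifier: {true label: (pct as notspam, pct as spam)}}."""
    results = {}
    lines = text.splitlines()
    name = None
    in_first_table = False
    for i, line in enumerate(lines):
        # Header block: rule, classifier name, date, rule.
        if RULE.match(line) and i + 3 < len(lines) and RULE.match(lines[i + 3]):
            name = lines[i + 1].strip()
            results[name] = {}
            in_first_table = False
        elif line.startswith("Where do misclassifications go?"):
            in_first_table = True
        elif line.startswith("What is really in each category"):
            in_first_table = False
        else:
            m = ROW.match(line)
            if m and name is not None and in_first_table:
                results[name][m.group(1)] = (float(m.group(2)), float(m.group(3)))
    return results


tables = parse_first_tables(SAMPLE)
# Rank by percentage of true spam predicted as spam, best first.
ranking = sorted(tables.items(), key=lambda kv: kv[1]["spam"][1], reverse=True)
for name, rows in ranking:
    print(f"{rows['spam'][1]:6.2f}% of spam caught, "
          f"{rows['notspam'][1]:5.2f}% of notspam lost: {name}")
```

Ranking by a single number is only one possible choice. Since losing a notspam message is usually worse than letting a spam message through, you may prefer to compare the notspam row first.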