An Example: Viewing The Results
Once the cross-validation test has completed, you can see the
results as follows:
% mailcross testsuite summarize
Each selected classifier is scored in two complementary ways.
The first question asked is "Where do misclassifications go?",
which shows roughly how good the predictions are from an objective standpoint.
The percentage of notspam messages predicted as spam is sometimes
called the false negative rate, and the percentage of spam messages predicted as notspam is sometimes called the false positive rate. This terminology is, however, not standardized and is confusing (which error counts as "positive" depends on the purpose of the test), so it won't be used here.
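The second question asked is "What is really in each category after prediction?", which is the dual of the previous question: instead of following each true category to see where its messages end up, it looks inside each predicted category to see what it actually contains.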
Normally, the purpose of mail classification is to separate your messages so that you save time. Here you can see how "clean" your mailboxes would be after classification.
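The relationship between the two questions can be sketched in a few lines of Python. This is only an illustration, not the code mailcross uses: the function name summarize and the message counts are invented for the example. Both tables come from the same grid of counts; the first divides each count by the total for its true category, the second by the total for its predicted category.

```python
def summarize(counts):
    """Build both summary tables from counts[true][predicted].

    Assumes every true category and every predicted category is non-empty,
    otherwise the divisions below would fail.
    """
    cats = list(counts)
    # Table 1: each true category's messages, split by predicted category.
    where = {
        t: {p: 100.0 * counts[t][p] / sum(counts[t].values()) for p in cats}
        for t in cats
    }
    # Table 2: each predicted category's contents, split by true category.
    mixture = {}
    for p in cats:
        total = sum(counts[t][p] for t in cats)
        mixture[p] = {t: 100.0 * counts[t][p] / total for t in cats}
    return where, mixture


# Hypothetical counts, invented for illustration: counts[true][predicted].
counts = {
    "notspam": {"notspam": 2500, "spam": 0},
    "spam": {"notspam": 47, "spam": 453},
}

where, mixture = summarize(counts)
for title, table in (
    ("Where do misclassifications go?", where),
    ("What is really in each category after prediction?", mixture),
):
    print(title)
    for row, cells in table.items():
        print(f"{row:>8} | " + " ".join(f"{v:7.2f}%" for v in cells.values()))
```

With these invented counts, 9.40% of the spam is predicted as notspam, yet the notspam mailbox contains only about 1.85% spam, because the 47 stray messages are diluted by 2500 genuine ones. The second table therefore depends on the proportion of spam to notspam in the test messages, while the first does not.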
Here is a sample summary output by mailcross(1) testsuite. Remember that results such as these make no sense unless you try them
out on your own emails. You have no idea what emails were used to obtain
these results, and I am not going to tell you.
---------------
Annoyance Filter 1.0b with prune
Fri Nov 14 11:26:58 EST 2003
---------------
Where do misclassifications go?

    true | but predicted as...
       * |  notspam     spam
 notspam |  100.00%    0.00%
    spam |    9.40%   90.60%

What is really in each category after prediction?

category | contains mixture of...
       * |  notspam     spam
 notspam |   98.15%    1.85%
    spam |    0.00%  100.00%
---------------
bogofilter 0.15.7 with Robinson algorithm
Fri Nov 14 11:30:25 EST 2003
---------------
Where do misclassifications go?

    true | but predicted as...
       * |  notspam     spam
 notspam |  100.00%    0.00%
    spam |    8.40%   91.60%

What is really in each category after prediction?

category | contains mixture of...
       * |  notspam     spam
 notspam |   98.35%    1.65%
    spam |    0.00%  100.00%
---------------
dbacl 1.5 with cef,headers,alt,links
Fri Nov 14 11:33:33 EST 2003
---------------
Where do misclassifications go?

    true | but predicted as...
       * |  notspam     spam
 notspam |  100.00%    0.00%
    spam |    5.80%   94.20%

What is really in each category after prediction?

category | contains mixture of...
       * |  notspam     spam
 notspam |   98.85%    1.15%
    spam |    0.00%  100.00%