Decision Theory
If you've read this far, then you probably intend to use dbacl to
automatically classify text documents, and possibly execute
certain actions depending on the outcome. The bad news is that dbacl isn't designed for this. The good news is that there is a companion program, bayesol,
which is. To use it, you just need to learn some Bayesian Decision Theory.
We'll suppose that the document sample4.txt must be classified in one of the
categories one, two and three.
To make optimal decisions, you'll need three ingredients: a prior distribution,
a set of conditional probabilities and a measure of risk. We'll get to these
in turn.
The prior distribution is a set of weights, which you must choose yourself, representing your prior beliefs. You choose these weights before you even look at
sample4.txt. For example, you might know from experience that category one is twice as
likely as two or three, in which case the weights one:2, two:1, three:1 reflect
that belief. If you have no idea what to choose, give each category an equal
weight (one:1, two:1, three:1).
Next, we need conditional probabilities. This is what dbacl is for. Type
% dbacl -l three sample3.txt
% dbacl -c one -c two -c three sample4.txt -N
one 0.00% two 100.00% three 0.00%
As you can see, dbacl is 100% sure that sample4.txt resembles category two.
Such accurate answers are typical with the kinds of models used by dbacl.
In reality, the probabilities for one and three are very, very small, and
the probability for two is very close to, but not equal to, 1.
See Appendix B for a rough explanation.
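If you want to feed these conditional probabilities into a script of your own, a minimal Python sketch along the following lines could parse them. This assumes the space-separated "category percentage%" output shown above, and the script name parse_scores.py is made up for the example:

import re
import sys

# Read one line of dbacl -N output from standard input, e.g.
#   one 0.00% two 100.00% three 0.00%
line = sys.stdin.readline()

# Turn each "category  12.34%" pair into a probability between 0 and 1.
conditionals = {
    name: float(pct) / 100.0
    for name, pct in re.findall(r"(\S+)\s+([\d.]+)%", line)
}
print(conditionals)   # e.g. {'one': 0.0, 'two': 1.0, 'three': 0.0}

You could invoke it like this:
% dbacl -c one -c two -c three sample4.txt -N | python parse_scores.py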
We combine the prior (which represents your own beliefs and experiences) with
the conditionals (which represent what dbacl thinks about sample4.txt) to obtain
a set of posterior probabilities. Each posterior is proportional to the prior
weight multiplied by the conditional probability, normalized so that the values
sum to 100%. In our example,
- Posterior probability that sample4.txt resembles one: (2*0%)/(2*0% + 1*100% + 1*0%) = 0%
- Posterior probability that sample4.txt resembles two: (1*100%)/(2*0% + 1*100% + 1*0%) = 100%
- Posterior probability that sample4.txt resembles three: (1*0%)/(2*0% + 1*100% + 1*0%) = 0%
Okay, so here the prior doesn't have much of an effect. But it's
there if you need it.
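If you prefer to see the arithmetic spelled out, here is a minimal Python sketch of the posterior calculation, using just the prior weights and conditional probabilities from this example:

# Prior weights chosen to reflect your beliefs (from the example above).
prior = {"one": 2, "two": 1, "three": 1}

# Conditional probabilities reported by dbacl -N, written as fractions.
conditional = {"one": 0.0, "two": 1.0, "three": 0.0}

# Each posterior is proportional to prior weight times conditional
# probability, normalized so that the three values sum to 1.
unnormalized = {c: prior[c] * conditional[c] for c in prior}
total = sum(unnormalized.values())
posterior = {c: unnormalized[c] / total for c in prior}

print(posterior)   # {'one': 0.0, 'two': 1.0, 'three': 0.0}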
Now comes the tedious part.
What you really want to do
is take these posterior distributions under advisement, and make
an informed decision.
To decide which category best suits your own plans, you need to work
out the costs of misclassifications. Only you can decide these numbers, and there
is one for every combination of true category and assigned category. Taken
together, they describe your risk. Here's an example:
- If sample4.txt is like one and it ends up marked as one, then the cost is 0
- If sample4.txt is like one but it ends up marked as two, then the cost is 1
- If sample4.txt is like one but it ends up marked as three, then the cost is 2
- If sample4.txt is like two but it ends up marked as one, then the cost is 3
- If sample4.txt is like two and it ends up marked as two, then the cost is 0
- If sample4.txt is like two but it ends up marked as three, then the cost is 5
- If sample4.txt is like three but it ends up marked as one, then the cost is 1
- If sample4.txt is like three but it ends up marked as two, then the cost is 1
- If sample4.txt is like three and it ends up marked as three, then the cost is 0
These numbers are often placed in a table called the loss matrix (this
way, you can't forget a case), like so:
                  |        misclassified as
 correct category |   one  |   two  |  three
 -----------------+--------+--------+--------
 one              |    0   |    1   |    2
 two              |    3   |    0   |    5
 three            |    1   |    1   |    0
We are now ready to combine all these numbers to obtain the True Bayesian Decision.
For every possible category, we simply weigh the risk with the posterior
probabilities of obtaining each of the possible misclassifications. Then we choose the category with least expected posterior risk.
- For category one, the expected risk is 0*0% + 3*100% + 1*0% = 3
- For category two, the expected risk is 1*0% + 0*100% + 1*0% = 0 <-- smallest
- For category three, the expected risk is 2*0% + 5*100% + 0*0% = 5
The lowest expected risk is for category two, so that's the category we choose
to represent sample4.txt. Done!
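The same weighting can be written as a short Python sketch, reusing the posterior probabilities and the loss matrix from this example (loss[true][chosen] is the cost of marking a document of category true as category chosen):

# Loss matrix from the table above.
loss = {
    "one":   {"one": 0, "two": 1, "three": 2},
    "two":   {"one": 3, "two": 0, "three": 5},
    "three": {"one": 1, "two": 1, "three": 0},
}
posterior = {"one": 0.0, "two": 1.0, "three": 0.0}

# Expected risk of choosing each category: weigh every possible cost
# by the posterior probability of the corresponding true category.
risk = {
    chosen: sum(posterior[true] * loss[true][chosen] for true in loss)
    for chosen in loss
}
print(risk)                      # {'one': 3.0, 'two': 0.0, 'three': 5.0}
print(min(risk, key=risk.get))   # two -- the category with least expected risk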
Of course, the loss matrix above doesn't really affect the decision here, because
the conditional probabilities point so strongly to category two anyway. But now
you understand how the calculation works. Below, we'll look at a more realistic example (still specially chosen to illustrate some points).
One last point: you may wonder how dbacl itself decides which category to
display when classifying with the -v switch. The simple answer is that dbacl always
displays the category with maximal conditional probability (often called the MAP estimate). This is mathematically equivalent to the special case of decision theory where the prior has equal weights and the loss matrix takes the value 1 everywhere except on the diagonal (i.e. correct classifications cost nothing, and every mistake costs 1).
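To see why the two coincide, note that with a 0/1 loss matrix the expected risk of choosing a category is simply 1 minus its posterior probability, so minimizing the risk is the same as maximizing the posterior. A tiny sketch, with made-up posterior values:

posterior = {"one": 0.2, "two": 0.5, "three": 0.3}   # made-up values

# 0/1 loss: correct classifications cost nothing, every mistake costs 1,
# so the risk of choosing a category is the total posterior mass of the
# other categories, i.e. 1 minus its own posterior.
risk = {
    chosen: sum(p for true, p in posterior.items() if true != chosen)
    for chosen in posterior
}
print(risk)   # {'one': 0.8, 'two': 0.5, 'three': 0.7}
# The smallest risk (two) is also the largest posterior: the MAP estimate.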