Decision Theory
If you've read this far, then you probably intend to use dbacl to
automatically classify text documents, and possibly execute
certain actions depending on the outcome. The bad news is that dbacl isn't designed for this. The good news is that there is a companion program, bayesol,
which is. To use it, you just need to learn some Bayesian Decision Theory.
We'll suppose that the document sample4.txt must be classified in one of the
categories one, two and three.
To make optimal decisions, you'll need three ingredients: a prior distribution,
a set of conditional probabilities and a measure of risk. We'll get to these
in turn.
The prior distribution is a set of weights, which you must choose yourself, representing your prior beliefs. You choose these weights before you even look at
sample4.txt. For example, you might know from experience that category one is twice as
likely as two or three, in which case the weights one:2, two:1, three:1 reflect
that belief. If you have no idea what to choose, give each category an equal
weight (one:1, two:1, three:1).
Next, we need conditional probabilities. This is what dbacl is for. Type
% dbacl -l three sample3.txt
% dbacl -c one -c two -c three sample4.txt -N
one 0.00% two 100.00% three 0.00%
As you can see, dbacl is 100% sure that sample4.txt resembles category two.
Such accurate answers are typical with the kinds of models used by dbacl.
In reality, the probabilities for one and three are very, very small, and
the probability for two is very close to, but not equal to, 1.
See Appendix B for a rough explanation.
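If you want to feed these conditional probabilities into a script of your own, a minimal Python sketch along the following lines could parse them. This assumes the space-separated "category percentage%" output shown above, and the script name parse_scores.py is made up for the example:

import re
import sys

# Read one line of dbacl -N output from standard input, e.g.
#   one 0.00% two 100.00% three 0.00%
line = sys.stdin.readline()

# Turn each "category  12.34%" pair into a probability between 0 and 1.
conditionals = {
    name: float(pct) / 100.0
    for name, pct in re.findall(r"(\S+)\s+([\d.]+)%", line)
}
print(conditionals)   # e.g. {'one': 0.0, 'two': 1.0, 'three': 0.0}

You could invoke it like this:
% dbacl -c one -c two -c three sample4.txt -N | python parse_scores.py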
We combine the prior (which represents your own beliefs and experiences) with
the conditionals (which represent what dbacl thinks about sample4.txt) to obtain
a set of posterior probabilities. Each posterior is proportional to the prior
weight multiplied by the conditional probability, normalized so that the values
sum to 100%. In our example,
- Posterior probability that sample4.txt resembles one: (2*0%)/(2*0% + 1*100% + 1*0%) = 0%
- Posterior probability that sample4.txt resembles two: (1*100%)/(2*0% + 1*100% + 1*0%) = 100%
- Posterior probability that sample4.txt resembles three: (1*0%)/(2*0% + 1*100% + 1*0%) = 0%
Okay, so here the prior doesn't have much of an effect. But it's
there if you need it.
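If you prefer to see the arithmetic spelled out, here is a minimal Python sketch of the posterior calculation, using just the prior weights and conditional probabilities from this example:

# Prior weights chosen to reflect your beliefs (from the example above).
prior = {"one": 2, "two": 1, "three": 1}

# Conditional probabilities reported by dbacl -N, written as fractions.
conditional = {"one": 0.0, "two": 1.0, "three": 0.0}

# Each posterior is proportional to prior weight times conditional
# probability, normalized so that the three values sum to 1.
unnormalized = {c: prior[c] * conditional[c] for c in prior}
total = sum(unnormalized.values())
posterior = {c: unnormalized[c] / total for c in prior}

print(posterior)   # {'one': 0.0, 'two': 1.0, 'three': 0.0}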
Now comes the tedious part.
What you really want to do
is take these posterior distributions under advisement, and make
an informed decision.
To decide which category best suits your own plans, you need to work
out the costs of misclassifications. Only you can decide these numbers, and there
is one for every combination of true category and assigned category. Taken
together, they describe your risk. Here's an example:
- If sample4.txt is like one and it ends up marked as one, then the cost is 0
- If sample4.txt is like one but it ends up marked as two, then the cost is 1
- If sample4.txt is like one but it ends up marked as three, then the cost is 2
- If sample4.txt is like two but it ends up marked as one, then the cost is 3
- If sample4.txt is like two and it ends up marked as two, then the cost is 0
- If sample4.txt is like two but it ends up marked as three, then the cost is 5
- If sample4.txt is like three but it ends up marked as one, then the cost is 1
- If sample4.txt is like three but it ends up marked as two, then the cost is 1
- If sample4.txt is like three and it ends up marked as three, then the cost is 0
These numbers are often placed in a table called the loss matrix (this
way, you can't forget a case), like so:
                  |        misclassified as
 correct category |   one  |   two  |  three
 -----------------+--------+--------+--------
 one              |    0   |    1   |    2
 two              |    3   |    0   |    5
 three            |    1   |    1   |    0
We are now ready to combine all these numbers to obtain the True Bayesian Decision.
For every possible category, we simply weigh the risk with the posterior
probabilities of obtaining each of the possible misclassifications. Then we choose the category with least expected posterior risk.
- For category one, the expected risk is 0*0% + 3*100% + 1*0% = 3
- For category two, the expected risk is 1*0% + 0*100% + 1*0% = 0 <-- smallest
- For category three, the expected risk is 2*0% + 5*100% + 0*0% = 5
The lowest expected risk is for category two, so that's the category we choose
to represent sample4.txt. Done!
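The same weighting can be written as a short Python sketch, reusing the posterior probabilities and the loss matrix from this example (loss[true][chosen] is the cost of marking a document of category true as category chosen):

# Loss matrix from the table above.
loss = {
    "one":   {"one": 0, "two": 1, "three": 2},
    "two":   {"one": 3, "two": 0, "three": 5},
    "three": {"one": 1, "two": 1, "three": 0},
}
posterior = {"one": 0.0, "two": 1.0, "three": 0.0}

# Expected risk of choosing each category: weigh every possible cost
# by the posterior probability of the corresponding true category.
risk = {
    chosen: sum(posterior[true] * loss[true][chosen] for true in loss)
    for chosen in loss
}
print(risk)                      # {'one': 3.0, 'two': 0.0, 'three': 5.0}
print(min(risk, key=risk.get))   # two -- the category with least expected risk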
Of course, the loss matrix above doesn't really affect the decision here, because
the conditional probabilities point so strongly to category two anyway. But now
you understand how the calculation works. Below, we'll look at a more realistic example (still specially chosen to illustrate some points).
One last point: you may wonder how dbacl itself decides which category to
display when classifying with the -v switch. The simple answer is that dbacl always
displays the category with maximal conditional probability (often called the MAP estimate). This is mathematically equivalent to the special case of decision theory where the prior has equal weights and the loss matrix takes the value 1 everywhere except on the diagonal (i.e. correct classifications cost nothing, and every mistake costs 1).
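To see why the two coincide, note that with a 0/1 loss matrix the expected risk of choosing a category is simply 1 minus its posterior probability, so minimizing the risk is the same as maximizing the posterior. A tiny sketch, with made-up posterior values:

posterior = {"one": 0.2, "two": 0.5, "three": 0.3}   # made-up values

# 0/1 loss: correct classifications cost nothing, every mistake costs 1,
# so the risk of choosing a category is the total posterior mass of the
# other categories, i.e. 1 minus its own posterior.
risk = {
    chosen: sum(p for true, p in posterior.items() if true != chosen)
    for chosen in posterior
}
print(risk)   # {'one': 0.8, 'two': 0.5, 'three': 0.7}
# The smallest risk (two) is also the largest posterior: the MAP estimate.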