Language models, classification and dbacl
Laird A. Breyer
Introduction
This is a non-mathematical tutorial on how to use the dbacl Bayesian text
classifier. The mathematical details are described in a separate document.
This tutorial was revised for dbacl 1.11. As dbacl evolves, some statements
below may become inaccurate, but a reasonable effort is made to keep the
tutorial up to date.
dbacl is a UNIX command line tool, so you will need to work at the shell
prompt (written here as %, although we assume bash semantics). The program
comes with five sample text documents and a few scripts. Look for them in
the same directory as this tutorial, or use any other plain text documents
instead. Make sure the sample documents you will use are in the current
working directory. You need all the *.txt, *.pl and *.risk files.
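If you want to confirm that your working directory is set up before going further, a small bash function like the one below can help. It is a sketch, not something shipped with dbacl: the name check_tutorial_files is hypothetical, and the function only checks that at least one file matches each of the *.txt, *.pl and *.risk patterns. It does not verify that all five sample documents are present.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dbacl): report whether the current
# working directory holds at least one file for each pattern the
# tutorial relies on. Returns 0 if every pattern matches, 1 otherwise.
check_tutorial_files() {
  local pattern missing=0 old_nullglob
  local -a files
  # Remember the caller's nullglob setting so it can be restored later.
  old_nullglob=$(shopt -p nullglob) || true
  # With nullglob, a pattern that matches nothing expands to nothing.
  shopt -s nullglob
  for pattern in '*.txt' '*.pl' '*.risk'; do
    files=( $pattern )   # unquoted on purpose: let bash expand the glob
    if [ "${#files[@]}" -eq 0 ]; then
      echo "missing: no $pattern files in $PWD"
      missing=1
    else
      echo "found ${#files[@]} file(s) matching $pattern"
    fi
  done
  # Put nullglob back the way the caller had it.
  eval "$old_nullglob"
  return "$missing"
}
```

Source the snippet, then run check_tutorial_files at the % prompt from the directory you plan to work in; a non-zero exit status means some of the needed files are missing.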
The tutorial below is geared towards generic text classification. If you
intend to use dbacl for email classification, please read the separate
guide on email classification.