dbacl - a digramic Bayesian classifier
Introduction
The dbacl project consists of a set of lightweight UNIX/POSIX
utilities which can be used, either directly or in shell scripts,
to classify text documents automatically according to Bayesian statistical
principles. dbacl(1) is also the name of the core utility.
Automatic text classification can be used for a variety of tasks, including:
junk email filtering, web page screening,
simple automated answering machines,
email prioritization and advanced sorting, topical news gathering,
and even playing chess.
The dbacl utilities cannot
do all these tasks directly, but by concentrating on simple, powerful, and
easily integrated tools, the project makes these applications possible.
Bayesian statistics is a formal way of combining prior beliefs with observed
facts into posterior beliefs, and ultimately, informed decisions.
The dbacl project uses this method to calculate the posterior probability that
a given text resembles each of any number of previously learned document
collections, and then makes decisions that are mathematically optimal, in the
sense of minimizing expected cost, for arbitrarily chosen misclassification costs.
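A small worked example helps show how posterior probabilities and misclassification costs combine into a decision. The Python sketch below implements the textbook Bayesian decision rule: it normalizes per-category log-likelihoods and priors into posteriors with Bayes' rule, then picks the decision with the lowest expected cost. It is not dbacl's implementation. The functions `posteriors` and `min_cost_decision`, the category names, the log-likelihood values, and the cost table are all hypothetical and chosen only for illustration.

```python
import math

def posteriors(log_likelihoods, priors):
    # Bayes' rule in log space: log P(c|x) = log P(x|c) + log P(c) - log P(x).
    log_joint = {c: ll + math.log(priors[c]) for c, ll in log_likelihoods.items()}
    # log P(x) via log-sum-exp, subtracting the maximum to avoid underflow.
    m = max(log_joint.values())
    log_evidence = m + math.log(sum(math.exp(v - m) for v in log_joint.values()))
    return {c: math.exp(v - log_evidence) for c, v in log_joint.items()}

def min_cost_decision(post, cost):
    # Expected cost of decision a: sum over true classes c of cost[a][c] * P(c|x).
    expected = {a: sum(cost[a][c] * p for c, p in post.items()) for a in cost}
    return min(expected, key=expected.get), expected

# Hypothetical numbers: the text is e times more likely under "spam" than "notspam".
post = posteriors({"spam": -100.0, "notspam": -101.0}, {"spam": 0.5, "notspam": 0.5})
# cost[decision][true class]: losing a legitimate message costs 10, letting spam through costs 1.
cost = {"spam": {"spam": 0.0, "notspam": 10.0},
        "notspam": {"spam": 1.0, "notspam": 0.0}}
decision, expected = min_cost_decision(post, cost)
print(post)      # spam ~ 0.731, notspam ~ 0.269
print(decision)  # notspam: expected costs are ~ 2.69 (spam) vs ~ 0.73 (notspam)
```

Although "spam" is the more probable category here, the high cost of discarding a legitimate message makes "notspam" the lower-cost decision. This is why the choice of misclassification costs matters as much as the probabilities themselves.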
The dbacl utilities are Free Software distributed under the terms of the
GNU General Public License (GPL).