Laird Breyer

dbacl - a digramic Bayesian classifier

Introduction

The dbacl project consists of a set of lightweight UNIX/POSIX utilities which can be used, either directly or in shell scripts, to classify text documents automatically, according to Bayesian statistical principles. dbacl(1) is also the name of the core utility.

Automatic text classification can be used for a variety of tasks, including junk email filtering, web page screening, simple automated answering machines, email prioritization and advanced sorting, topical news gathering, and even playing chess. The dbacl utilities cannot do all these tasks directly, but by concentrating on simple, powerful, and easily integrated tools, they make these applications possible.

Bayesian statistics is a formal way of combining prior beliefs with observed facts into posterior beliefs, and ultimately, informed decisions.
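
In symbols, for a document x and candidate categories C_1, ..., C_n, this combination is Bayes' rule. The notation below is chosen here for illustration and is not taken from the dbacl documentation: P(C_i) is the prior belief in category C_i, P(x | C_i) is how well that category explains the observed document, and P(C_i | x) is the resulting posterior belief.

```latex
P(C_i \mid x) = \frac{P(x \mid C_i)\, P(C_i)}{\sum_{j=1}^{n} P(x \mid C_j)\, P(C_j)}
```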

The dbacl project uses this method to calculate posterior probabilities that a given text resembles one of any number of previously learned document collections, and makes mathematically optimal decisions based on arbitrarily chosen misclassification costs.
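
To make the two steps concrete, here is a minimal Python sketch of the general technique: turn per-category scores into posterior probabilities, then pick the decision with the lowest expected cost. It is not dbacl's implementation. The function names, the category labels, the log-likelihood values, and the cost table are all hypothetical and were chosen only for illustration.

```python
import math


def posteriors(log_likelihoods, priors):
    """Combine priors with log-likelihoods via Bayes' rule.

    log_likelihoods: {category: log P(document | category)}
    priors:          {category: P(category)}, each strictly positive
    Returns {category: P(category | document)}.
    """
    log_joint = {c: math.log(priors[c]) + ll for c, ll in log_likelihoods.items()}
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_joint.values())
    unnormalized = {c: math.exp(v - m) for c, v in log_joint.items()}
    total = sum(unnormalized.values())
    return {c: u / total for c, u in unnormalized.items()}


def min_cost_decision(post, cost):
    """Pick the decision with the smallest expected cost.

    cost[decided][true] is the cost of deciding `decided`
    when the document really belongs to `true`.
    """
    expected = {
        decided: sum(cost[decided][true] * post[true] for true in post)
        for decided in cost
    }
    return min(expected, key=expected.get), expected


# Hypothetical numbers: the document fits "spam" slightly better than "ham".
post = posteriors({"spam": -10.0, "ham": -11.0}, {"spam": 0.5, "ham": 0.5})

# Hypothetical costs: wrongly discarding a good message is ten times worse
# than letting a junk message through.
cost = {
    "spam": {"spam": 0.0, "ham": 10.0},
    "ham": {"spam": 1.0, "ham": 0.0},
}
decision, expected = min_cost_decision(post, cost)
print(post, decision, expected)
```

With these made-up numbers, "spam" is the more probable category, yet the lowest-cost decision is "ham", because a false alarm was declared ten times more expensive than a miss. This is the sense in which the choice of misclassification costs, and not only the posterior probabilities, drives the final decision.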

The dbacl utilities are Free Software distributed under the terms of the GNU General Public License (GPL).
