Beyond Naive Bayes
The classification methodology used by the dbacl project is fully Bayesian, a
fact reflected both in its choice of learning models and in its decision
algorithms.
When learning a text corpus, dbacl builds a Bayesian linguistic model by
maximizing the entropy (uncertainty) of a distribution constrained by selected
features. dbacl can cope with features defined by arbitrary regular
expressions, which gives it unified support for unigram ("naive Bayes"),
n-gram, and various random field models. Automatic feature smoothing is
achieved by combining maximum entropy with the digramic reference measure.
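Schematically, and in notation chosen here rather than taken from the dbacl
documentation, such a model assigns to a document w the probability

    P(w) = \frac{1}{Z(\lambda)} \, m(w) \, \exp\!\Big( \sum_i \lambda_i f_i(w) \Big)

where the f_i are the (possibly regex-defined) feature indicators, the weights
\lambda_i are fitted by entropy maximization subject to the observed feature
frequencies, m is the digramic reference measure that smooths features never
seen during training, and Z(\lambda) normalizes the distribution.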
When classifying new text with the utility bayesol(1), the optimal Bayesian
decision is computed from the prior probabilities, the calculated conditional
distribution, and the supplied misclassification costs.
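The decision rule itself is standard Bayes-risk minimization. The following
Python sketch illustrates the idea in the abstract; the category names, costs,
and probabilities are purely illustrative and do not reflect bayesol's actual
input or output formats.

    # Minimal sketch of a Bayes-optimal decision: pick the category that
    # minimizes expected misclassification cost, given prior probabilities,
    # per-category likelihoods, and a cost matrix.

    def bayes_optimal_decision(priors, likelihoods, costs):
        """priors[c]      -> P(c)
           likelihoods[c] -> P(text | c) under category c's model
           costs[(a, c)]  -> cost of deciding a when the true category is c"""
        # posterior P(c | text) by Bayes' rule
        evidence = sum(priors[c] * likelihoods[c] for c in priors)
        posterior = {c: priors[c] * likelihoods[c] / evidence for c in priors}
        # expected cost (Bayes risk) of each possible decision
        risk = {a: sum(costs[(a, c)] * posterior[c] for c in priors)
                for a in priors}
        return min(risk, key=risk.get)

    # hypothetical example: a false "spam" decision costs ten times more
    priors = {"spam": 0.5, "ham": 0.5}
    likelihoods = {"spam": 1e-40, "ham": 3e-41}   # illustrative values only
    costs = {("spam", "spam"): 0, ("spam", "ham"): 10,
             ("ham", "spam"): 1, ("ham", "ham"): 0}
    print(bayes_optimal_decision(priors, likelihoods, costs))   # prints "ham"

Note that with symmetric costs the rule reduces to choosing the category with
the highest posterior probability; the cost matrix is what lets a user trade
one kind of error against another.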
When calculating conditional distributions, dbacl(1) can report either
probabilities or universal information measures such as the cross entropy,
which allow direct comparison with other algorithms and can also be used in
minimum description length calculations.
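For a text T scored against a category c, the cross entropy can be written
(again in notation chosen here, not dbacl's) as

    H(T, c) = -\frac{1}{|T|} \sum_{w \in T} \log_2 P_c(w)

in bits per feature, so that |T| \cdot H(T, c) is approximately the number of
bits needed to encode T under category c's model; this description length is
the quantity compared in minimum description length arguments.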