SourceForge.net Logo
Summary
Forums
CVS
Download

Laird Breyer
Download
contents introduction tutorial spam fun man related
previous next

Miscellaneous

Be careful when classifying very small strings. Except for the multinomial models (which includes the default model), the dbacl calculations are optimized for large strings with more than 20 or 30 features. For small text lines, the complex models give only approximate scores. In those cases, stick with unigram models, which are always exact.

In the UNIX philosophy, programs are small and do one thing well. Following this philosophy, dbacl essentially only reads plain text documents. If you have non-textual documents (word, html, postscript) which you want to learn from, you will need to use specialized tools to first convert these into plain text. There are many free tools available for this.

dbacl has limited support for reading mbox files (UNIX email) and can filter out html tags in a quick and dirty way, however this is only intended as a convenience, and should not be relied upon to be fully accurate.

previous next
contents introduction tutorial spam fun man related