Train spamassassin using amavis under Ubuntu

If you are using amavis, you are probably not using spamd with auto-learn. Assuming that’s the case, you need to know a few things to teach the spamassassin component of amavis when it misses spam or misclassifies ham (good mail) as spam.

I use Ubuntu(+postfix+amavis+opendkim+postgrey) as a front end for my Microsoft Exchange Server.  So, I collect the spam it misses (and the good mail it mistakes for spam) from Outlook and feed it to spamassassin periodically so that it can learn from its mistakes.

Here’s how:

sudo -u amavis -H sa-learn --showdots {--ham|--spam} your-message-or-message-folder

The last argument is either a single message or a folder of messages in text format. I extract these from Microsoft Outlook; be sure to include the full message headers, either with a VBA export script or by copying them from the message options in the Outlook GUI. If you are using a Linux mail reader, it’s much easier because you can store the spam in mbox files, which can be used directly. In that case, add the --mbox parameter:

sudo -u amavis -H sa-learn --showdots --mbox {--ham|--spam} your-mbox-file

Another way to get messages out of Exchange/Outlook in a form that spamassassin can consume is to set up an IMAP mail account on your Linux server (using Dovecot). Add the IMAP mail account to Outlook, and you can then drag and drop spam from your primary mail account to your Linux IMAP account. These messages can easily be exported into an mbox file.

In either case, select --ham if you are training spamassassin with good email or --spam if you are feeding it spam.
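If you train regularly, the command above can be wrapped in a small helper so the ham/spam choice is checked before anything runs. This is only a sketch: the function name `learn` and the `DRY_RUN` variable are my own inventions, not part of spamassassin or amavis, and it assumes the amavis user can read the files you point it at.

```shell
# Hypothetical helper around the sa-learn command shown above.
# Usage: learn {ham|spam} message-or-folder
# Set DRY_RUN=1 to print the command instead of running it.
learn() {
    kind=$1   # "ham" or "spam"
    src=$2    # a single message file or a folder of messages

    # Refuse anything other than the two training modes.
    case $kind in
        ham|spam) ;;
        *) echo "learn: first argument must be ham or spam" >&2; return 2 ;;
    esac

    # Catch typos in the path before invoking sudo.
    if [ ! -e "$src" ]; then
        echo "learn: $src not found" >&2
        return 1
    fi

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "sudo -u amavis -H sa-learn --showdots --$kind $src"
    else
        sudo -u amavis -H sa-learn --showdots "--$kind" "$src"
    fi
}
```

For example, `learn spam /tmp/missed-spam` followed by `learn ham /tmp/false-positives` would run one training pass over each folder (both paths are placeholders).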

You need to restart amavis when you have finished a training session so that it can apply its newly learned knowledge:

sudo service amavis restart
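To confirm that a session actually added messages to the Bayes database, you can look at the counters that `sa-learn --dump magic` prints. The sketch below pulls out the spam and ham totals; the function name `bayes_counts` is made up for this example, and it assumes the count is the third whitespace-separated column of the `nspam` and `nham` lines, which is how the output looks on the versions I have seen.

```shell
# Hypothetical filter: reads `sa-learn --dump magic` output on stdin
# and prints the learned spam and ham totals on one line.
# Assumes the count is in column 3 of the nspam/nham lines.
bayes_counts() {
    awk '
        /non-token data: nspam$/ { spam = $3 }
        /non-token data: nham$/  { ham = $3 }
        END { printf "nspam=%s nham=%s\n", spam, ham }
    '
}
```

It would be used as `sudo -u amavis -H sa-learn --dump magic | bayes_counts`. With the default settings, spamassassin does not apply Bayes scoring until it has learned at least 200 spam and 200 ham messages, so these two numbers are worth watching on a new installation.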

You can check how spamassassin will treat a message with this command:

cat message-file | sudo -u amavis -H spamassassin -t

If you do this before and after a training session you can easily see the difference in Bayesian scores and message classification.
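The full report is long, so for a before/after comparison it can help to pull out just the Bayes rule that fired. Here is a minimal sketch; `bayes_rules` is a name I made up, and it simply searches the report for spamassassin's `BAYES_nn` rule names (for example `BAYES_00` or `BAYES_99`).

```shell
# Hypothetical filter: reads a `spamassassin -t` report on stdin and
# prints each distinct BAYES_* rule name it mentions, one per line.
# Prints nothing if no Bayes rule fired.
bayes_rules() {
    grep -o 'BAYES_[0-9][0-9]*' | sort -u
}
```

Run `sudo -u amavis -H spamassassin -t < message-file | bayes_rules` before and after training; a missed spam that moves from a low-numbered rule toward `BAYES_99` shows the training took effect.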

Although using your own spam and ham is best for training, you can download old training sets from the spamassassin web site.