Combating spam

Created 7/20/2006, updated 6 weeks ago

Server-level blocking

Server-side spam-filtering with SpamAssassin; more specifically:
Blocking non-local senders for local-only recipients
Virus filtering (see ClamAV set-up)
DNS-based blacklisting

Client-side blocking

General spam-reduction techniques

Efficacy

Efficacy of server-side spam blocking

The following results show the number of messages blocked (rejected or discarded) at the wincent.dev mail server in a typical 24-hour period by the anti-spam technologies and techniques linked to from this page. They come from a very small data set collected during June and July of 2006, so they are not necessarily statistically valid, but they do give some indication of how much spam is stopped at the server and never reaches the client.

Automatically deleting high-scoring spam: 47 messages
Address migration: 142 messages
Blocking non-local senders for local-only recipients: 178 messages
DNS-based blacklisting: 261 messages
Virus filtering: 47 messages (see ClamAV log sample)

It is difficult to estimate how many spam messages these countermeasures block collectively, because they operate at different levels and were introduced incrementally. For example, DNS-based blacklisting was the last of the countermeasures listed above to be introduced, yet it blocks spam before it reaches any of the others. As a result, the other countermeasures now stop fewer messages per day, but that does not mean that their efficacy has diminished.
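With that caveat in mind, a naive tally can still give a rough sense of scale. The sketch below simply adds up the per-day figures listed above and works out each countermeasure's share. It assumes that the counts are comparable (as if taken from the same 24-hour period) and that each message is counted by only one countermeasure; neither assumption is confirmed by the data, and the variable names are illustrative only.

```python
# Per-day block counts copied from the list above.
blocked_per_day = {
    "DNS-based blacklisting": 261,
    "Blocking non-local senders for local-only recipients": 178,
    "Address migration": 142,
    "Automatically deleting high-scoring spam": 47,
    "Virus filtering": 47,
}

# ASSUMPTION: the counts are comparable and do not overlap, so that
# adding them is meaningful. Treat the total as a rough indication only.
naive_total = sum(blocked_per_day.values())

# Share of the naive total attributable to each countermeasure.
shares = {name: count / naive_total for name, count in blocked_per_day.items()}

# Print the countermeasures from largest to smallest count.
for name, count in sorted(blocked_per_day.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<55} {count:>4} {shares[name]:6.1%}")
print(f"{'Naive total':<55} {naive_total:>4}")
```

Under those assumptions the naive total is 675 messages per day, with DNS-based blacklisting accounting for a little under 40% of it.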

The order in which the techniques were deployed to the server is:

But the order in which the techniques are actually applied to incoming messages is:

DNS-based blacklisting (connections rejected with 554 reply code)
Blocking non-local senders for local-only recipients (delivery aborted with 5.7.1/551 reply code)
Address migration ("User unknown" returned to spammer)
Virus filtering (message immediately deleted before arriving at mailbox)
Automatically deleting high-scoring spam (message immediately deleted before arriving at mailbox)

So as messages pass through the various lines of defense, the number of spam messages getting through to subsequent levels is successively reduced. DNS-based blacklisting is the numerically most important countermeasure, stopping the largest number of messages, and this means that subsequent levels have to defend against fewer spam messages.
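The effect of this ordering on the per-countermeasure counts can be illustrated with a small sketch. It models the lines of defense as a chain in which the first matching stage stops a message and later stages never see it. The message flags, predicates and sample data are all invented for illustration; they are not the checks that the mail server actually performs.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Message = Dict[str, bool]                      # hypothetical message record
Stage = Tuple[str, Callable[[Message], bool]]  # (name, "would block?" predicate)

# Stages in the order in which they are applied to incoming mail.
# ASSUMPTION: the predicates are placeholders keyed on made-up flags.
STAGES: List[Stage] = [
    ("DNS-based blacklisting", lambda m: m.get("listed_ip", False)),
    ("Blocking non-local senders", lambda m: m.get("local_only_violation", False)),
    ("Address migration", lambda m: m.get("retired_address", False)),
    ("Virus filtering", lambda m: m.get("infected", False)),
    ("High-scoring spam deletion", lambda m: m.get("high_score", False)),
]


def count_blocked(messages: Iterable[Message], stages: List[Stage]) -> Counter:
    """Count how many messages each stage stops, crediting only the first match."""
    blocked: Counter = Counter()
    for message in messages:
        for name, would_block in stages:
            if would_block(message):
                blocked[name] += 1
                break  # the message never reaches later stages
    return blocked


# Invented sample: some messages would be caught by more than one stage.
sample: List[Message] = [
    {"listed_ip": True, "high_score": True},
    {"listed_ip": True, "retired_address": True},
    {"high_score": True},
    {"retired_address": True},
    {},  # legitimate mail, not blocked by any stage
]

with_dnsbl = count_blocked(sample, STAGES)
without_dnsbl = count_blocked(sample, STAGES[1:])  # same mail, no blacklisting

print("With blacklisting:   ", dict(with_dnsbl))
print("Without blacklisting:", dict(without_dnsbl))
```

In this toy sample the later stages stop fewer messages once blacklisting is placed in front of them, even though the stages themselves are unchanged. A falling daily count for a downstream countermeasure therefore says nothing, on its own, about whether that countermeasure has become less effective.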

Efficacy of client-side filtering

The following results are from a test period of 24 hours in mid-July 2006:

62 spam messages made it through to my email client (two accounts: one business and one personal).
An additional 102 messages were received by the client during the same period, bringing the total count to 164 messages (spam and ham) in all.
38% of all email received by the client was spam.
Of the spam messages, 25 were correctly tagged as spam by SpamAssassin.
The remaining 37 were not tagged by SpamAssassin (false negatives).
SpamAssassin did not produce any false positives.
SpamAssassin’s accuracy at the client level during the test period was a disappointing 77.4%, but this number does not reflect the high-scoring spam that was automatically deleted by SpamAssassin at the server level.
SpamSieve correctly identified all 62 spam messages as spam, with no false positives and no false negatives.
SpamSieve’s accuracy during the test period was a perfect 100%.
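The percentages above can be reproduced from the raw counts. The sketch below assumes that "accuracy" means the share of all 164 messages (spam and ham) that were classified correctly, which is the definition that yields the 77.4% figure. It also computes recall (the share of spam that was caught), a metric not used above, to show how much of the spam SpamAssassin missed. The `FilterResult` class and its field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterResult:
    """Confusion-matrix counts for one filter over the test period."""
    true_positives: int   # spam correctly tagged as spam
    false_negatives: int  # spam that was not tagged
    false_positives: int  # ham wrongly tagged as spam
    true_negatives: int   # ham correctly left untagged

    @property
    def total(self) -> int:
        return (self.true_positives + self.false_negatives
                + self.false_positives + self.true_negatives)

    @property
    def accuracy(self) -> float:
        """Share of all messages (spam and ham) classified correctly."""
        return (self.true_positives + self.true_negatives) / self.total

    @property
    def recall(self) -> float:
        """Share of spam messages that the filter caught."""
        return self.true_positives / (self.true_positives + self.false_negatives)


# Counts taken from the results listed above: 62 spam and 102 ham messages.
spamassassin = FilterResult(true_positives=25, false_negatives=37,
                            false_positives=0, true_negatives=102)
spamsieve = FilterResult(true_positives=62, false_negatives=0,
                         false_positives=0, true_negatives=102)

spam_share = 62 / spamassassin.total

print(f"Spam share of client mail: {spam_share:.1%}")             # 37.8%
print(f"SpamAssassin accuracy:     {spamassassin.accuracy:.1%}")  # 77.4%
print(f"SpamAssassin recall:       {spamassassin.recall:.1%}")    # 40.3%
print(f"SpamSieve accuracy:        {spamsieve.accuracy:.1%}")     # 100.0%
```

Because ham outnumbers spam in the sample and SpamAssassin produced no false positives, its accuracy looks considerably better than its recall: it tagged only 25 of the 62 spam messages that reached the client.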

These results show that the new server-level measures have had a significant impact on the number of spam messages reaching the client. Prior to implementing the measures, the average spam count (the average number of spam messages received at the client level per day) was steadily climbing and had reached over 170 spam messages per day. The oldest stats that I have available cover the one-year period between November 2004 and November 2005, during which I received an average of about 40 spam messages per day and 11% of all email I received was spam. Today (mid-July 2006) the cumulative average is 75 spam messages per day and 22% of all email I’ve received has been spam. It seems likely that this proportion will continue to increase, given that on the day of the test 38% of all mail received by the client was spam, approaching the average of 50% claimed by Symantec.

So these statistics show that the volume of spam has increased spectacularly, but that the new server-level measures have been effective in curbing its impact.

Areas for improvement

The SpamAssassin accuracy figures are disappointing despite using Bayesian training followed by weeks of corrective training. I will continue the corrective training in the hope that accuracy improves. If the accuracy improves sufficiently then I may be able to consider lowering the threshold for automatically deleting high-scoring spam.

Of the 62 spam messages received at the client, 19 were directed to postmaster addresses; unfortunately these cannot be blocked, because RFC 2821 requires mail servers to accept mail for the postmaster address.

A further 8 were sent to my personal account, 32 were sent to my business account, and the remainder had headers so munged that the original recipient address could not easily be discerned.

In the future I may consider undertaking another address migration, at least for my business account, as well as adding further DNS-based blacklists. Another option is grey-listing, but I am hesitant to use it given that in many cases I rely on the near-instantaneous nature of email.
