Spam Filters and False Positives
A few times a month, staff or members come to Lambda Chi Alpha’s IT department asking about our spam filters and why their message was blocked. Those questions aren’t easy to answer, so I have tried to break the answer into smaller pieces and offer an executive summary.
EXECUTIVE SUMMARY
Within a 20-hour window, our last two spam-filtering layers (IMF and SF) processed 2,231 messages. Of those, 676 passed through both filters and were delivered to staff (though some spam still slipped through), and only one person on staff had a valid message blocked.
- 69 percent of the messages passing through the last two filtering levels were flagged as spam
- 30 percent of the messages passing through the last two filtering levels were delivered to our inboxes (but still contained some spam)
- 1 message passing through the last two filtering levels was a false positive
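For anyone who wants to check the arithmetic, here is a minimal sketch in Python that recomputes the percentages from the two counts quoted above. It assumes that every message that was not delivered was flagged as spam; the variable names are mine and purely illustrative.

```python
# Counts quoted in the executive summary.
processed = 2231   # messages seen by the last two layers (IMF and SF)
delivered = 676    # messages that passed both layers and reached staff

# Assumption: anything that was not delivered was flagged as spam.
flagged = processed - delivered

pct_flagged = 100 * flagged / processed
pct_delivered = 100 * delivered / processed

print(f"flagged:   {flagged} messages ({pct_flagged:.1f}%)")
print(f"delivered: {delivered} messages ({pct_delivered:.1f}%)")
```

This works out to 1,555 flagged messages, which lines up with the rounded figures in the list above.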
INTRO
A few times a month, Lambda Chi Alpha’s IT Department hears from members that their email was blocked and could not be delivered to the appropriate person on staff. As IT managers, we can often seem both cold and heartless in our efforts to find a solution. Here’s a copy of an email I recently sent to all staff to explain how complex it is to have a computer decide whether an email is wanted or unwanted (i.e., spam).
HOW MUCH SPAM?
While experts from security companies and internet service providers differ, most say that roughly 90 percent of all email traffic is spam. In a recent two-week period, I received about 550 emails. Of those, 61 messages were spam. So, with our current spam filtering balance, about 11 percent of the messages that I receive are unwanted. This is much better than 90 percent, which is what I’d receive if IT turned off all of our spam filters.
FALSE POSITIVES
What I don’t know is how many wanted emails I should have received but didn’t because our filters considered them to be spam. Wanted messages that are mistaken for spam are called false positives. To learn how many false positives occur in any given day, we started archiving all blocked messages instead of deleting them, as we normally would.
FILTERING
There are many layers (like an onion) to our spam defenses. Each layer provides some level of filtering. I took a close look at two of those layers in the past 24 hours. Here is what I found.
Microsoft’s Intelligent Message Filter (last layer)
The Microsoft Intelligent Message Filter (IMF) is the last layer in our defenses. This layer looks at the content of the subject and body to determine whether the message is spam. IMF scores every message, giving a higher score to messages that contain phrases such as “Viagra,” “sex,” “free,” “!!!!,” and “click here now.” If the score exceeds a certain threshold, the message is flagged as spam and is deleted.
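To give a feel for how score-and-threshold filtering behaves, here is a toy sketch in Python. It is not Microsoft's actual algorithm, which I am not describing here; the phrase weights, the threshold, and the function names are all made up for illustration.

```python
# Hypothetical phrase weights and threshold, chosen only for illustration.
PHRASE_WEIGHTS = {
    "viagra": 5,
    "click here now": 4,
    "!!!!": 3,
    "free": 2,
}
SPAM_THRESHOLD = 6


def spam_score(subject: str, body: str) -> int:
    """Add up the weight of every listed phrase found in the subject or body."""
    text = f"{subject}\n{body}".lower()
    return sum(weight for phrase, weight in PHRASE_WEIGHTS.items() if phrase in text)


def is_spam(subject: str, body: str) -> bool:
    """Flag the message when its score exceeds the threshold."""
    return spam_score(subject, body) > SPAM_THRESHOLD


# An obvious piece of spam.
print(is_spam("FREE Viagra!!!!", "Click here now"))
# A legitimate but enthusiastic message that trips the same rules.
print(is_spam("Free pizza at the chapter meeting!!!!", "Click here now to RSVP"))
```

Both messages end up flagged, and the second one shows how a false positive happens: a wanted message can pile up enough "spammy" phrases to cross the threshold.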
For one day, instead of automatically deleting those flagged emails, I sent them to a new location, where I personally read and reviewed each message. In about 20 hours, we received 92 messages that were considered spam and only one false positive. This means that at the IMF filtering layer, only 1.08 percent of the flagged messages would have been deleted by mistake (a false positive).
Sender Filtering (a middle layer)
On another layer, the server looks at who is sending us the message. About once a week, I update this layer with a list of servers and domains that we don’t want to hear from (ever). The list includes hundreds of domains along the lines of @poker.com, @sex.com, @shophere.com, @africanprince.com, and any others you can think of.
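In rough terms, a sender filter boils down to a lookup like the Python sketch below. The four domains come from the examples above; the function names are mine, and blocking subdomains of a listed domain is an assumption on my part, not necessarily how our server behaves.

```python
# Sample entries only; the real list holds hundreds of domains.
BLOCKED_DOMAINS = {"poker.com", "sex.com", "shophere.com", "africanprince.com"}


def sender_domain(address: str) -> str:
    """Return the lowercased part of an email address after the last '@'."""
    return address.rsplit("@", 1)[-1].strip().lower()


def is_blocked_sender(address: str) -> bool:
    """True if the sender's domain, or a parent of it, is on the block list."""
    domain = sender_domain(address)
    return any(
        domain == blocked or domain.endswith("." + blocked)
        for blocked in BLOCKED_DOMAINS
    )


print(is_blocked_sender("deals@poker.com"))       # True
print(is_blocked_sender("student@example.edu"))   # False
```

Note that this layer never looks at the message itself. If a legitimate sender shares a domain with a spammer, their mail is blocked all the same.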
I created a log file for this layer to store all of its activity. This layer filters out a lot of spam, so I didn’t keep the actual messages; I instead kept a log of just the TO, FROM, SUBJECT, etc. fields.
In one example, we received 65 unwanted emails in 52 seconds. Not a single message at 8:56 this morning was legitimate.
Since these logs are quite long (796 messages reviewed from 8 to 9 a.m.), I didn’t take the time to review every subject line to guess whether we had any false positives. But I did search for .edu senders: we had two, and both of their messages passed on to the next spam filtering layer.
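A search like the .edu check can be scripted. The Python sketch below assumes the log is tab-separated text with TO, FROM, and SUBJECT columns; that layout, the sample rows, and the function name are all hypothetical, since the real log format is not shown here.

```python
import csv
import io

# Made-up sample rows in an assumed tab-separated layout.
SAMPLE_LOG = (
    "TO\tFROM\tSUBJECT\n"
    "staff@example.org\twinner@poker.com\tYou won!!!!\n"
    "staff@example.org\tstudent@example.edu\tQuestion about chapter dues\n"
    "staff@example.org\tprince@africanprince.com\tUrgent business proposal\n"
)


def edu_senders(log_text: str) -> list:
    """Return the log rows whose FROM address ends in .edu."""
    reader = csv.DictReader(io.StringIO(log_text), delimiter="\t")
    return [row for row in reader if row["FROM"].strip().lower().endswith(".edu")]


for row in edu_senders(SAMPLE_LOG):
    print(row["FROM"], "|", row["SUBJECT"])
```

A check like this only narrows down where to look. Someone still has to read the matching rows to decide whether a wanted message was blocked.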
Connection Filtering (a high layer)
One of our first levels of defense is to ask third-party email services whether they consider a message to be spam. For every incoming message, we first forward information about it to about 10 external spam-filtering services. These services maintain real-time lists of bad guys and bad servers that are spewing out spam as we speak. Here’s an example:
* SpamCop
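Services like SpamCop publish their lists over DNS: you reverse the octets of the sending server's IP address, append the list's zone (bl.spamcop.net for SpamCop), and look that name up. If the name resolves, the server is listed. The Python sketch below assumes that style of lookup and IPv4 only; the function names and the injectable `resolve` parameter are mine, and this is not a description of how our mail server is actually configured.

```python
import socket


def dnsbl_query_name(ip: str, zone: str = "bl.spamcop.net") -> str:
    """Build the DNS name to query: the IPv4 octets reversed, then the zone."""
    octets = ip.split(".")
    if len(octets) != 4:
        raise ValueError(f"expected an IPv4 address, got {ip!r}")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str = "bl.spamcop.net",
              resolve=socket.gethostbyname) -> bool:
    """True if the block list has a record for this IP address."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No record was found (or the lookup failed): treat as not listed.
        return False
    return True


print(dnsbl_query_name("192.0.2.99"))   # 99.2.0.192.bl.spamcop.net
```

Calling `is_listed("192.0.2.99")` would perform a real DNS query, so the result depends on the network and on the list's contents at that moment.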
I don’t have a way to log how much spam these third parties block because they are all free services. We could, given additional resources, pay for commercial connection filtering services that would give us all kinds of stats and help us identify false positives (giving us a chance to whitelist them).
CLOSING
I know how frustrating it can be to have your email message blocked (or to not be able to send a message to someone). Every month, hundreds of our members block messages we send to them (like the C&C monthly email). It’s not personal. It’s just very difficult for computers to determine if a message is wanted or unwanted with 100 percent accuracy.
IT plays with this balance almost every week, making minor tweaks. No matter how hard we try, we will always have some false positives (unless we disable all spam filters). When a user has their message blocked, they should try to send it again from another account, change the content of their message, or simply wait a few days for their email server to be removed from a third-party block list.
Sadly, it costs far more resources to filter out spam than it does to simply send it. I could spew out 1 million spam messages in less than an hour, but I could make a full-time career out of trying to filter it all out.
March 6th, 2007 at 9:33 pm