Analysis: MS Spam Filter In E2K3
By now all the tech magazines have reported on this in some form
or another, usually as short news blurbs. BillG used his Comdex
keynote to announce that Redmond is going to add a spam filter
to Exchange 2003 in the first half of 2004. So, what does this
actually mean, and what is it really? First, in Chairman Bill's
own words:
"Now, there are several approaches being used to make sure spam
doesn't become something that holds back people in using e-mail.
One of those is the ability to detect spam mail versus legitimate
mail, and we have an approach to that we call SmartScreen technology.
It came out of our Research group when they noticed that the
frequency of words and the types of links and things on the spam
was generally quite different than normal mail. So the SmartScreen
is going to be in every mail thing we do. It's recently up in MSN
and Hotmail. It's in Outlook. It's in a release of Exchange that
we're making in the months ahead. So that's a very big step forward."
No one has seen it yet, but MS will probably have a beta up a month
from now, and it's likely to be available in a service pack for
E2K3 in the first half of 2004. The SmartScreen filter itself is
described as a machine-learning approach developed by MS Research.
Note that Redmond carefully steers away from the word "Bayesian". The
approach is fully mechanical, though: no human being is involved in
determining what is spam and what is not.
Microsoft officials, in an interview with eWEEK.com on Monday,
said that the upcoming Exchange Intelligent Message Filter (IMF)
add-on to Exchange Server 2003 that Bill Gates talked about during
his Comdex keynote isn't designed to be the "end-all, be-all"
solution for stopping spam within an enterprise's messaging
network. "We feel most companies will run it as a complementary
sort of solution," said T.A. McCann, an Exchange group product
manager. That sounds very much like the approach many system
admins take of running two or more anti-virus engines at the same time.
A very important point I'd like to make is that control and
reporting of spam do not seem to be addressed. (Source: eWEEK.)
The Exchange IMF will also not run on the gateway, and MS said
it will not make third-party software irrelevant. Also, keep in
mind that it will be an MS version 1.0 for a while. You know what
that means. And remember there is a whole cottage industry now that
lives off fooling spam filters. Guess which ones will be torn
apart first, and I'll take a bet with you on the date the first
site appears with the title: "100 ways to get around the MS spam
filter".
Regarding current Exchange add-ons, most anti-spam products will
adapt, and will live in peaceful coexistence with the built-in
filter. Others will simply add more features that are not yet in
Exchange 2003 and/or aren't planned for the near future.
Regarding Sunbelt's iHateSpam Server, we of course expected that
MS would add some sort of filter when they announced their
spam-filter API hooks earlier this year. There are many precedents
for this; defragmentation is a good example.
So I'm happy to announce that we are well on our way to releasing
an upgrade to iHateSpam Server that will include anti-virus,
content filtering, powerful disclaimers, archiving and more
goodies you asked for... done the right way. It will integrate
with and enhance Exchange 2003, and likely even improve the spam
detection rates as well. Here is an interview with yours truly
in ComputerWorld where we announce this new product that currently
has the code name: "Messaging Ninja". (Second page)
Also, keep on reading if you want to have a look at an early
version of Microsoft's SmartScreen and how it works.
Outlook 2003 Spam Filter: Under The Hood
If you want to read a technical article about the Outlook 2003 spam
filter, which will likely be similar to the IMF in Exchange 2003, here
is some very interesting nitty-gritty, including how the authors think
this filter will work in real environments. It was written by the guys
from MAPILab, and they are real pros in this field. They turned this
product inside out, and the article unmasks it, warts and all.
The technology behind Outlook 2003's spam filter consists of a large
dictionary that assigns weighting factors and scores to tens of
thousands of words. Next, it does around ten checks, looking at other
message characteristics, for example the time a message was sent.
This filtering process determines whether Outlook 2003 considers
the message to be spam.
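To make that a bit more concrete, here is a minimal sketch in Python of what a weighted-dictionary filter with a few extra checks might look like. Everything in it is made up for illustration: the words, the weights, the two extra checks, the threshold and the function names are assumptions, not anything taken from Outlook 2003 or from the MAPILab article.

```python
import re
from datetime import datetime

# Hypothetical word weights. The real dictionary is said to hold tens of
# thousands of entries; these five are invented for the example.
WORD_WEIGHTS = {
    "viagra": 3.0,
    "free": 1.0,
    "mortgage": 1.5,
    "meeting": -1.0,
    "invoice": -0.5,
}

SPAM_THRESHOLD = 3.0  # assumed cutoff, not a documented value


def word_score(body):
    """Sum the weights of every known word in the message body."""
    words = re.findall(r"[a-z']+", body.lower())
    return sum(WORD_WEIGHTS.get(word, 0.0) for word in words)


def extra_checks(subject, sent_at):
    """Score a couple of non-word characteristics (both are assumptions)."""
    score = 0.0
    if sent_at.hour < 5:      # sent in the small hours
        score += 0.5
    if subject.isupper():     # subject line is all capitals
        score += 1.0
    return score


def is_spam(subject, body, sent_at):
    """Treat the message as spam when the combined score reaches the cutoff."""
    total = word_score(body) + extra_checks(subject, sent_at)
    return total >= SPAM_THRESHOLD
```

A call such as is_spam("FREE VIAGRA", "Free viagra and a free mortgage", datetime(2003, 11, 20, 3, 0)) adds up the word weights, adds the two extra checks, and compares the total against the cutoff. Note that negative weights let "normal" business words pull a message back toward legitimate mail.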
MAPILab is critical of Microsoft's approach, saying it "can hardly
be called 'state-of-the-art technology.'" Maybe, maybe not. The
weighting factors of the dictionary are of course based on the
probability that a message containing one or more of these
words will be spam. iHateSpam has similar rules. Developing such
heuristics is not necessarily simple.
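For the curious, one generic way to turn "probability that a message with this word is spam" into a weight is a smoothed log-odds ratio computed from two piles of sample mail. The sketch below assumes that technique purely for illustration; it is not Microsoft's method (which has not been published) and not how iHateSpam does it, and the function name and smoothing value are hypothetical.

```python
import math
from collections import Counter


def train_weights(spam_msgs, ham_msgs, smoothing=1.0):
    """Return {word: weight}, where weight > 0 leans spam and < 0 leans ham."""
    spam_counts = Counter()
    ham_counts = Counter()
    # Count in how many messages of each pile a word appears at least once.
    for msg in spam_msgs:
        spam_counts.update(set(msg.lower().split()))
    for msg in ham_msgs:
        ham_counts.update(set(msg.lower().split()))

    weights = {}
    for word in set(spam_counts) | set(ham_counts):
        # Smoothed share of spam and ham messages containing the word.
        p_spam = (spam_counts[word] + smoothing) / (len(spam_msgs) + 2 * smoothing)
        p_ham = (ham_counts[word] + smoothing) / (len(ham_msgs) + 2 * smoothing)
        weights[word] = math.log(p_spam / p_ham)
    return weights
```

A word that shows up equally often in both piles ends up with a weight of zero, which is what you want: it tells you nothing either way. The hard part in practice is not this arithmetic but getting large, representative piles of mail to count from.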
MAPILab is right to give a thumbs down to the fact that Microsoft's
engine has no "training" feature: there is no individual user
adaptation, except for whitelists and blacklists. I also do not see how
updates to the Outlook engine will be delivered. Definitely a missing
puzzle piece.
Last remark: I'm fairly sure that some hacker is going to disassemble
and decrypt that dictionary, and then they will know how to outwit
it. Click here for the MAPILab (warning, very technical) article:
Network World has an anti-spam section where they allow you to
automatically compare a tremendous number of products against
each other. Great for making a short list!