Overview
Greylisting, outlined in Evan Harris's whitepaper, is a means of filtering spam by temporarily failing to accept a message. Most spammers don't bother to retry, and for those who do, the delay makes it more likely that anti-spam facilities like DNS blacklists will have registered the spam. There are several stand-alone greylist milters that can simply plug into most MTAs. I chose to implement greylisting in MIMEDefang because I wanted to try a couple of tweaks. I based my implementation upon the work of Jonas Eckerman, Steve Rocha and Mark Tranchant.
Scoring
I like the demerit system that Jeff Ballard and David Parter use to determine the greylisting period, so I implemented something similar. I incorporated operating system fingerprinting via p0f, along with tests of the relay address and hostname and the sender address and sender domain name. I decided not to score recipient addresses. Our server doesn't relay incoming mail to other interior MTAs, so I didn't see much use for recipient address scoring.
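As a rough illustration of how a demerit score could translate into a greylisting period, here is a minimal sketch in Python (MIMEDefang filters themselves are written in Perl). The test names, weights, score cap and delay values are all hypothetical, chosen only to show the shape of the idea; they are not the values used on our server.

```python
# Hypothetical demerit weights. The names and values are illustrative
# assumptions, not the tests or weights described in the article.
DEMERITS = {
    "no_reverse_dns": 3,              # relay address has no PTR record
    "dynamic_looking_hostname": 2,    # relay hostname looks dynamically assigned
    "desktop_os_fingerprint": 2,      # p0f reports a desktop operating system
    "sender_domain_unresolvable": 3,  # sender domain has no usable DNS records
}

MAX_SCORE = 10        # assumed cap on the total score
BASE_DELAY_MIN = 5    # assumed delay (minutes) for a relay with no demerits
PER_DEMERIT_MIN = 10  # assumed extra minutes of delay per demerit point


def greylist_score(failed_tests):
    """Sum the demerits for every failed test, capped at MAX_SCORE."""
    total = sum(DEMERITS[name] for name in failed_tests)
    return min(total, MAX_SCORE)


def greylist_delay_minutes(score):
    """Map a demerit score to the number of minutes a message is tempfailed."""
    return BASE_DELAY_MIN + PER_DEMERIT_MIN * score
```

With these made-up numbers, a relay with no reverse DNS and a desktop OS fingerprint collects 5 demerits and waits 55 minutes, while a relay that passes every test waits only the base 5 minutes.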
Score Caching
The scoring system uses a couple of tests, such as DNS lookups, that take some time, so I realized I might be able to speed things up by caching scores. (Actually, our mail server runs a caching DNS server, so I'm not sure how much of a performance boost score caching provides in our case.) I created two database tables, one for relay address scores and one for sender address scores. Each table is keyed by either the relay address or the sender address, and stores the score along with an expiration time. If the scoring code finds a hit in the cache, it simply uses the cached score instead of redoing the tests. I wrote a script that runs outside of MIMEDefang to periodically scan the cache tables and delete expired entries.
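The caching scheme can be sketched roughly as follows, using Python's sqlite3 module as a stand-in for whatever database is actually in use. The table names, column names and default lifetime are assumptions, and the sketch assumes each row stores the score itself next to the key and expiration time.

```python
import sqlite3
import time

# Hypothetical table names: one cache per kind of key.
TABLES = ("relay_scores", "sender_scores")


def open_cache(path=":memory:"):
    """Open the cache database and create both tables if they are missing."""
    conn = sqlite3.connect(path)
    for table in TABLES:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "key TEXT PRIMARY KEY, score INTEGER NOT NULL, expires INTEGER NOT NULL)"
        )
    return conn


def cached_score(conn, table, key, compute, ttl=86400, now=None):
    """Return the cached score for key, running compute(key) only on a miss."""
    if table not in TABLES:
        raise ValueError(f"unknown cache table: {table}")
    now = int(time.time()) if now is None else now
    row = conn.execute(
        f"SELECT score FROM {table} WHERE key = ? AND expires > ?", (key, now)
    ).fetchone()
    if row is not None:
        return row[0]  # cache hit: skip the slow tests
    score = compute(key)  # cache miss: run the slow tests once
    conn.execute(
        f"INSERT OR REPLACE INTO {table} (key, score, expires) VALUES (?, ?, ?)",
        (key, score, now + ttl),
    )
    conn.commit()
    return score


def purge_expired(conn, now=None):
    """Delete expired rows from both tables; meant to be run periodically."""
    now = int(time.time()) if now is None else now
    removed = 0
    for table in TABLES:
        removed += conn.execute(
            f"DELETE FROM {table} WHERE expires <= ?", (now,)
        ).rowcount
    conn.commit()
    return removed
```

Here the lookup ignores rows whose expiration time has passed, so a stale score is never used even if the cleanup job has not run yet. That is a design choice of this sketch.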
Body Scoring
I decided to do greylisting in the filter_recipient subroutine, so that messages can be tempfailed as early as possible. Because I greylist before the message body is received, I can't use the message body to calculate the greylist score. The score cache allows me to use the message body to set scores for relays and senders. I run SpamAssassin out of MIMEDefang. I added code so that, if the message scores as spam, all greylist entries from the relay and sender are deleted and the cached relay and sender scores are set to the maximum. Deleting the greylist entries forces any message from the relay and sender to be greylisted again, and changing the cached scores means that the message will be greylisted for the maximum possible time.
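The spam penalty step might look something like the sketch below. It models the greylist table and the two score caches as in-memory Python dictionaries purely for illustration. The class and method names, the maximum score, and the reading that entries matching both the relay and the sender (for any recipient) are deleted are all assumptions.

```python
MAX_SCORE = 10  # assumed maximum greylist score


class GreylistState:
    """In-memory stand-in for the greylist table and the two score caches."""

    def __init__(self):
        # (relay, sender, recipient) -> time the triplet was first seen
        self.entries = {}
        self.relay_scores = {}   # relay key -> cached score
        self.sender_scores = {}  # sender address -> cached score

    def penalize_spam(self, relay, sender):
        """Called after the body scan flags a message from relay/sender as spam.

        Deletes every greylist entry for this relay-sender pair, so the next
        message is greylisted again, and pins both cached scores to the
        maximum, so that greylisting lasts as long as possible.
        Returns the number of greylist entries removed.
        """
        doomed = [k for k in self.entries if k[0] == relay and k[1] == sender]
        for k in doomed:
            del self.entries[k]
        self.relay_scores[relay] = MAX_SCORE
        self.sender_scores[sender] = MAX_SCORE
        return len(doomed)
```

The recipient-stage check never sees the message body; it only reads the scores that an earlier body scan left behind in the cache.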
I really like this tweak. I can tempfail incoming mail without scanning the message body, which takes some of the load off the mail server. Any spam that does get through triggers maximum greylisting for the offending relay-sender combination, which seems like the right thing to do.
Relay Keys
One problem with greylisting is that large organizations may send mail from multiple relay addresses, which may change over time. Each relay address must be tempfailed individually. This means multiple delays for exchanges between a particular sender and recipient. If you want to whitelist the servers from a specific organization, it also means maintaining a long list of whitelisted relay addresses.
To deal with this issue, I first considered using CIDR blocks instead of IP addresses. But then I hit on this algorithm to create a relay key, which works pretty well:
relay key = relay address
sender domain = domain part of sender address
IF sender domain matches rightmost substring of relay hostname
   AND relay hostname and relay address do not seem to be dynamically assigned
THEN
    relay key = sender domain
This works when the sender domain matches the relay hostname, but in some cases the relay hostname does not match the domain part of the sender address. For example, if the sender address is yoda@lists.foo.com and the relay hostname is vader.mtapool.foo.com, the algorithm fails. To work around that, I created a lookup table, keyed by sender domain, which returns a relay domain. The entry for lists.foo.com would then have the value foo.com. Using this revised algorithm, I can now translate more relay addresses into relay keys:
relay key = relay address
sender domain = domain part of sender address
relay key domain = sender domain
IF entry for sender domain in lookup table
THEN
    relay key domain = relay domain from lookup table
IF relay key domain matches rightmost substring of relay hostname
   AND relay hostname and relay address do not seem to be dynamically assigned
THEN
    relay key = relay key domain
This works very well.
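For readers who prefer code to pseudocode, the revised relay key algorithm can be sketched in Python roughly as follows. Two parts are assumptions rather than anything described above. First, the "dynamically assigned" check is a crude, IPv4-only placeholder heuristic. Second, the domain match is done on whole DNS labels, which is slightly stricter than a plain rightmost-substring comparison, so that mail.notfoo.com does not match foo.com.

```python
import re

# Hypothetical hint words that suggest a dynamically assigned hostname.
DYNAMIC_WORDS = {"dsl", "dhcp", "dialup", "dyn", "dynamic", "ppp", "cable"}


def looks_dynamic(relay_addr, relay_host):
    """Crude placeholder check for dynamically assigned relays (IPv4 only)."""
    words = set(re.split(r"[^a-z]+", relay_host.lower()))
    if words & DYNAMIC_WORDS:
        return True
    # Hostnames that embed most of the IP address are usually dynamic.
    digit_runs = re.findall(r"\d+", relay_host)
    octets = relay_addr.split(".")
    return len(octets) == 4 and sum(o in digit_runs for o in octets) >= 3


def domain_matches(relay_host, domain):
    """True if relay_host is domain itself or a host underneath it."""
    return relay_host == domain or relay_host.endswith("." + domain)


def relay_key(relay_addr, relay_host, sender_addr, domain_map=None):
    """Return the sender's (mapped) domain as the key, else the relay address."""
    sender = sender_addr.strip("<>")
    if "@" not in sender:
        return relay_addr  # null sender or malformed address
    sender_domain = sender.rpartition("@")[2].lower()
    # The lookup table maps a sender domain to the domain its relays live in.
    key_domain = (domain_map or {}).get(sender_domain, sender_domain)
    host = relay_host.lower().rstrip(".")
    if domain_matches(host, key_domain) and not looks_dynamic(relay_addr, host):
        return key_domain
    return relay_addr
```

With the example from the text, relay_key("192.0.2.25", "vader.mtapool.foo.com", "yoda@lists.foo.com") falls back to the relay address, while passing {"lists.foo.com": "foo.com"} as domain_map yields the key foo.com. Calling it with no lookup table reduces it to the first version of the algorithm.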




