Spam Prevention, or, the sorry state of Email

Email spam is universally loathed. It’s difficult to prevent entirely, not only because spammers have a wealth of techniques at their disposal, but because so many legitimate mailers are misconfigured or routinely behave like spammers. The best approaches layer several defenses, each aimed at a different spam technique. I’ll outline what works and what doesn’t, and hopefully provide some insight into how spammers work, along with some of the sleazier techniques I’ve encountered.

There’s a spectrum of spam, from the terrifically illegitimate to the “legitimate,” where a semi-reputable company adds you to their mailing list because of something you ordered (perhaps you left a default box checked that said “I want to receive marketing material via email”). On the illegitimate side are usually commercial operations dedicated to spam, often using zombie farms of compromised machines to send out vast volumes. They often use sophisticated techniques to avoid content filters (like sending vast amounts of legitimate-sounding gibberish). Eliminating the maximum amount of spam requires a multi-layered approach. I’ve outlined this for mail administrators:

Layer 1: Blacklists and Server Verification
Spam blacklists are simply wonderful at eliminating most spam from bot-farms and sleazy operators. A blacklist is published through DNS: before you accept email from an IP address, you look that address up and verify that it is not listed. False positives are nearly zero for the good lists; every now and then somebody like AOL makes it onto a blacklist, but on the best blacklists this doesn’t happen. A review of our mail servers’ statistics shows that this layer alone is responsible for rejecting over 95% of attempts to spam our server. This capability is provided by milter-dnsbl.
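To make the DNS lookup concrete, here is a minimal sketch in Python of how a blacklist check is conventionally done: the IPv4 octets are reversed, the list's zone is appended, and the resulting name is resolved. The zone name dnsbl.example.org and the function names are placeholders of mine, not anything milter-dnsbl exposes, and the sketch treats every resolution failure as "not listed," which a real milter would probably handle more carefully.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the name to look up: reversed IPv4 octets followed by the zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the zone answers for this IP, False if resolution fails.

    `resolve` is injectable so the logic can be exercised without a network.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No answer (or a lookup failure): this sketch treats it as unlisted.
        return False
```

Calling is_listed("192.0.2.99", "dnsbl.example.org") would look up 99.2.0.192.dnsbl.example.org; substitute the zone of whichever list you actually trust.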

Server verification covers the other 5%. In a nutshell, this verifies that the IP address that a system provides during the MTA phase of negotiations is legitimate. Over time, we’ve encountered a few mailers that, for whatever reason, have run afoul of this filter, either from misconfiguration, or from perversely sending email from an unresolvable address. I can’t think of a legitimate reason why anybody would feel the need to use unresolvable addresses to send mail; in cases where I’ve pursued this, it’s generally been the fault of a bumbling administrator or IT department. Every time I’m tempted to relax this requirement, I look at the volumes of spam eliminated and think, hey, if you can’t configure your own mailer properly, maybe nobody should accept mail from you. This capability is provided by spamilter.

Layer 2: Greylists
After making it past the blacklist, the next thing encountered by a would-be mailer is the greylist. To put it succinctly, a greylist is a way of telling certain mailers, “try again later.” Legitimate mailers will do exactly that, while a lot of spam farms give up in confusion. For others, the delay allows enough time for them to be placed on a blacklist before they make the next attempt. A greylist works by tracking the IP (and often, the origin email address) of the mailer that is contacting you. Next time that same mailer contacts you, if enough time has elapsed, it’s allowed through.
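The bookkeeping behind a greylist is small enough to sketch. The Python below is an illustration under my own assumptions (a 300-second minimum delay, an in-memory dictionary, a key of IP plus sender address); it is not how milter-greylist is implemented, and it leaves out the whitelist and record expiry that a real deployment needs.

```python
import time


class Greylist:
    """Remember when each (IP, sender) pair was first seen."""

    def __init__(self, min_delay=300.0, clock=time.time):
        self.min_delay = min_delay  # seconds a new mailer must wait (assumed value)
        self.clock = clock          # injectable so the logic can be tested
        self.first_seen = {}

    def check(self, ip, sender):
        """Return "accept" once the pair has waited long enough, else "tempfail"."""
        key = (ip, sender.lower())
        now = self.clock()
        # Record the first contact; later contacts reuse the stored time.
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"  # the mailer should be told to try again later
```

A "tempfail" here corresponds to answering the SMTP transaction with a temporary failure, which is what prompts a well-behaved mailer to retry.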

The tricky part about greylists is coping with the behavior of some mailers, particularly big ones. Those that adhere to SPF are easy: most greylists will happily let SPF-compliant mailers right through. For the rest, most greylist implementations have a “whitelist” of mailers that respond poorly to the technique, either because they send from a different IP address every time (and therefore never satisfy the waiting period) or because of known issues where mailers may get confused or not retry for a very long time.

Another side effect is that legitimate mail can, and will, be delayed. A particularly effective technique is to greylist only email from origins outside your country. In my case, skipping the greylist for US-origin addresses interferes with as little mail as possible, and most of the spam comes from non-US computers anyway. This capability is provided by milter-greylist.

Layer 3: Content Filtering
Hopefully, most spam is eliminated before we get this far, because no matter how sophisticated content filtering gets, it can be problematic to consistently separate spam from (for example) messages from a family member who spells poorly and has questions for you about Viagara.

So the first thing to do is make another run through the blacklists. While this may seem redundant, it picks up blacklisted IP addresses that are relaying through somewhere else. A common spam technique is to create an email forwarding address for you on a service like bigfoot (I see a lot of these) and then spam that address, which merrily forwards all the spam to you, effectively skipping the blacklist unless you scan through all the headers, too. This capability is provided by spamilter.
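Scanning the headers amounts to pulling every relay address out of the Received lines and running each one through the same blacklist check. The sketch below shows only the extraction step, and it assumes relays record the peer address as a bracketed IPv4 literal, which is common but not guaranteed. The function name is mine, not spamilter's. Keep in mind that Received lines added before your own trusted relays can be forged by the sender.

```python
import re
from email import message_from_string

# Matches an IPv4 address written in square brackets, e.g. [203.0.113.45].
IPV4_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_ips(raw_message):
    """Return the distinct bracketed IPv4 addresses found in Received headers,
    in the order they appear (most recent hop first)."""
    msg = message_from_string(raw_message)
    ips = []
    for received in msg.get_all("Received", []):
        for ip in IPV4_IN_BRACKETS.findall(received):
            if ip not in ips:
                ips.append(ip)
    return ips
```

Each address returned could then be checked against the same lists used at connection time.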

The next thing to do is eliminate the obvious: mangled email. While spammers make an effort to make their mail look legitimate, invalid or multiple headers can result from spam being relayed through security holes in web sites. Spammers generally can’t see the results, nor do they care. Relatedly, it’s a common technique for spammers to add multiple headers of the same type, which violates most specifications but often slips past content filters that expect well-formed mail, or to pump through headers designed to exploit loopholes in clients or to overload mail servers. Since people using legitimate mail clients aren’t capable of producing broken mail, getting rid of broken mail causes no harm. This capability is provided by mimedefang.
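One of the simplest checks for mangled mail is counting headers that the message format allows only once. The Python sketch below is an illustration, not mimedefang's filter. The short list of single-instance headers is my own selection from the fields that RFC 5322 limits to one occurrence, and a real filter would check more than this.

```python
from collections import Counter
from email import message_from_string

# Headers that may appear at most once in a well-formed message.
SINGLE_INSTANCE = ("from", "to", "subject", "date", "message-id")


def repeated_headers(raw_message):
    """Return the single-instance headers that occur more than once, sorted."""
    msg = message_from_string(raw_message)
    # msg.keys() lists every header occurrence, duplicates included.
    counts = Counter(name.lower() for name in msg.keys())
    return sorted(h for h in SINGLE_INSTANCE if counts[h] > 1)
```

A non-empty result is a strong hint that the message did not come from an ordinary mail client.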

The next capability is filtering the content itself using a number of heuristic techniques that have been tuned over time, using capabilities provided by SpamAssassin. SpamAssassin does quite a good job, although sophisticated spammers will regularly test their spam content against its rules. Therefore, a good practice is to update its rules regularly using sa-update.

It’s also worth eliminating virus spam at this point. ClamAV provides this capability handily. As with SpamAssassin, it’s most effective when updated regularly.

Layer 4: Sieve Rules
At this point, there is still a potential for false positives, and some things are going to slip through. Therefore, content filters normally just flag email. Sieve rules are a hierarchy of rules that determine how to treat email. So legitimate mail can be saved from the junk filter, and persistent spammers can be shuttled over to the Junk folder. These are normally in the hands of end users, but general rules can be effective site-wide.

Layer 5: Don’t REPLY
This is true on a number of levels, the first being that a mailer should summarily reject all mail that’s not addressed to legitimate users, rather than accept it and attempt to bounce it back. There’s a whole class of spam known as “bounce spam,” where the “reply to” address belongs to the actual victim, and the spammer sends email to a legitimate mailer, addressed to an invalid user. The mailer happily sends it “back,” which actually delivers the spam to the victim. There’s no benefit to ever automatically emailing the reply-to address at the mailer level, either to inform an end user that they’ve typoed an email address (rejection serves that purpose adequately) or to inform an end user that they’ve sent a virus; the reply address is almost never the originator.

This also extends to the end user. For legitimate businesses, replying is usually an effective way to be removed from their mailing list — if you recognize the domain and have done business with them, there’s little risk. More sleazy operators, however, take the opportunity to add your legitimate email address to hundreds of other lists, even while nominally removing you from the list you’re presumably unsubscribing from. Your legitimate email address can now be sold to other spammers.

In a similar vein, it’s often a bad idea to click where it says “click here to be removed” for the same reasons. A particularly sleazy form of this actually takes you to a page covered with ads, and the unsubscribe box (filled in) in the middle. The spammer has now made money, because you’re a unique visitor to whom those ads have been displayed — even more if one catches your eye and you click on it.

Layer 6: Report Spam
Reporting spam has a number of benefits, the biggest being the overall reduction in spam. Spamcop is probably the best way to report spam. It sends email directly to the administrators of the systems involved, which are either misconfigured (open proxies or relays) or have a customer who is the spammer. Spamcop does an excellent job of analyzing email headers and finding out who’s really responsible. Note that spammers will often include legitimate URLs in their spam, so it’s best to pay close attention to who the reports are being sent to and why.


Misdirected email and email disclaimers

Like many people who have been active on the Internet since AOL was a standalone service, I’ve accumulated a number of email addresses over the years, many of which I still use. Some are short and easy to remember, and at least a few of them are routinely given out by people who think they are their own.

The worst offender was a ski resort, who kept giving out my email address as their own — perhaps they even used it as their “reply to” address, since people were particularly stubborn in their insistence that they had the right address. I had a lot of conversations like these:

“I’m sorry, I’m not affiliated with any ski resort, you’ll have to phone or mail the resort to get the correct address.”

“But this is the address they gave me. Do you have parking for an RV?”

“Well, on the street, but I’m not sure what good this will do you, since I’m probably a few hundred miles away from where you want to be. As I mentioned, I have nothing to do with the resort, and I do not know how to get in touch with them.”

“Oh good. How far is the street from the slopes?”

Perhaps they just appealed to a particularly obtuse clientele, but they kept doing it. So I asked somebody who emailed me for the number of the resort, and I called them to let them know their mistake. “No, that’s our email address,” I was told. I couldn’t convince them otherwise. Eventually I resorted to just giving out reservation confirmations, and they finally stopped.

“Is it too late to reserve rooms for eight people for this weekend?”

“No, you’re all set. Your confirmation number is 6893-261#-3472@.9653!7160321796. Please have this ready when you arrive.”

I guess having irate people show up is a lot more effective than politely asking them to knock it off. A lot of people give one of my email addresses out as their own when asked for an email address. I’m not sure if they just don’t know their own, or they just don’t think it matters, but I’ve been signed up by proxy for an appalling number of things:

  • Bank accounts (complete with “here’s your password to bank online”)
  • Home loans (complete with “update your payment address”)
  • Retail sites of all kinds, a handful with active “buy it now” credit cards
  • Medical records
  • Insurance records
  • Porn memberships (with recurring payments and a changeable password)
  • Job sites (complete with “update your resume/profile”)
  • Social networking sites (as above)
  • Dating sites (even more fun, as above)

As the mood takes me, I might locate the phone number of the person whose account it is, and notify them of their mistake (reactions have ranged from confusion to threatening to sue me.) Sometimes I’ll just change the password and forget about it (there are probably a few poor schmucks still paying for porn that they don’t have access to and can’t cancel.) Sometimes I’ll update their profile in amusing ways. Although the thought has occurred to me to drain a few bank accounts, these are people who strike me as most genuinely confused and in need of an explanation — and I’m not really that much of a bastard.

I also get signed up for a lot of mailing lists, which can be fairly obnoxious. If mailing lists have a simple way to unsubscribe, I will. Better yet are mailing lists that ask for confirmation: I don’t confirm, and that’s the end of it. Some mailing lists are particularly obnoxious. There’s no way to unsubscribe, or even worse, the only way to unsubscribe is to enter a lot of personal information on a separate web site (which, if it doesn’t match whatever information the idiot gave them when they provided your email address, won’t let you unsubscribe), or the link points to a site that doesn’t exist or resolve, etc. Since I don’t want to be on the mailing list, I’ll complain directly to their ISP. I’ve had a few car dealerships disconnected from the Internet by their ISPs, who are usually pretty cooperative.

Note to email list administrators: always confirm email addresses, and have a simple way to unsubscribe, or you’re a spammer.

I also get emails directly from misguided individuals. It’s remarkable the amount of personal detail that people will include to an email address they’ve never sent anything to before. I usually reply to let them know I’m not who they think they’re contacting. Occasionally, they argue (which is bizarre to me, but some people get ideas stuck in their heads. “Dot! Stop fooling around!”) and occasionally, they’re just weird — some ask for unrelated computer help (which I provide, to the extent that I can help via email) and one lady told me that she was a “married Christian woman” and that it was improper for her to talk to a strange man. (This, of course, implies to me that she desperately wants to, and either is unhappy with her husband or her repressive brand of Christianity — and she actually does keep writing — go figure.)

High on the obnoxiousness scale are the business emails I get, usually with tons of insider information, and a standard disclaimer telling me what I can and can’t do, my duties if I’m not the intended recipient, etc. I’m not a lawyer, and this isn’t legal advice by any means, but I don’t think I’m bound by any of this crap. If you send me an email, it’s mine. I’ll do what I want with it. If you’re incompetent enough to send me insider or confidential information from your company, I’m going to feel free to post it on the Internet if I damned well feel like it, and you can stick your disclaimer wherever you like.

We don’t have a contractual relationship, and your email was unsolicited. You can’t create one using your disclaimer; I don’t agree to your terms. Any of your terms. If I feel like sending you back an email informing you of your mistake, I might do that. Doing so does not mean I agree to your disclaimers, nor does it obligate me to send you another email informing you of your future mistakes when you do it again and again.

If we were to have a contractual relationship, I could see the value of a disclaimer, to, say, remind me of a confidentiality contract we mutually signed. But unsolicited email is precisely that; just as you can’t send me junk in the mail and obligate me to do anything with it, you can’t via email, either.


Outlook, Mail Archives, and Duplicates

Exchange and Outlook are dismal examples of code, but the fact remains that they are ubiquitous. Nobody has managed to create a mail/calendar/contacts/task application with wider adoption, and it has enough inertia that well designed applications have little chance to make inroads, which means a lot of people are stuck with it. For those of us who prefer elegant, well designed applications, putting up with their quirks is maddening.

Outlook, for example, has a hard-to-explain 2 gigabyte limit on mail archives, and mail archives are arguably one of the niftier features that Outlook offers. Early versions of Outlook don’t know any better, and simply corrupt your mail archives. Later versions know better, and warn you not to exceed the limit. While some noise has been made about Outlook finally removing the 2 gigabyte limit, that’s not quite true: it has only been removed for Exchange-style mailboxes, and is still there, for example, for IMAP mailboxes.

For those of us with lots of mail and the need to archive it (I receive a lot of technical documents, some very large, via email), using Outlook’s built-in “archives” isn’t really an option, so I used the simple expedient of setting up an archive IMAP server, where the size wouldn’t be an issue. While this works reasonably well going forward, Outlook puked often enough while trying to move messages from its proprietary formats to IMAP that I was left with a vast number of duplicates.

On a significantly large mailbox, this is a bigger problem than it sounds like, especially since the duplicates were created with different message IDs, and in many cases the white space or envelopes differ even though the messages are clearly identical. Maddening, but it largely means that any automated duplicate removal will have to happen through IMAP, not through the filesystem.
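For duplicates that differ only in message ID and white space, one approach is to fingerprint each message on the parts that should match and ignore the parts that don't. The Python sketch below is my own illustration of that idea, not any existing tool. It assumes the raw messages have already been fetched (for example with the standard imaplib module), and it strips all white space from the body before hashing, which is aggressive and could in principle merge messages that really are different.

```python
import hashlib
from email import message_from_string


def fingerprint(raw_message):
    """Hash From, Subject, Date and the body, ignoring Message-ID and white space."""
    msg = message_from_string(raw_message)
    parts = []
    for name in ("From", "Subject", "Date"):
        # Collapse runs of white space so re-folded headers compare equal.
        parts.append(" ".join((msg.get(name) or "").split()))
    body = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no content of their own
        body.append("".join(str(part.get_payload()).split()))
    parts.append("".join(body))
    return hashlib.sha256("\n".join(parts).encode("utf-8", "replace")).hexdigest()


def find_duplicates(messages):
    """Given {uid: raw_message}, return the uids whose fingerprint was already
    seen on an earlier uid (the first copy of each message is kept)."""
    seen = {}
    duplicates = []
    for uid, raw in messages.items():
        fp = fingerprint(raw)
        if fp in seen:
            duplicates.append(uid)
        else:
            seen[fp] = uid
    return duplicates
```

The uids returned are candidates for deletion; on a mailbox that matters, it is worth reviewing a sample by hand before removing anything.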

While it seems that a tool to locate and eliminate duplicate IMAP emails would be simple to find, it appears that such a beast simply does not exist, except for the trivial case in which the message IDs are identical. At the IMAP level, there are a decent number of tools for that case, which work admirably, for the most part. For the remainder, I used a Thunderbird add-on, which took care of the remaining fringe cases. The only problem, of course, is that on a really large email folder, Thunderbird starts to complain endlessly about script timeouts. However, you shouldn’t really need to do this regularly.


Mail, DNS entries, and domains

I recently overhauled bits of the mail system here to take care of a few lingering quirks that I’d never had the time or inclination to track down. All of my various email addresses and aliases go to the exact same mailbox, through the multiple expedients of fetchmail, which picks up my mail from Gmail and AOL, and DNS MX records that point everything to the same place.

Until recently, if you sent mail to an address at one of my secondary domains, the server would unceremoniously transform it into the same address at my primary domain. It would show up that way in the mailbox, and only by delving into the mail headers was it obvious that the mail was originally destined for a different domain. For addresses I didn’t make use of much, this was fine, though it led to the curious circumstance where somebody sending mail to one domain would get replies from another, which deviates from the principle of separating domains in the first place.

It turns out the root cause was that, rather than having its own A record in the DNS tables, the secondary domain used a CNAME pointing to the primary one. Apparently this implies that mail sent to the alias is actually for the canonical domain. I imagine this would be particularly useful for adjunct or typo domains, where you want to correct the original destination or transition from one domain to another. It’s also useful in that the mailer only needs to internally relay for, and listen to, mail destined for the canonical domain; any mail sent to a CNAME from another domain pointing to it works perfectly well.

Moving the domain from a CNAME to an A record effectively separates things out again, though now the mailer must also be aware that it’s listening for mail for yet another domain.
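The rewriting behavior is easier to see in a toy model. The Python sketch below does no real DNS lookups. The `cnames` dictionary stands in for the zone data, and alias.example and primary.example are placeholder domains of mine. It only illustrates the rule that a mailer following CNAMEs replaces the alias with the canonical name, and that an alias-free domain is left alone.

```python
def canonical_domain(domain, cnames):
    """Follow CNAME entries until reaching a name that has none."""
    seen = set()
    current = domain.lower().rstrip(".")
    while current in cnames and current not in seen:
        seen.add(current)  # guards against a CNAME loop
        current = cnames[current].lower().rstrip(".")
    return current


def canonicalize_address(address, cnames):
    """Rewrite the domain part of an address to its canonical name."""
    local, _, domain = address.rpartition("@")
    return f"{local}@{canonical_domain(domain, cnames)}"
```

With cnames = {"alias.example": "primary.example"}, the address me@alias.example comes out as me@primary.example. With an empty dictionary, which models giving the domain its own A record, the address passes through unchanged.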