« Spamming is the scourge of electronic-mail and newsgroups on the Internet. It can seriously interfere with the operation of public services, to say nothing of the effect it may have on any individual’s e-mail mail system. … Spammers are, in effect, taking resources away from users and service suppliers without compensation and without authorization. »
— Vint Cerf, Senior VP, MCI and acknowledged « Father of the Internet »
Spam. Seems like it’s become a cost of having an e-mail address these days: if you post in a newsgroup, enter something in an on-line guestbook, or have your e-mail address on the Net in some way, sooner or later you’ll get harvested by the spambots. Even if you don’t, spam _still_ costs you money: it takes up bandwidth that could otherwise be used for real information transfer, leading to overall higher costs for ISPs – and consequently, keeping up the cost of service for everyone. This cost is, incidentally, up in the tens of millions of dollars per month (see http://www.techweb.com/se/directlink.cgi?INW19980504S0003 for an excellent overview) – and this translates directly to about $2 of your monthly bill. If you pay for your access « by the byte », there is yet another cost – and all this comes before you add in the cost of your own wasted time.
Is there anything that we can do? The answer is « yes ». We can stop spam from polluting our own mailboxes, and we can intercept it back at the ISP, if we have access to a shell account and they implement a simple tool (and most ISPs that provide shell accounts do). I invite those of you who would like to fight spam at its root to take a look at http://www.cauce.org – these are the folks that are advocating a legislative solution to spam; the information on their site tells you how you can help. In this article, however, I will concentrate on stopping spam locally – at your shell account or on your own machine.
There are several ways to do this, but the most common by far – and one that most ISPs offering shell-accounts already have – is a program called « procmail » by Stephen R. van den Berg, an e-mail processor that uses a ‘recipe’ that tells it what to keep, what to filter, and what to redirect to another mailbox. So, we need to do two things: first, we need to tell our system to use « procmail »; second, we need to cobble together a ‘recipe’ that will do what we want.
In my own case, I collect my e-mail via « fetchmail », running as a daemon. This is something I would recommend to everyone, even if you normally collect your mail via Netscape: fetchmail does one job (mail collection) and does it very well, even in the worst and most complex of circumstances. It handles things that Netscape doesn’t even try to do (multiple servers with different protocols and different usernames, for example), and Netscape will happily read your local mailbox instead of the ISP’s.
Normally, my « fetchmail » will wake up every 5 minutes, pull down the mail from the several servers that I use, and pass it to « sendmail » which then puts it in my mailbox. Whew. Sounds like wasted effort to me, but I guess that’s the way things are when you scale down an MTA intended for processing big batches… Actually, using « procmail » eliminates that last step.
In my « ~/.fetchmailrc », the resource file that controls what « fetchmail » does when it runs, the pertinent line reads:
mda "procmail"
This tells « fetchmail » to use « procmail » as the mail delivery agent instead of « sendmail » – remember, this is for incoming mail only; your outgoing mail will not be affected.
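To put that line in context, here is a rough sketch of a complete resource file, written to a scratch directory so that it can’t clobber a real « ~/.fetchmailrc ». The server name, username and password are placeholders I made up for illustration; only the 5-minute interval and the « mda » line come from the setup described above.

```shell
#!/bin/sh
# Sketch only: writes a sample rc file to a scratch directory,
# NOT to ~/.fetchmailrc. Host, user and password are placeholders.
scratch=$(mktemp -d)
rc="$scratch/fetchmailrc.sample"

# Quoted 'EOF' so the shell copies the text through untouched.
cat > "$rc" <<'EOF'
# wake up every 300 seconds (5 minutes)
set daemon 300

# one entry per server; hand each message to procmail for delivery
poll mail.example.com protocol pop3
    username "joe" password "secret"
    mda "procmail"
EOF

# The file holds a password, so keep it readable by the owner only.
chmod 600 "$rc"
cat "$rc"
```

If you collect from several servers, each one would get its own « poll » entry; check the fetchmail man page before copying any of this into a live file.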
The other way to do this – and this is the way I recommend if you’re filtering mail at your ISP’s machine – is to create a « .forward » file in your home directory (this tells your MTA to ‘forward’ the mail – in this case, to our processor).
Edit « .forward » and enter one of the following lines:
"|exec /usr/bin/procmail"
if you’re using « sendmail » (the quotes are necessary in this case).
If you are using « exim », use this instead:
[ Note: According to Mike Orr, « exim » has its own procmail-like filtering language. I haven’t looked at it, but it should be in the « exim » docs. ]
You’ll need to double-check the actual path to « procmail »: you can get that by typing:
which procmail
at the command prompt.
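Once you know the path, the sendmail-style « .forward » line can be generated rather than typed by hand. The following is only a sketch: « forward_line » is a helper name I invented for illustration, and it prints the line instead of writing to « .forward » so that nothing gets overwritten by accident.

```shell
#!/bin/sh
# Hypothetical helper: print the sendmail-style .forward line for a
# given procmail path. With no argument, look procmail up in $PATH.
forward_line() {
    path=${1:-$(command -v procmail)}
    if [ -z "$path" ]; then
        echo "procmail not found in PATH" >&2
        return 1
    fi
    # The double quotes are part of the line itself.
    printf '"|exec %s"\n' "$path"
}

forward_line /usr/bin/procmail
```

When you are happy with the output, redirect it into « .forward » in your home directory yourself.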
Now that we have redirected all our mail to pass through procmail, the overall effect is… nothing. Huh? Oh yeah – we still have to set up the recipe! Let’s take a look at a very simple « .procmailrc », the file in which the recipes are kept:
MAILDIR=/var/spool/mail # make sure this is right
DEFAULT=$MAILDIR/username # completely optional
LOGFILE=/var/log/procmail.log # recommended
Those top four lines, once you’ve checked to make sure that the variables are correct for your system, should be in every « .procmailrc ». What comes after can be as complex as you want – you could cobble up a HUGE « .procmailrc » that does more sorting than the main US Post Office – but for spam filtering purposes (and that’s the only thing most folks use it for), it’s not very complex at all. The above recipe simply sorts the mail into two boxes, « linux-kernel-announce » and « debian-user » before « falling off the end » and delivering everything else into $DEFAULT.
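As a hedged sketch, a pair of recipes that sort two mailing lists into their own folders might look something like the file written out below. The « To: » header patterns are my assumption about how such list mail could be recognized, not a tested rule set, and the file goes to a scratch directory rather than to a real « ~/.procmailrc ».

```shell
#!/bin/sh
# Sketch only: the rc file goes to a scratch directory, not ~/.procmailrc.
scratch=$(mktemp -d)
rc="$scratch/procmailrc.sample"

# Quoted 'EOF' keeps the shell from expanding $MAILDIR inside the file.
cat > "$rc" <<'EOF'
MAILDIR=/var/spool/mail         # make sure this is right
DEFAULT=$MAILDIR/username       # completely optional
LOGFILE=/var/log/procmail.log   # recommended

# Assumed pattern: the list name appears somewhere in the To: header.
:0:
* ^To:.*linux-kernel-announce
linux-kernel-announce

:0:
* ^To:.*debian-user
debian-user

# Anything that matched neither recipe falls through to $DEFAULT.
EOF

cat "$rc"
```

Each recipe here has the same three parts: the « :0: » opener with a lock file, one condition line, and the folder to deliver into.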
Recipes are built like this:
:0          Begin a recipe
:           Use a lock file (strongly recommended)
*           Begin a condition
^           Match the beginning of a line, followed by...
Subject:    « Subject: », followed by...
.           any character (.), followed by...
*           0 or more of the preceding character (any character in this case), followed by...
test        « test »
joe         If the match succeeds, put the message in folder $MAILDIR/joe
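The condition in that breakdown is an ordinary regular expression, so you can get a feel for what it will catch before letting it loose on real mail. The sketch below uses « grep » as a stand-in for procmail’s own matcher, which is an approximation on my part; « header_matches » is a made-up helper name.

```shell
#!/bin/sh
# The condition from the breakdown, without the leading "*" that
# introduces it in a recipe.
pattern='^Subject:.*test'

# Hypothetical helper: succeed if a header line satisfies the condition.
# -E selects extended regexes; -i mimics procmail's default of
# ignoring case; -q suppresses output so only the exit status is used.
header_matches() {
    printf '%s\n' "$1" | grep -Eiq "$pattern"
}

for line in 'Subject: this is a test' \
            'Subject: TESTING 1 2 3' \
            'Re: Subject: test' \
            'Subject: hello'
do
    if header_matches "$line"; then
        echo "match:    $line"
    else
        echo "no match: $line"
    fi
done
```

The first two lines match; the third does not, because « ^ » anchors the pattern to the start of the line, and the fourth has no « test » in it.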
What we’ll do here is take a look at several people’s solutions; in order to write this article, I polled the members of the Answer Gang, and some of their recipes – along with their rationale for them – are shown below.
My own recipe has been in service for quite a while. I built a rather basic one at first, and this immediately decreased the spam volume by at least 95%; later, I added a « blacklist » and a « whitelist » to always reject or accept mail from certain addresses. The first is useful for spammers that manage to get through, especially those that send their garbage multiple times; the second is for my friends, whose mail I don’t want to filter out no matter what strange things they may put in the headers. (I have some strange friends. 🙂)
For those of you who use « mutt », here’s how I add people to those lists: in my « /etc/Muttrc », I have these lines:
macro index \ew '| formail -x From: | addysort >> ~/Mail/white.lst'
macro pager \ew '| formail -x From: | addysort >> ~/Mail/white.lst'
macro index \eb '| formail -x From: | addysort >> ~/Mail/black.lst'
macro pager \eb '| formail -x From: | addysort >> ~/Mail/black.lst'
and in my « /usr/local/bin », I have a script called « addysort »:
# Picks out the actual address from the « From: » line