by Jim Dennis, Proprietor, Starshine Technical Services
Converted to HTML by Heather Stern
procmail is the mail processing utility and language written by Stephen van den Berg of Germany. This article provides a bit of background for the intermediate Unix user on how to use procmail.
As a « little » language (to use the academic term) procmail lacks many of the features and constructs of traditional, general-purpose languages. It has no « while » or « for » loops. However it « knows » a lot about Unix mail delivery conventions and file/directory permissions — and in particular about file locking.
Although it is possible to write a custom mail filtering script in any programming language using the facilities installed on most Unix systems — we’ll show that procmail is the tool of choice among sysadmins and advanced Unix users.
Unix mail systems consist of MTA’s (mail transport agents like sendmail, smail, qmail, mmdf, etc.), MDA’s (delivery agents like sendmail, deliver, and procmail), and MUA’s (user agents like elm, pine, /bin/mail, mh, Eudora, and Pegasus).
On most Unix systems on the Internet sendmail is used as an integrated transport and delivery agent. sendmail and compatible MTA’s can dispatch mail *through* a custom filter or program by either of two mechanisms: aliases and .forwards.
The aliases mechanism uses a single file (usually /etc/aliases or /usr/lib/aliases) to redirect mail. This file is owned and maintained by the system administrator. Therefore you (as a user) can’t modify it.
The « .forward » mechanism is decentralized. Each user on a system can create a file in their home directory named .forward and consisting of an address, a filename, or a program (filter). Usually the file *must* be owned by the user or root and *must not* be « writeable » by other users (good versions of sendmail check these factors for security reasons).
It’s also possible, with some versions of sendmail, for you to specify multiple addresses, programs, or files, separated with commas. However we’ll skip the details of that.
You could forward your mail through any arbitrary program with a .forward that consisted of a line like:
"|$HOME/bin/your.program -and some arguments"
Note the quotes and the « pipe » character. They are required.
« Your.program » could be a Bourne shell script, an awk or perl script, a compiled C program or any other sort of filter you wanted to write.
However « your.program » would have to be written to handle a plethora of details: how sendmail would pass the messages (headers and body) to it, how you would return values to sendmail, how you’d handle file locking (in case mail came in while « your.program » was still processing an earlier message), and so on.
That’s what procmail gives us.
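To give a feel for the bookkeeping involved, here is a minimal sketch of such a hand-rolled filter in Bourne shell. The function name, the « .lock » suffix and the retry count are my own assumptions for illustration. It is not how procmail works internally, and it ignores plenty of cases (stale locks, NFS, quotas) that a real delivery agent has to handle.

```sh
#!/bin/sh
# Hypothetical stand-in for "your.program": append the message read
# from stdin to a folder, guarding the write with a crude lock.
deliver() {
    folder=$1
    lock="$folder.lock"
    tries=0
    # mkdir is atomic: only one process can create the lock directory.
    until mkdir "$lock" 2>/dev/null; do
        tries=$((tries + 1))
        # Give up with 75 (EX_TEMPFAIL) so the caller can retry later.
        [ "$tries" -ge 10 ] && return 75
        sleep 1
    done
    cat >> "$folder"
    status=$?
    rmdir "$lock"
    return "$status"
}

# Two sample messages delivered one after the other into a scratch folder.
tmp=$(mktemp -d)
printf 'From alice Thu Jan  1 00:00:00 1998\nSubject: one\n\nhello\n\n' | deliver "$tmp/inbox"
printf 'From bob Thu Jan  1 00:00:01 1998\nSubject: two\n\nworld\n\n' | deliver "$tmp/inbox"
grep -c '^From ' "$tmp/inbox"    # number of messages now in the folder
```

Even this toy version needs a lock, a retry loop and a sensible exit status, and it still does no filtering at all.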
What I’ve discussed so far is the general information that applies to all sendmail compatible MTA/MDA’s.
So, to ensure that mail is passed to procmail for processing the first step is to create the .forward file. (This is safe to do before you do any configuration of procmail itself — assuming that the package’s binaries are installed). Here’s the canonical example, pasted from the procmail man pages:
"|IFS=' '&&exec /usr/local/bin/procmail -f-||exit 75 #YOUR_USERNAME"
This seems awfully complicated compared to my earlier example. That’s because my example was flawed for simplicity’s sake.
What this mess means to sendmail (paraphrasing into English) is:
- Pipe the mail to the following command(s):
- Set the shell’s « inter-field separator » (IFS) to a space, and, if that went O.K. (&&), execute the program named « /usr/local/bin/procmail »
(yours may need to be different — try the command ‘which procmail’ to see if it’s on the path or ‘locate procmail’ if your system maintains a file locator database).
- The procmail program is being passed one switch: « -f- », which tells it to « update the timestamp on the leading ‘From’ line in the header »
(this last bit is rather obscure and has to do with how messages are normally stored in your « incoming » or mail file or « spool » as we Unix hacks like to call it).
- The next part of this .forward command is the Bourne shell’s « || » operator, which is the counterpart of the « and » (&&) operator that we used before. It says « or »: if that command didn’t work (i.e. it returned any error) then « exit » (stop processing) and return error number 75. That value is EX_TEMPFAIL in sysexits.h; it tells sendmail (the program that called this command) that the failure is temporary, so the message should be queued and retried later rather than bounced.
- The last part of this .forward expression is a comment which according to the man pages:
« is not actually a parameter that is required by procmail, in fact, it will be discarded by sh before procmail ever sees it; it is however a necessary kludge against overoptimising sendmail programs »
- You should just change the phrase YOUR_USERNAME to your login name on that system.
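If the && and || chaining is unfamiliar, the following sketch shows how the exit status comes out in the two cases. The commands « true » and « false » are stand-ins for a procmail that succeeds or fails, and the helper name « run » is made up. The « exec » from the real line is left out on purpose, since this only illustrates how the two operators pass a status along.

```sh
# run COMMAND: mimic the shape of the .forward line with a stand-in command.
run() {
    # The subshell plays the role of the /bin/sh that sendmail starts.
    ( IFS=' ' && "$@" || exit 75 )
    echo "$1 -> exit status $?"
}

run true     # prints: true -> exit status 0
run false    # prints: false -> exit status 75
```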
This complicated line can be just pasted into most .forward files, minimally edited and forgotten.
If you did this and nothing else your mail would basically be unaffected. procmail would just look for its default recipe file (.procmailrc) and, finding none, it would perform its default action on each message. In other words it would append new messages to your normal spool file.
If your ISP uses procmail as its local delivery agent then you can skip the whole part about using the .forward file, or you can use it anyway.
In either event the next step to automating your mail handling is to create a .procmailrc file in your home directory. You could actually call this file anything you wanted — but then you’d have to slip the name explicitly into the .forward file (right before the « || » operator). Almost everyone just uses the default.
Now we can get to a specific example. So far all we’ve talked about is how everything gets routed to procmail, which mostly involves sendmail and the Bourne shell’s syntax. Almost all sendmails are configured to use /bin/sh (the Bourne shell) to interpret alias and .forward « pipes. »
So, here’s a very simple .procmailrc file:
:0c:
$HOME/mail.backup
This just appends an extra copy of all incoming mail to a file named « mail.backup » in your home directory.
Note that a bunch of environment variables are preset for you. It’s been suggested that you should explicitly set SHELL=/bin/sh (or the closest derivative to Bourne Shell available on your system). I’ve never had to worry about that since the shells I use on most systems are already Bourne compatible.
However, csh and other shell users should take note that all of the procmail recipe examples that I’ve ever seen use Bourne syntax.
The :0 line marks the beginning of a « recipe » (procedure, clause, whatever). :0 can be followed by any of a number of « flags. » There is a dizzying number of ways to combine these flags. The one flag we’re using in this example is ‘c’ for « copy. »
You might ask why the recipe starts with a :0. Historically you used to use :x (where x was a number). This was a hint to procmail that the next x lines were conditions for this recipe. Later, the option was added to precede conditions with a leading asterisk, so they didn’t have to be manually counted. :0 then came to mean something like: « count them yourself. »
The second colon on this line marks the end of the flags and the beginning of the name for a lockfile. Since no name is given procmail will pick one automatically.
This bit is a little complicated. Mail might arrive in bursts. If a new message arrives while your script is still busy processing the last message, you’ll have multiple sendmail processes. Each will be dealing with one message. This isn’t a problem by itself. However, if two processes try to write into one file at the same time, their output is likely to get jumbled in unpredictable ways (the result will not be a properly formatted mail folder).
So we hint to procmail that it will need to check for and create a lockfile. In this particular case we don’t care what the name of the lock file would be (since we’re not going to have *other* programs writing into the backup file). So we leave the last field (after the colon) blank. procmail will then select its own lockfile name.
If we leave the : off of the recipe header line (omitting the last field entirely) then no lockfile is used.
This is appropriate whenever we intend to only read from the files in the recipe — or in cases where we intend to only write short, single line entries to a file in no particular order (like log file entries).
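The three header forms discussed above can be put side by side. This sketch writes them to a scratch file rather than to your real .procmailrc; the lockfile name « backup.lock » and the folder names are made up for illustration.

```sh
rc=$(mktemp)    # scratch file; a real rc file would be $HOME/.procmailrc
cat > "$rc" <<'EOF'
# Trailing colon, no name: procmail picks the lockfile name itself.
:0c:
mail.backup

# Trailing colon plus a name: use this lockfile (hypothetical name).
:0c:backup.lock
mail.backup

# No second colon: no lockfile is used at all.
:0c
mail.unlocked
EOF

grep -c '^:0' "$rc"    # counts the recipe header lines written above
```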
The way procmail works is:
It receives a single message from sendmail (or some sendmail compatible MTA/MDA). There may be several procmail processes running concurrently since new messages may be coming in faster than they are being processed.
It opens its recipe file (.procmailrc by default or specified on its command line) and parses each recipe from the first to the last until a message has been « delivered » (or « disposed of » as the case may be).
Any recipe can be a « disposition » or « delivery » of the message. As soon as a message is « delivered » then procmail closes its files, removes its locks and exits.
If procmail reaches the end of its rc file (and thus all of the INCLUDE’d files) without « disposing » of the message, then the message is appended to your spool file (which looks like a normal delivery to you and all of your « mail user agents » like Eudora, elm, etc).
This explains why procmail is so forgiving if you have *no* .procmailrc. It simply delivers your message to the spool because it has reached the end of all its recipes (there were none).
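The « first match wins, otherwise the spool » behaviour can be imitated in a few lines of shell. This is only a sketch of the control flow described above, not of procmail’s actual implementation; the function name « route », the rule format and the sample addresses are all assumptions of mine.

```sh
# route MESSAGE: print the folder chosen by the first matching rule,
# or "spool" when the end of the rule list is reached without a match.
route() {
    msg=$1
    while IFS='|' read -r pattern folder; do
        if printf '%s\n' "$msg" | grep -q -e "$pattern"; then
            echo "$folder"
            return 0          # delivered: stop looking at further rules
        fi
    done <<'EOF'
^From:.*boss@example\.com|work
^Subject:.*lunch|social
EOF
    echo spool                # fell off the end: default delivery
}

route 'From: boss@example.com'       # prints: work
route 'Subject: lunch on Friday?'    # prints: social
route 'Subject: something else'      # prints: spool
```

With an empty rule list the loop body never runs and everything lands in « spool », which is the no-.procmailrc case.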
The ‘c’ flag causes a recipe to work on a « copy » of the message — meaning that any actions taken by that recipe are not considered to be « dispositions » of the message.
Without the ‘c’ flag this recipe would catch all incoming messages, and all your mail would end up in mail.backup. None of it would get into your spool file and none of the other recipes would be parsed.
The next line in this sample recipe is simply a filename. Like sendmail’s aliases and .forward files — procmail recognizes three sorts of disposition to any message. You can append it to a file, forward it to some other mail address, or filter it through a program.
Actually there is one special form of « delivery » or « disposition » that procmail handles. If you provide it with a directory name (rather than a filename) it will add the message to that directory as a separate file. The name of that file will be based on several rather complicated factors that you don’t have to worry about unless you use the Rand MH system, or some other relatively obscure and « exotic » mail agent.
A procmail recipe generally consists of three parts — a start line (:0 with some flags) some conditions (lines starting with a ‘*’ — asterisk — character) and one « delivery » line which can be file/directory name or a line starting with a ‘!’ — bang — character or a ‘|’ — pipe character.
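As a rough illustration of how the first character of the « delivery » line decides what happens, here is a small shell sketch. The function name « kind » and the sample lines (including the script name) are hypothetical, and the real parsing inside procmail is more involved than this.

```sh
# kind ACTION_LINE: say how the delivery line of a recipe would be read.
kind() {
    case $1 in
        '!'*) echo "forward to an address" ;;
        '|'*) echo "pipe through a program" ;;
        *)    echo "append to a file or directory" ;;
    esac
}

kind '! someone@example.com'      # prints: forward to an address
kind '| $HOME/bin/my.script'      # prints: pipe through a program
kind '/dev/null'                  # prints: append to a file or directory
```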
Here’s another example:
:0
* ^From.*firstname.lastname@example.org
/dev/null
This is a simple one consisting of no flags, one condition and a simple file delivery. It simply throws away any mail from « someone I don’t like. » (/dev/null under Unix is a « bit bucket », a bottomless well for tossing unwanted output. DOS has a similar concept but it’s not nearly as handy).
Here’s a more complex one:
:0
* !^FROM_DAEMON
* !^FROM_MAILER
* !^X-Loop: email@example.com
This consists of a set of negative conditions (notice that the conditions all start with the ‘!’ character). This means: for any mail that didn’t come from a « daemon » (some automated process) and didn’t come from a « mailer » (some other automated process) and which doesn’t contain any header line of the form: « X-Loop: myadd… » send it through the script in my bin directory.
I can put the script directly in the rc file (which is what most procmail users do most of the time). This script might do anything to the mail. In this case, whatever it does had better be good, because procmail will consider any such mail to be delivered and any recipes after this will only be reached by mail from DAEMONs, MAILERs and any mail with that particular X-Loop: line in the header.
These two particular FROM_ conditions are actually « special. » They are preset by procmail and actually refer to a couple of rather complicated regular expressions that are tailored to match the sorts of things that are found in the headers of most mail from daemons and mailers.
The X-Loop: line is a normal procmail condition. In the RFC822 document (which defines what e-mail headers should look like on the Internet) any header line starting with X- is a « custom » header. This means that any mail program that wants to can add pretty much any X- line it wants.
A common procmail idiom is to add an X-Loop: line to the header of any message that we send out — and to check for our own X-Loop: line before sending out anything. This is to protect against « mail loops » — situations where our mail gets forwarded or « bounced » back to us and we endlessly respond to it.
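The check itself is simple enough to sketch in plain shell. The address and the function name below are assumptions for illustration, and in a real recipe the test is of course written as a « * !^X-Loop: » condition rather than as a script.

```sh
marker='X-Loop: me@example.com'    # hypothetical address

# has_loop_header: succeed if the message on stdin already carries our marker.
has_loop_header() {
    # sed passes on the header only (everything up to the first blank line);
    # grep -x -F then looks for the marker as a whole, literal line.
    sed '/^$/q' | grep -q -x -F -e "$marker"
}

fresh='From: friend@example.com
Subject: hi

a new message'

bounced='From: friend@example.com
X-Loop: me@example.com

our own reply, coming back'

for m in "$fresh" "$bounced"; do
    if printf '%s\n' "$m" | has_loop_header; then
        echo "skip: looks like our own mail"
    else
        echo "reply"
    fi
done
```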
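So, here’s a detailed example of how to use procmail to automatically respond to mail from a particular person. We start with the recipe header.
:0
… then we add our one condition (that the mail appears to be from the person in question):
* ^FROM.*firstname.lastname@example.org
Unlike ^TO, ^FROM_DAEMON and ^FROM_MAILER, a bare ^FROM is not one of procmail’s « magic » keywords. procmail matches case-insensitively by default, so this pattern matches any header line that begins with « From », which covers both the leading ‘From’ line and the « From: » header. You could also use ^From: (which would only match the header line(s) that start with the string « From: »).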
The ^ (hiccup or, more technically, « caret ») is a « regular expression anchor » (a techie phrase that means « it specifies *where* the pattern must be found in order to match »). There is a whole book on regular expressions (O’Reilly & Associates). « Regexes » permeate many Unix utilities, scripting languages, and other programs. There are slight differences in « regex » syntax for each application. However the man page for ‘grep’ or ‘egrep’ is an excellent place to learn more.
In this case the hiccup means that the pattern must occur at the beginning of a line (which is its usual meaning in grep, ed/sed, awk, and other contexts).
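You can see the effect of the anchor with grep, which treats ^ the same way. The two sample header lines are made up for the purpose.

```sh
headers='From: alice@example.com
Subject: Re: mail From: bob'

# Without the anchor the pattern may match anywhere in a line.
printf '%s\n' "$headers" | grep -c 'From:'     # prints: 2

# With the anchor it must match at the start of a line.
printf '%s\n' "$headers" | grep -c '^From:'    # prints: 1
```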
… and we add a couple of conditions to avoid looping and to avoid responding to automated systems:
* !^FROM_DAEMON
* !^FROM_MAILER
(These are a couple more « magic » values. The man pages show the exact regexes that are assigned to these keywords — if you’re curious or need to tweak a special condition that is similar to one or the other of these).
… and one more to prevent some tricky loop:
* !^X-Loop: email@example.com
(All of these patterns start with « bangs » (exclamation points) because the condition is that *no* line of the header start with any of these patterns. The ‘bang’ in this case (and most other regex contexts) « negates » or « reverses » the meaning of the pattern).
… now we add a « disposition » — the autoresponse.
| (formail -rk \
    -A "X-Loop: email@example.com" \
    -A "Precedence: junk"; \
   echo "Please don't send me any more mail";\
   echo "This is an automated response";\
   echo "I'll never see your message";\
   echo "So, GO AWAY" ) | $SENDMAIL -t -oi
This is pretty complicated — but here’s how it works:
- The pipe character tells procmail that it should launch a program and feed the message to it.
- The open parenthesis is a Bourne shell construct that groups a set of commands in such a way as to combine the output from all of them into one « stream. » We’ll explain this more later.
- The ‘formail’ command is a handy program that is included with the procmail package. It « formats » mail headers according to its command line switches and its input.
- -rk tells ‘formail’ to format a « reply » and to « keep » the message body. With these switches formail expects a header and body as input.
- The -A parameter tells formail to « add » the next parameter as a header line. The value given to each -A switch must be enclosed in quotes so the shell treats the whole string (spaces and all) as a single parameter.
- The backslashes at the end of each line tell procmail to treat the next line as part of this one. So, all of the lines ending in backslashes are passed to the shell as one long line.
- This « trailing backslash » or « line continuation » character is a common Unix idiom found in a number of programming languages and configuration file formats.
- The semicolons tell the shell to execute another command — they allow several commands to be issued on the same command line.
- Each of the echo commands should be reasonably self-explanatory. We could have used a ‘cat’ command and put our text into a file if we wanted. We can also call other programs here (like ‘fortune’ or ‘date’) and their output would be combined with the rest of this.
- Now we get to the closing parenthesis. This marks the end of the block of commands that we combined. The output from all of those is fed into the next pipe, which starts the local copy of sendmail (note that $SENDMAIL is another variable that procmail thoughtfully presets for us).
- The -t switch on sendmail tells it to take the « To: » address from the header of its input (where ‘formail -r’ put it), and the -oi switch tells sendmail not to treat a line that consists only of a ‘dot’ as the end of the message (don’t worry about the details on that).
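The parenthesis trick is easy to try on its own. In this sketch a printf stands in for formail and ‘cat’ stands in for « $SENDMAIL -t -oi », so nothing is actually mailed; the address is made up.

```sh
# Everything inside the parentheses goes down the one pipe as a single stream.
reply=$(
    ( printf 'To: %s\n' 'someone@example.com'; \
      echo; \
      echo "This is an automated response" ) | cat
)
printf '%s\n' "$reply"
```

The result is a header line, a blank line and a one-line body, in that order, just as the real mailer would receive them.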
Most of the difficulty in understanding procmail has nothing to do with procmail itself. The intricacies of regular expressions (those weird things on the ‘*’ conditional lines), shell quoting and command syntax, and how to format a reply header that will be acceptable to sendmail (the ‘formail’ and ‘sendmail’ stuff) are the parts that require so much explanation.
The best info on mailbots that I’ve found used to be maintained by Nancy McGough at the Infinite Ink web pages.
More information about procmail can be found in Era Eriksson’s « Mini-FAQ. » at http://www.iki.fi/~era/procmail/mini-faq.html
I also have a few procmail and SmartList links off of my own web pages.