perlpp: cpp on Steroids LG #44

The point of this article is to introduce a tool I call perlpp, the Perl preprocessor. Since I wrote it, perlpp is not available in any Linux distribution. See Resources for information on obtaining perlpp and the examples described here.

perlpp is a beefy version of cpp, the C preprocessor; it can do what cpp can do and much more. For example, introducing the idea of code templates in any programming language is easy with perlpp.

Using perlpp, the Perl preprocessor, requires at least a rudimentary knowledge of programming in Perl. Perl 5 or later must be installed on your system.

Since Perl is such a useful language, almost every programmer should know a little about it. I will start by covering some of the rudiments of Perl used in the examples. If you are already fairly comfortable with Perl, move on to the next section.

Variables. Scalar variables, which can take on values of strings, integers, or doubles, always have a $ as the first character. List variables, which are simple lists of scalars, always have a @ as the first character. All variables are global, unless preceded by my when first used within a block.

String quoting. Strings can be quoted three ways in Perl. They can be quoted almost exactly using single forward quotes ('), quoted with interpolation using double quotes ("), or quoted as system commands using single back quotes (`). We will present more detail on this later, but basically:

  • Single-quoted strings are subject to minimal translation. For example, '\n' is a backslash followed by the letter n.
  • Double-quoted strings have a great deal of translation. For example, "i=$i\n" is the characters i=, followed by the value of the variable $i, followed by a new-line character. In Perl parlance, double-quoted strings are said to be interpolated.
  • Back-quoted strings are interpolated like double-quoted strings, but the value of a back-quoted string is the output (whatever is sent to STDOUT) of executing the translated string as a shell command. For example, `ls $dir` is the output of running the ls command with the value of $dir as an argument.
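
The three quoting styles can be compared side by side. The following is only a small sketch; the variable names are made up for illustration, and the back-quote line assumes a Unix-like shell with an echo command.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $i = 42;

# Single quotes: almost no translation; $i and \n are kept literally.
my $exact = 'i=$i\n';

# Double quotes: $i is replaced by its value and \n becomes a new-line.
my $interp = "i=$i\n";

# Back quotes: interpolate first, then run the result as a shell command
# and keep whatever the command wrote to STDOUT.
my $word  = 'hello';
my $shell = `echo $word`;

print length($exact), "\n";   # 6 (the characters i = $ i \ n)
print $interp;                # i=42
print $shell;                 # hello
```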

Loops. Perl supports the csh-style loop of the form

foreach $index (@LIST) { 
   statement1;
   statement2; 
   .... 
}

as well as the C-style loop:

for (do-once; check-first-each-time; do-last-each-time) { 
   statement1;
   statement2; 
   .... 
}

Both types are used in the examples.
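
As a concrete illustration of the two loop forms, here is a short sketch; the list contents and variable names are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list = (1, 2, 3, 4);

# csh-style loop: $item takes each value of @list in turn.
my $sum = 0;
foreach my $item (@list) {
    $sum += $item;
}

# C-style loop: initialize once, test before each pass, increment after it.
my $product = 1;
for (my $k = 0; $k < @list; $k++) {
    $product *= $list[$k];
}

print "sum=$sum product=$product\n";   # sum=10 product=24
```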

In fact, the basic syntax of Perl mimics C in many respects, so C programmers can read Perl scripts fairly easily. No, that is too bold: a C programmer can write C-looking Perl, and it will mostly work as expected. A Perl programmer would solve the same problem in a completely different manner. In doing so, he may accomplish something difficult to imagine: a program more obscure than what can readily be written in C. If you don't believe me, look at the perlpp source, which is a Perl script.

Perl is a great deal more than this tiny view, but these ideas should be enough to understand the examples. See Resources for more information about Perl.

Introduction

Let’s begin by talking about cpp. C programmers don’t get far before learning that C programs, at least logically, pass through two stages of translation. The first stage, the preprocessing stage, uses commands such as

#include <stdio.h>

and

#define FOO(x) bar(x)

to translate the hybrid C/cpp input file into a pure C input file, which is then input to the pure C compiler. Pictorially,

input_file -> cpp -> cc1 -> object_file

While the intended job of cpp is to preprocess input files for a C (or C++) compiler, it can be used to preprocess other files. For example, xrdb uses cpp to preprocess X11 resource files before loading them. cpp is a very useful tool, but a programmer can quickly run into limitations, essentially because cpp is a macro-processor with limited facilities for computation and the manipulation of text.

The reason I wrote perlpp was to overcome these limitations for a scientific computation problem at Pacific Northwest National Laboratories, where I wrote the chemical equilibrium portion of a ground water transport model. For the sake of compatibility with the rest of the model, it had to be programmed in FORTRAN. For the sake of compatibility with Linux, Sun and SGI development environments, it had to be FORTRAN 77. The problem statement was roughly this: given the chemical equilibrium equations for a given set of species, automatically generate an efficient reliable solver for these equations.

This created a need to go from chemical equilibrium equations in symbolic form to the generation of a Maple V (a symbolic mathematics package) batch file from a template, followed by the inclusion of the results from that batch file into a template-generated FORTRAN subroutine library that satisfied the requirements of the project.

This environment required the automatic generation of several kinds of programs from templates and was a natural breeding ground for thoughts about useful preprocessors. Although it took me most of a week to come up with the alpha version of perlpp, it easily saved that amount of time just for that one project. Solving the same problem without it might have taken four or five weeks longer. Furthermore, without perlpp, the project would be much harder to maintain.

What Perlpp Does

perlpp takes input files and generates Perl scripts which, when run, create similar but better output files.

Example 1: Hello World!

Create a file called hello.c.ppp containing the lines

#include <stdio.h>
int main()
{
  printf("Hello World!\n");
  return 0;
}

Now run the perlpp command by typing:

perlpp -pl hello.c.ppp

The -pl option is discussed later. If you check, perlpp created the file hello.c.pl, which contains the following Perl script:

#!/usr/bin/perl
print '#include <stdio.h>
';
print 'int main()
';
print '{
';
print '  printf("Hello World!\\n");
';
print '  return 0;
';
print '}
';

Your mileage may vary on the exact contents of the first line. See "Troubleshooting" if you have problems generating this script.

Running hello.c.pl generates the same text as the original input file, hello.c.ppp. In this way, perlpp can be viewed as an obscure and computationally expensive way to copy text files.
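
The round trip can be mimicked in a few lines of Perl. This is only a sketch and is not taken from the perlpp source: the helper names print_exact, generate_script and run_script are hypothetical, escaping of single quotes is an assumption, and the generated script is run in-process with eval rather than as a separate program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the default rule: wrap one input line in a
# single-quoted print statement, escaping backslashes and single quotes.
sub print_exact {
    my ($line) = @_;
    $line =~ s/([\\'])/\\$1/g;
    return "print '$line';\n";
}

# Build a complete script from a list of input lines.
sub generate_script {
    my (@lines) = @_;
    return join '', "#!/usr/bin/perl\n", map { print_exact($_) } @lines;
}

# Run a generated script in-process and capture everything it prints.
sub run_script {
    my ($script) = @_;
    my $output = '';
    open my $fh, '>', \$output or die "cannot open in-memory file: $!";
    my $old = select $fh;      # bare print statements now write to $output
    eval $script;
    my $err = $@;
    select $old;
    close $fh;
    die $err if $err;
    return $output;
}

# A shortened version of hello.c.ppp.
my $input = <<'END';
int main()
{
  printf("Hello World!\n");
  return 0;
}
END

my $script = generate_script(split /^/, $input);
print $script;

my $copy    = run_script($script);
my $verdict = $copy eq $input ? 'ok' : 'differs';
print "round trip $verdict\n";
```

The script printed by this sketch has the same shape as hello.c.pl above: one print statement per input line, with the backslash in \n doubled.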

The -pl option means "create a Perl program". If you leave it off, perlpp simply runs the generated program and saves the output in hello.c. This means

perlpp hello.c.ppp

is equivalent to

perlpp -pl hello.c.ppp
./hello.c.pl > hello.c
rm hello.c.pl

except that the file hello.c.pl is never explicitly created.

So our first example, hello.c.ppp, when normally processed by perlpp, creates a copy of itself, hello.c. While this should not excite you, it should not surprise you either. After all, if you processed a text file using cpp, containing no cpp directives, you would get back exactly what you put in.

cpp is interesting only when the input file contains cpp directives. Perlpp is only slightly interesting when the input file contains no perlpp directives, because it generates a Perl script that regenerates the input file using print statements. To get any further, the perlpp directives must be used.

Directives

Only four directives are available for perlpp, along with a default directive. Each describes how a given line of input will be translated into the perl script.

  1. ! Perl source rule: if the first character of a line is a ! (bang), copy the remaining part of the line to the generated Perl script verbatim.
  2. ' print exact: if the first character of a line is a ' (single quote), generate a single-quoted (uninterpolated) print statement. Executing this print statement will produce the remaining part of the input line exactly.
  3. " print interpolated: if the first character of a line is a " (double quote), generate a double-quoted (interpolating) print statement. For more on interpolating strings, see the perlop man page. If use locale is in effect, the case map used by \l, \L, \u and \U is taken from the current locale. See the perllocale man page. [It should be noted that \\ (two backslashes) in an interpolated string translates into a single backslash, so \\n interpolates to \n in the output. This will show up in our next example.]
  4. ` print system: if the first character of a line is a ` (back quote), generate a back-quoted (system) print statement. Executing this print statement will produce the output of, first, interpolating the remainder of the line as in rule 3 above, then running the interpolated text as a shell command.

If none of the characters bang (!), single quote ('), double quote (") or back quote (`) begin a line, a default translation occurs:

  • With no -qq option, perlpp treats these lines as if they began with a single quote, i.e., use the "print exact" rule 2.
  • With the -qq option, perlpp treats these lines as if they began with a double quote, i.e., use the "print interpolated" rule 3.
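
The rules above amount to a dispatch on the first character of each line. The sketch below shows one way that dispatch might look; it is not the actual perlpp source. The function name translate_line and its $qq argument are hypothetical, and the exact escaping (in particular the form of the back-quoted print statement) is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: translate one input line into the text that would
# be written to the generated script. $qq is true when -qq is in effect.
sub translate_line {
    my ($line, $qq) = @_;
    chomp $line;

    # Default translation: behave as if the line began with ' (or " under -qq).
    my ($mark, $rest) = ($qq ? '"' : "'", $line);

    # An explicit directive character overrides the default.
    if ($line =~ /^([!'"`])(.*)$/s) {
        ($mark, $rest) = ($1, $2);
    }

    if ($mark eq '!') {                 # rule 1: Perl source, copied verbatim
        return "$rest\n";
    }
    if ($mark eq "'") {                 # rule 2: print exact
        $rest =~ s/([\\'])/\\$1/g;      # protect backslashes and single quotes
        return "print '$rest\n';\n";
    }
    if ($mark eq '"') {                 # rule 3: print interpolated
        $rest =~ s/"/\\"/g;             # protect embedded double quotes
        return "print \"$rest\n\";\n";
    }
    $rest =~ s/`/\\`/g;                 # rule 4: print system (assumed form)
    return "print `$rest`;\n";
}

# Translate a few lines that mix directives with default lines.
my @input = split /^/, <<'END';
int main()
!foreach $s ('Hello World!','Hola Mundo!', 'Ciao!') {
"  printf("$s\\n");
!}
END

print translate_line($_, 0) for @input;
```

Passing a true second argument switches unmarked lines such as int main() from the single-quoted to the double-quoted form, which is what the -qq option is described as doing.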

Example 2: Salutations

Create a file called salutations.c.ppp containing the lines:

#include <stdio.h>
int main()
{
!foreach $s ('Hello World!','Hola Mundo!', 'Ciao!') {
"  printf("$s\\n");
!}
  return 0;
}

Let’s first look at the generated Perl script by typing:

perlpp -pl salutations.c.ppp

In salutations.c.pl, you will find

print '#include <stdio.h>
';
print 'int main()
';
print '{
';
foreach $s ('Hello World!','Hola Mundo!', 'Ciao!') {
print "  printf(\"$s\\n\");
";
}
print '  return 0;
';
print '}
';

Look carefully at the print statement generated by the printf statement in salutations.c.ppp:

print "  printf(\"$s\\n\");
";

Perlpp goes to the trouble of adding backslashes where appropriate so that double quotes do not prematurely terminate the string. The same idea applies to the other forms of quoted print statements perlpp generates.

Let perlpp run this script for us with

perlpp salutations.c.ppp

This generates the file salutations.c,

#include <stdio.h>
int main()
{
  printf("Hello World!\n");
  printf("Hola Mundo!\n");
  printf("Ciao!\n");
  return 0;
}

Example 3: Fast Point Template

This last example uses perlpp to generate a template for fixed-length vector classes in C++, where loops are unwound. Unwinding a loop means, for example, replacing the code

for (int i=0; i