Taming rand() and random() LG #63

In the lower levels of the Ontario Science Center in Toronto, Canada, there is a wide circular device made of thin rods of steel. Curious bystanders can take billiard balls, put there for that purpose, and let them loose on the machine.  The balls whiz along their rails, richocheting off pins, clanging through wind chimes, grabbed by counterweighted arms and lifted towards the ceiling.  At several places the balls chose one rail or another purely at random.  How is it that a construct not powered in any way, laid out in a rigid pattern, still produces unexpected results?

Writing programs that use random numbers requires an understanding of error estimation, probability theory, statistics and other advanced numeric disciplines.

Bunk.

Random numbers are about getting your programs to do the unexpected without a core dump being involved.  They’re about having fun.

Computers do not use « real world » random numbers.  Like the billiard-ball machine, computers are rigid, constrained by rules and logical behaviour.  For a computer to generate truly random numbers, it would have to choose numbers by examining real world events.  In the early days, people might roll some 10-sided dice and compose a list of digits for a program to use.

Unfortunately real-world random numbers can be unexpectedly biased.  As the old saying goes, « the real world is a special case. »  Instead, computers rely on mathematics to generate uniformly distributed (that is, random but not too random) numbers.  They are « pseudo-random », generated by mathematic functions which create a seemingly non-repeating sequence.  Over time, the numbers in the sequence will reliably occur equally often, with no one number being favoured over another.

The Linux standard C library (stdlib.h) has two built-in random number functions.  The first, rand(), returns a random integer between 0 and RAND_MAX.  If we type

  printf(  » rand() is %d\n », rand() );
  printf(  » rand() is %d\n », rand() );
  printf(  » rand() is %d\n », rand() );

rand() will return values like

  rand() is 1750354891
  rand() is 2140807809
  rand() is 1844326400

Each invocation will return a new, randomly chosen positive integer number.

READ  Process Cloning in C LG #51

The other standard library function, random(), returns a positive long integer.  On Linux, both integer and long integer numbers are the same size.  random() has some other properties that are discussed below.

There are also older, obsolete functions to produce random numbers

  * drand48/erand48 return a random double between 0..1.   * lrand48/nrand48 return a random long between 0 and 2^31.

  * mrand48/jrand48 return a signed random long.

These are provided for backward compatibility with other flavours of UNIX.

rand() and random() are, of course, totally useless as they appear and are rarely called directly.  It’s not often we’re looking for a number number between 0 and a really big number: the numbers need to apply to actually problems with specific ranges of alternatives.  To tame rand(), its value must be scaled to a more useful range such as between 1 and some specific maximum.  The modulus (%) operator works well: when a number is divided, the remainder is between 0 and 1 less than the original number.  Adding 1 to the modulus result gives the range we’re looking for.

  int rnd( int max ) {
    return (rand() % max) + 1;
  }

This one line function will return numbers between 1 and a specified maximum.  rnd(10) will return numbers between 1 and 10, rnd(50) will return numbers between 1 and 50.  Real life events can be simulated by assigning numbers for different outcomes.  Flipping a coin is rnd(2)==1 for heads, rnd(2)==2 for tails.  Rolling a pair of dice is rnd(6)+rnd(6).

The rand() discussion in the Linux manual recommends that you take the « upper bits » (that is, use division instead of modulus) because they tend to be more random.  However, the rnd() function above is suitably random for most applications.

The following test program generates 100 numbers between 1 and 10, counting how often each number comes up in the sequence.  If the numbers were perfectly uniform, they would appear 10 times each.

  int graph[11];
  int i;

  for (i=1; i