Big Brother Unix Network Monitoring System

A Web-based Unix Network Monitoring
and Notification System

By Paul M. Sittler, p-sittler@tamu.edu

Big Brother is Watching. . .

I wasn’t bored: I don’t have time to be bored. Texas Agricultural Extension Service operates a fairly large enterprise-wide network that stretches across hell’s half acre, otherwise known as Texas. We have around 3,000 users in 249 counties and 12 district offices who expect to get their e-mail and files across our Wide Area Network. Some users actually expect the network to work most of the time. We use Ethernet networking with Novell servers at some 35 locations, about 15 of which have routers connected via a mixture of 56Kb circuits, fractional T1, Frame-Relay, and radio links. We are not currently using barbed wire fences for our network, regardless of what you may have heard. . .

I am privileged to be part of the team that set up that network and tries to keep it going. We do not live in a perfect network world. Things happen. Scarcely a day goes by when we do not have one or more WAN link outages, usually of short duration. We sometimes have our hands full trying to keep all the pieces connected. Did I mention that the users expect the mail and other software to actually work?

Cruising the USENET newsgroups, I read a posting about « Big Brother, a solution to the problem of Unix Systems Monitoring » written by Sean MacGuire of Montreal, Canada. I was intrigued to notice that Big Brother was a collection of shell scripts and simple C programs designed to monitor a bunch of Unix machines on a network. So what if most of our mission critical servers were Novell-based? Who cares if some of our web servers run on Macintosh, OS/2, Win’95 or NT? We use both Linux and various flavours of Unix in a surprisingly large number of places.

We had cooked up a number of homemade monitoring systems. Pinging and tracerouting to all the servers can be very informative. We looked at a bunch of proprietary (and expensive) network monitoring systems. It is amazing how much money these things can cost. System administrators often reported difficult installations and software incompatibilities with the monitoring software. Thus, frustrated users often gave us our first hint that all was not well.

According to the blurb on Big Brother:

« Big Brother is a loosely-coupled distributed set of tools for monitoring and displaying the current status of an entire Unix network and notifying the admin should need be. It came about as the result of automating the day to day tasks encountered while actively administering Unix systems. »

The USENET news article provided a URL (« http://www.iti.qc.ca/iti/users/sean/bb-dnld/ ») to the home site of Big Brother. I pointed my browser to it and was rewarded with a purple-sided screen background and a blue image of a sinister face peering out under the caption « big brother is watching. » After my initial shock, I learned that Big Brother featured:

I was fascinated, especially by the last item, which said it was free with source code. (I often tell people that Linux isn’t free, but priceless. . .) So what could a priceless package do for me? What on earth did Big Brother check?

Overall, very sensible. Looking for some « gotchas, » I found that I would need a Unix-based machine, and:

A web server was no problem, as we run many. A C compiler came with Linux, and we use kermit on many machines with modems. So far, so good.

The web site provided links to a few demonstration sites, and a link to download it as well. I connected to a demonstration site and was greeted with an amazing display:

[Figure: Big Brother demonstration display. Legend: System OK, Attention, Trouble, No report. Updated @ 22:52. Host rows: iti-s01, router-000, inet-gw-0.]

Big Brother is watching! As I endured the scrutiny of the Orwellian face peering out at me, I examined the rest of the display. The display was coded like a traffic signal (green/yellow/red), and the update time was clearly displayed beneath it. To the right of « Big Brother » were four buttons, marked clearly « Help, » « Info, » « Page » and « View. » Beneath the header area was a table with six column headings and three rows, each neatly labelled with a computer hostname. The boxes formed by the intersection of the rows and columns contained attractive green and yellow balls. The overall effect was like a decorated tree. The left side of the screen had a yellow tint, gradually becoming black at the center.

I selected the « Help » button and was rewarded with a brief explanation of what Big Brother was all about. Choosing the « Info » button provided a much longer and more detailed explanation of the system, including a graphic that really was worth a thousand words. I tried the « Page » button and discovered that this was a way to send a signal to a radio-linked pager. Not at all what I had expected! Finally, the « View » selection provided a briefer but perhaps more useful view of the information, isolating only the systems with problems.

In this case, only the « iti-s01 » system was displayed. My browser cursor indicated a link as it passed over each colored dot, so I clicked on the blinking yellow dot and received a message that read:

« yellow Tue Feb 18 22:50:53 EST 1997 Feb 16 12:22:33 iti-s01 kernel: WARNING: / was not properly dismounted »

This puzzled me at first. How on earth could it know that? It seems that BB (Big Brother) checks the system /var/log/messages file periodically and alerts on any line that says either « WARNING » or « NOTICE. » As I am certain that Sean MacGuire is very conscientious, I suspect that he adds that line to his message file so that something will appear to be wrong.
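
A check of this kind might be sketched in shell roughly as follows. This is a hypothetical illustration, not the actual Big Brother script: the function names msgs_status and msgs_report are invented here, and the only details taken from the description above are the message file and the two keywords.

```shell
#!/bin/sh
# Hypothetical sketch of a message-file check; NOT the real Big Brother code.

# msgs_status FILE
# Prints "yellow" if FILE contains a WARNING or NOTICE line, else "green".
msgs_status() {
    if grep -E -q 'WARNING|NOTICE' "$1" 2>/dev/null; then
        echo yellow
    else
        echo green
    fi
}

# msgs_report FILE
# Prints the colour and a timestamp, followed by any offending lines.
msgs_report() {
    echo "$(msgs_status "$1") $(date)"
    grep -E 'WARNING|NOTICE' "$1" 2>/dev/null || :
}
```

Running `msgs_report /var/log/messages` from cron every few minutes would produce a status line similar in spirit to the one quoted above.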

Suddenly, my screen spontaneously updated! The update time had changed by five minutes, and a blinking yellow dot appeared under the column labelled « procs. » I clicked on the blinking yellow dot and was informed that the sendmail process was not running. This got me really interested! Apparently, Big Brother could monitor whether selected processes were running!
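
Process monitoring of this sort can be approximated by searching the output of ps for each process name of interest. The sketch below is an assumption about the general technique, not Big Brother's own code; the function name procs_missing is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a process check; NOT the real Big Brother code.

# procs_missing "name1 name2 ..."
# Reads a process listing on standard input and prints, on one line,
# the names from the argument that do not appear anywhere in it.
procs_missing() {
    wanted=$1
    pslist=$(cat)          # slurp the ps output once
    missing=""
    for p in $wanted; do
        # -F: treat the name as a fixed string, -q: no output, status only
        if ! printf '%s\n' "$pslist" | grep -F -q -- "$p"; then
            missing="${missing:+$missing }$p"
        fi
    done
    printf '%s\n' "$missing"
}
```

For example, `ps -ax | procs_missing "sendmail inetd"` prints the names of any listed processes that are not running; an empty result means all were found. A plain substring match like this is crude and can be fooled by a process whose command line merely mentions the name.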

I was also a little puzzled about the screen being updated on its own. I used my browser to view the document source and discovered some html commands that were new to me:

    
    

The first line instructs browsers to get an update every 120 seconds. The second line tells the browser that it should get a new copy after the expiration time and date. Very clever!
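
Since the Big Brother pages are generated by shell scripts, header lines with this effect could be emitted along the following lines. This is only a sketch: it assumes the conventional META HTTP-EQUIV forms for REFRESH and EXPIRES, and the function name emit_refresh_head is hypothetical rather than anything found in the distribution.

```shell
#!/bin/sh
# Hypothetical sketch; NOT taken from the Big Brother scripts.

# emit_refresh_head SECONDS EXPIRES_DATE
# Prints two META tags: one asking the browser to reload the page after
# SECONDS, and one marking the page as stale after EXPIRES_DATE.
emit_refresh_head() {
    secs=$1
    expires=$2
    cat <<EOF
<META HTTP-EQUIV="REFRESH" CONTENT="$secs">
<META HTTP-EQUIV="EXPIRES" CONTENT="$expires">
EOF
}
```

A page generator could call `emit_refresh_head 120 "$(date)"` while writing the head of the document.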

I returned to the graphics window and discovered that the yellow area on the left had changed to red! A new hostname row appeared with a blinking red dot under the column labelled « conn. » I clicked on the blinking red dot and read a message that said:

« red Tue Feb 18 22:59:11 CST 1997 bb-network.sh: Can’t connect to router-000… (paging) »

The connection to the machine called router-000 had been interrupted and the administrator had been paged. Amazingly, while in Texas, I had become aware of a network outage in Montreal, Canada. This really had possibilities. Perhaps I might someday be able to take a vacation!


Big Brother Installation

I was so impressed with Big Brother that I decided to try to use it. Sean has thoughtfully made its acquisition easy, but requests that you fill out an on-line registration form with your name and e-mail address. He would also like to know where you heard about Big Brother. I filled these out in early November 1996, and received an e-mail survey form in late December.

When I clicked on the link to download Big Brother, I ended up with a file called « bb-src.tgz. » I impetuously gunzipped this to get « bb-src.tar. » I then thought better of the impending error of my ways and decided to download and print the installation instructions.

Just in case, I also grabbed and printed the debugging information so thoughtfully provided (as it turned out, I did not need it).

I had no real problems following the installation instructions. I decided to make the $BBHOME directory « /usr/src/bb »; use whatever makes sense to you. The automatic configuration routines are said to work for AIX, FreeBSD, HPUX 10, Irix, Linux, NetBSD, OSF, RedHat Linux, SCO, SCO 3/5, Solaris, SunOS4.1, and UnixWare. I can vouch for Linux, RedHat Linux, Solaris, and SunOS 4.1.

The C programs compiled without incident, and the installation went smoothly. As always, your mileage may vary. In less than an hour, I was looking at Big Brother’s display of coloured lights!

At this point, you may wish to re-examine the documentation and information files. Personalize your installation as desired. Above all, have fun!

Hacking

I admit it. I am a closet hacker. I saw many things about the stock BB distribution that I wanted to improve. Big Brother’s modular and elegantly simple construction makes it a joy to modify as desired. The shell scripts are portable, simple, well documented, and easy to understand. The use of the modified hosts file to determine which hosts to monitor was gratifyingly familiar. The « bbclient » script made it extremely easy to move the required components to another similar Unix host. Sean has done a remarkable job in making this package easy to install!

I got obsessive-compulsive about hacking BB and modified it slightly, working from Sean MacGuire’s v1.03 distribution as a base. I forwarded my changes to him for possible inclusion in a later distribution.

Features that I added to BB proper include (code added is bold):

  • Links to the info files in the brief view (bb2.html). That’s when I need them the most.
  • Links to html info files for each column heading and the column info files themselves. These are placed in the html directory along with bb.html and bb2.html and have boring names like conn.html, cpu.html, . . . smtp.html.
  • Checks to see if ftp servers, pop3 post offices, and SMTP Mail Transfer Agents (MTAs) are accessible ($BBHOME/bin/bb-network.sh). These all simply use bbnet to telnet to the respective ports. This followed Sean’s style of adding comments to the bb-hosts file as follows:
    128.194.44.99 behemoth.tamu.edu # BBPAGER smtp ftp pop3
    165.91.132.4 bryan-ctr.tamu.edu # pop3 smtp
    128.194.147.128 csdl.tamu.edu # http://csdl.tamu.edu/ ftp smtp 
    
  • I added some environment variables to $BBHOME/etc/bbdef.sh for the added monitoring as follows:
    #
    # WARNING AND PANIC LEVELS FOR DIFFERENT THINGS
    # SEASON TO TASTE
    #
    DFPAGE=Y # PAGE ON DISK FULL (Y/N)
    CPUPAGE=Y # PAGE FOR CPU Y/N
    TELNETPAGE=Y # PAGE ON TELNET FAILURE?
    HTTPPAGE=Y # PAGE ON HTTP FAILURE?
    FTPPAGE=Y # PAGE ON FTPD FAILURE?
    POP3PAGE=Y # PAGE ON POP3 PO FAILURE?
    SMTPPAGE=Y # PAGE ON SMTP MTA FAILURE?
    export DFPAGE CPUPAGE TELNETPAGE HTTPPAGE FTPPAGE POP3PAGE SMTPPAGE
    
  • I updated the bb-info.html and bb-help.html pages to reflect a version of 1.03a and a date of 10 February 1997. I also modified them to add brief mention of the new ftp, pop3, and smtp monitoring things. Specifically, I changed the bb-help.html file to add new pager codes for them as follows:
    100 - Disk Error. Disk is over 95% full...
    200 - CPU Error. CPU load average is unacceptably high.
    300 - Process Error. An important process has died.
    400 - Message file contains a serious error.
    500 - Network error, can't connect to that IP address.
    600 - Web server HTTP error - server is down.
    610 - Ftp server error - server is down.
    620 - POP3 server error - PopMail Post Office is down.
    630 - SMTP MTA error - SMTP Mail Host is down.
    911 - User Page. Message is phone number to call back.
    
  • I added sections to the bb-info.html file to explain the added ftp, pop3, and smtp monitoring.
  • I use a standard tagline file on each html page that identifies the author and location of the page. Thus, mkbb.sh and mkbb2.sh now look for an optional tagline file to incorporate into the html documents that they generate. The optional files are named mkbb.tag (for mkbb.sh) and mkbb2.tag (for mkbb2.sh). The shell scripts look for the optional tagline files in the $BBHOME/web directory (which is where the mkbb.sh and mkbb2.sh files reside).
  • I went through ALL of the html-generating scripts and html files to ensure that they actually had <HEAD> and <BODY> sections and properly placed double quotes around the various arguments.
  • For the most part, I edited the files so that everything would fit on an 80-column screen.
  • I modified $BBHOME/etc/bbsys.sh to make it easier to ignore certain disk volumes as follows:
    #
    # DISK INFORMATION
    #
    DFSORT="4" # % COLUMN - 1
    DFUSE="^/dev" # PATTERN FOR LINES TO INCLUDE
    DFEXCLUDE="-E dos|cdrom"       # PATTERN FOR LINES TO EXCLUDE
    
  • I modified $BBHOME/etc/bbsys.linux so that the ping program is properly found as follows:
    #
    # bbsys.linux
    #
    # BIG BROTHER
    # OPERATING SYSTEM DEPENDENT THINGS THAT ARE NEEDED
    #
    PING="/bin/ping" # LINUX CONNECTIVITY TEST
    PS="/bin/ps -ax"              # LINUX
    DF="/bin/df -k"
    MSGFILE="/var/adm/messages"
    TOUCH="/bin/touch"            # SPECIAL TO LINUX
    
  • I added the ability to dynamically traceroute and ping each system being monitored. I spoke with Sean about it, and, in keeping with the KISS (Keep It Simple, Stupid) principle, we thought these features were best added in the info files. The user portion is pretty obvious in the source to the info file. The cgi scripts are very simple shell scripts included below:
# traceroute.cgi ===========================================
#!/bin/sh

TRACEROUTE=/usr/bin/traceroute

echo Content-type: text/html
echo

if [ -x $TRACEROUTE ]; then
        if [ $# = 0 ]; then
                cat