Swatch Monitoring on ISD SQL Servers

Swatch Overview
Log Files Monitored
Non-perpetual Swatch Operations
Required Swatch Maintenance
ISD Swatch Operations FAQ

Swatch Overview

Swatch (in case you've never heard of it) is a set of Perl programs written by Todd Atkins of Stanford. It examines any logfile you choose, either perpetually via a tail -f mechanism or with a one-time full pass, and checks each new line against a formatted action file of keyword patterns. The action file (swatchrc) uses grep-style regular expressions, the same syntax as the Unix grep command.
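To make the "perpetual tail -f" mode concrete, here is a minimal Python sketch of a follower that skips a log's existing contents and yields only lines appended afterwards. This is not Swatch's own Perl code; the names `follow` and `_new_lines` and the polling approach are assumptions for illustration.

```python
import os
import time
from typing import Iterator, TextIO


def follow(path: str, poll_seconds: float = 1.0) -> Iterator[str]:
    """Open `path`, skip its existing contents, and return a generator that
    yields each complete line appended afterwards (a tail -f style reader)."""
    fh = open(path, encoding="utf-8", errors="replace")
    fh.seek(0, os.SEEK_END)  # start at the current end: only new lines are checked
    return _new_lines(fh, poll_seconds)


def _new_lines(fh: TextIO, poll_seconds: float) -> Iterator[str]:
    pending = ""  # holds a partially written line until its newline arrives
    try:
        while True:
            chunk = fh.readline()
            if not chunk:
                time.sleep(poll_seconds)  # nothing new yet; poll again later
                continue
            pending += chunk
            if pending.endswith("\n"):
                yield pending.rstrip("\n")
                pending = ""
    finally:
        fh.close()  # close the file when a started generator is closed or collected
```

`for line in follow(logfile):` loops forever; in a real monitor each line would then be checked against the swatchrc patterns. The one-time mode is simply reading the whole file from the start instead of seeking to the end.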

I have considerably modified both the standard keyword search file (swatchrc) and the swatch Perl source code for the ISD environment. We also hacked the (the swatch program file that controls the actions it takes) to allow multiple mail recipients and to read the mail subject from the config file instead of using a generic subject. All source code is located in max:/export/syb_ops/swatch. This is the home directory for our swatch install; all changes are made here and then distributed to the other servers.
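The effect of that change (several recipients on one alert, subject taken from the matching config entry) can be sketched with Python's standard `email` package. This is a hypothetical illustration, not the modified Perl file; `build_alert` and the default sender name are made up, and the message is only built, not sent.

```python
from email.message import EmailMessage
from typing import Sequence


def build_alert(recipients: Sequence[str], subject: str, log_line: str,
                sender: str = "sybase") -> EmailMessage:
    """Build (but do not send) one alert mail: every recipient on a single
    message, with the subject taken from the matching mail=subject action."""
    msg = EmailMessage()
    msg["From"] = sender                # hypothetical sender account
    msg["To"] = ", ".join(recipients)   # multiple recipients, one message
    msg["Subject"] = subject            # from swatchrc, not a generic subject
    msg.set_content(log_line)           # body: the offending log line
    return msg
```

Sending would be a separate step, for example `smtplib.SMTP("localhost").send_message(msg)`, assuming a local mail relay is available.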

Swatch is currently configured to monitor Sybase servers on four Unix boxes (max, dali, degas, pop). The config files are in $SYBASE/swatch on each server. Errors are currently emailed to tboss, jmckeeby, and donp. To notify more people, edit the file in max:/export/syb_ops/swatch, add the email address, and then distribute the file to the remote servers running swatch.

Non-perpetual Swatch Operations

Nightly output jobs:

Nightly, we run update statistics and sp_recompile jobs on each server and store the output in /export/syb_ops/nightly. Because these files sometimes contain errors, we run a swatch job each morning (max:/export/syb_ops/nightly/) to scan them for errors.
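A one-time pass over a directory of job output looks roughly like the Python sketch below. It is an illustration, not the morning swatch job itself: the pattern assumes Sybase server errors appear in the output as lines like `Msg 2601, Level 14, State 3:`, and `scan_directory` is a hypothetical name.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed pattern for Sybase server error lines in isql output.
ERROR_RE = re.compile(r"^Msg \d+, Level \d+")


def scan_directory(directory: str) -> Dict[str, List[Tuple[int, str]]]:
    """One-time full examination: map each file name to its (line number,
    line) error hits; files with no hits are left out."""
    hits: Dict[str, List[Tuple[int, str]]] = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as fh:
            found = [(n, line.rstrip("\n"))
                     for n, line in enumerate(fh, start=1)
                     if ERROR_RE.search(line)]
        if found:
            hits[path.name] = found
    return hits
```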

Dbcc output files

Dbccs (checkalloc, checktable, checkcatalog) are run over the weekend on all applicable databases. Their output files, which contain details of any errors found, are run through a swatch-invoking script on Monday morning to look for errors (the swatch config knows to ignore spurious allocation errors).

The DBCC output files are stored in max:/export/syb_ops/dbcc/[date], where [date] is the date (yymmdd) on which the dbcc job started. Each file in this dated subdirectory is named [server].[database].dbcc.[date].
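The naming scheme can be captured in a small Python sketch that builds and parses these paths. It assumes the [date] in the file name uses the same yymmdd format as the directory and that server and database names contain no dots; `pubs2` in the usage below is only an example database name.

```python
import re
from datetime import date, datetime
from pathlib import PurePosixPath
from typing import NamedTuple

DBCC_ROOT = PurePosixPath("/export/syb_ops/dbcc")

# [server].[database].dbcc.[yymmdd]; assumes no dots inside the names.
_NAME_RE = re.compile(
    r"^(?P<server>[^.]+)\.(?P<database>[^.]+)\.dbcc\.(?P<stamp>\d{6})$")


class DbccFile(NamedTuple):
    server: str
    database: str
    run_date: date


def dbcc_path(server: str, database: str, run_date: date) -> PurePosixPath:
    """Where the dbcc output for one database should live."""
    stamp = run_date.strftime("%y%m%d")
    return DBCC_ROOT / stamp / f"{server}.{database}.dbcc.{stamp}"


def parse_dbcc_name(name: str) -> DbccFile:
    """Split a dbcc output file name back into its parts."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a dbcc output file name: {name!r}")
    run_date = datetime.strptime(m["stamp"], "%y%m%d").date()
    return DbccFile(m["server"], m["database"], run_date)
```

For example, `dbcc_path("max", "pubs2", date(1999, 3, 15))` gives `/export/syb_ops/dbcc/990315/max.pubs2.dbcc.990315`.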

max:/export/syb_ops/swatch/ is the one-time swatch script that examines each dbcc output file.

Required Swatch Maintenance

Swatch monitors are NOT started automatically when a server boots, so a script in the ~sybase/swatch directory of each server must be run by hand after a reboot. Swatch is also a process-table hog: it starts three jobs for each file it monitors.

Additionally, we rotate the backup server errorlogs and the datatools output logs via on some servers (dali and degas). Because swatch keeps reading the old file after a log is rotated, we have to recycle the swatch monitors on these servers nightly; cron executes the ~sybase/swatch/ script.

Max's /export/syb_ops/swatch directory contains the man pages.

ISD Swatch Operations FAQ

Q: "I want to add/remove someone from the email distribution list for these errors."
A: Edit the max:/export/syb_ops/swatch/ file, search for the "$Addresses" variable, and add or remove the email addresses you wish. Follow the existing format (you must escape the @ sign). When you are done, ftp this file to the supported machines, replace the existing file, and restart the swatch monitors on each server.
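The @ must be escaped because Perl interpolates `@name` inside a double-quoted string as an array. The Python sketch below produces a safely escaped line; it assumes, hypothetically, that `$Addresses` is a single double-quoted string of comma-separated addresses, and `example.com` is a placeholder domain, so check the real file's format before copying anything.

```python
from typing import Iterable


def perl_double_quoted(text: str) -> str:
    """Return `text` as a Perl double-quoted string literal."""
    # Escape the backslash first so the escapes added below are not doubled.
    for ch in ("\\", '"', "$", "@"):
        text = text.replace(ch, "\\" + ch)
    return '"' + text + '"'


def addresses_line(addresses: Iterable[str], sep: str = ", ") -> str:
    """Hypothetical: render a `$Addresses = "...";` assignment line."""
    return "$Addresses = " + perl_double_quoted(sep.join(addresses)) + ";"
```

`addresses_line(["tboss@example.com", "donp@example.com"])` yields `$Addresses = "tboss\@example.com, donp\@example.com";`.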

Q: "I just changed one of the three main perl scripts for swatch (swatch,, and ftp'd them to degas and zeus. Now they don't work on the remote servers. Why?"
A: AIX boxes keep perl in a different directory than Solaris boxes do. This should be documented at the top of the scripts, but if it isn't, simply substitute /usr/local/bin/perl for every instance of /bin/perl.
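A naive global search-and-replace of /bin/perl is risky, because /bin/perl is also a substring of /usr/bin/perl and /usr/local/bin/perl. The Python sketch below (a hypothetical helper, not part of swatch) replaces the path only where it appears as a whole path; swap the `old` and `new` arguments to go the other way.

```python
import re


def swap_perl_path(text: str, old: str = "/bin/perl",
                   new: str = "/usr/local/bin/perl") -> str:
    """Replace `old` with `new` only where `old` is a complete path, so
    /usr/bin/perl and /usr/local/bin/perl are left untouched."""
    # Not preceded or followed by a path character (letter, digit, _, /, .).
    pattern = re.compile(r"(?<![\w/.])" + re.escape(old) + r"(?![\w/.])")
    return pattern.sub(lambda _m: new, text)  # lambda: `new` is used literally
```

Running it on the text of each script (swatch and its companion files) and writing the result back is then a one-liner per file.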

Q: "I want to read the man pages in max:/export/syb_ops/swatch, but I don't know how."
A: nroff -man or nroff -man

Q: "I need to kill the swatch monitors temporarily because I'm doing some testing and I don't want to flood the recipients' email. How?"
A: On each server, edit the script in ~sybase/swatch. This script contains two functions, killem and startem. Comment out the call to startem() and run the script, so the monitors are killed but not restarted. Afterward, make sure to uncomment this line so will work correctly out of cron.

Q: "I just found an error that's occurring that I want to start/stop monitoring for. How do I modify the swatchrc file for this?"
A: Here are the steps you need to take:
1. Edit max:/export/syb_ops/swatch/swatchrc. Lines are in this format:

/infected with 11/	mail=Memory infected with 11

Two fields, tab delimited. The first field is the grep search string, a regular expression between / delimiters. The second field is the action to take: either ignore or mail=subject. Several messages can share the same action by separating the alternatives with the "|" character inside the /.../ delimiters. Lines beginning with # are comments. Make your edits, increment the version number at the top, and save.
2. Execute the max:/export/syb_ops/swatch/ script, which uses ftp and .netrc to "push" the swatchrc file to the monitored servers.
3. Log into each remote server and execute the ~sybase/swatch/ script to restart the monitors so the changes take effect. Note: dali and degas restart their monitors automatically each night regardless.
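The swatchrc format from step 1 can be sketched as a small Python parser plus matcher. It follows the rules described above (tab-separated fields, /regex/ first, ignore or mail=subject second, # comments) and additionally assumes the version line at the top is a # comment and that the first matching rule wins; the sample patterns in the usage are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional, Pattern


class Rule(NamedTuple):
    pattern: Pattern[str]
    action: str              # "ignore" or "mail"
    subject: Optional[str]   # mail subject; None for ignore


def parse_swatchrc(text: str) -> List[Rule]:
    rules: List[Rule] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        regex_field, _, action_field = line.partition("\t")
        if len(regex_field) < 2 or not (regex_field.startswith("/")
                                        and regex_field.endswith("/")):
            raise ValueError(f"line {lineno}: first field must be /regex/")
        pattern = re.compile(regex_field[1:-1])  # "|" gives alternatives
        action_field = action_field.strip()       # tolerate extra tabs
        if action_field == "ignore":
            rules.append(Rule(pattern, "ignore", None))
        elif action_field.startswith("mail="):
            rules.append(Rule(pattern, "mail", action_field[len("mail="):]))
        else:
            raise ValueError(f"line {lineno}: unknown action {action_field!r}")
    return rules


def first_match(rules: List[Rule], log_line: str) -> Optional[Rule]:
    """Assumed semantics: the first rule whose pattern matches decides."""
    for rule in rules:
        if rule.pattern.search(log_line):
            return rule
    return None
```

Parsing a candidate swatchrc this way before pushing it is a cheap check that every line has both fields and that each regular expression compiles.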

Q: "All of a sudden swatch doesn't work at all; it exits immediately saying "Command not found." What happened?"
A: This is the shell's obscure way of telling you that it can no longer find the perl binary referenced in the #!/usr/bin/perl line at the top of swatch. Fix that path and you're back in business.
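A quick way to confirm this diagnosis is to read the #! line and check that the interpreter it names exists and is executable. The Python sketch below does that; the helper names are hypothetical and it only inspects the first line of the script.

```python
import os
from typing import Optional


def shebang_interpreter(script_path: str) -> Optional[str]:
    """Return the interpreter path from the script's #! line, or None."""
    with open(script_path, "rb") as fh:
        first = fh.readline().decode("latin-1").rstrip("\r\n")
    if not first.startswith("#!"):
        return None
    parts = first[2:].split()  # handles "#! /path" and trailing flags like -w
    return parts[0] if parts else None


def interpreter_ok(script_path: str) -> bool:
    """True if the #! interpreter exists as a file and is executable."""
    interp = shebang_interpreter(script_path)
    return (interp is not None and os.path.isfile(interp)
            and os.access(interp, os.X_OK))
```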

Q: "I rm'd a log file that swatch monitors because it was getting really big. Now swatch isn't picking up the new messages being written there. Why?"
A: If you rotate, mv, or rm a log file that swatch is monitoring and let the writing program create a new logfile, swatch will not see the new messages. Swatch's tail keeps reading the file it originally opened (that is, the inode), not whatever file currently has that name, so you need to restart the swatch monitors.
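Detecting when a restart is needed comes down to comparing the file's identity (device and inode) and size over time. The Python sketch below is a hypothetical watchdog, not something swatch does itself: "replaced" means the name now points at a different file (rotated, mv'd, or rm'd and recreated), and "truncated" means the same file shrank.

```python
import os


class RotationWatcher:
    """Remember a log file's (device, inode) and size, and report when the
    name points at a different file or the file has shrunk."""

    def __init__(self, path: str) -> None:
        self.path = path
        st = os.stat(path)
        self._ident = (st.st_dev, st.st_ino)
        self._size = st.st_size

    def check(self) -> str:
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return "missing"      # name is gone; keep the old identity
        ident = (st.st_dev, st.st_ino)
        if ident != self._ident:
            status = "replaced"   # a monitor on the old inode sees nothing new
        elif st.st_size < self._size:
            status = "truncated"  # same inode, but contents were cut
        else:
            status = "ok"
        self._ident, self._size = ident, st.st_size
        return status
```

Anything other than "ok" is a cue to restart the swatch monitor for that file.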

