Swatch monitoring on ISD SQL servers
Swatch is (in case you've never heard of it) a set of Perl programs written by a Stanford employee named Todd Atkins. It examines any logfile you wish (either continuously, via a perpetual tail -f mechanism, or in a one-time full pass) and checks each new line against a formatted action file for keywords. The patterns in the action file (swatchrc) are written in regular expression syntax, much like those given to the Unix grep command.
I have considerably modified both the standard keyword search file (swatchrc) and the swatch Perl source code for the ISD environment. We also hacked sw_actions.pl (the swatch program file that controls the actions it takes) to allow multiple mail recipients and to read the subject from the config file (as opposed to using a generic subject). All source code is located in max:/export/syb_ops/swatch. This is the home directory for our install of swatch; all changes are made here and then distributed out.
Swatch is currently configured and monitoring Sybase servers on 4 different Unix boxes (max, dali, degas, pop). The config files are in $SYBASE/swatch on each server. Errors are currently emailed to tboss, jmckeeby, and donp. If you want more people to be notified, edit the sw_actions.pl file in max:/export/syb_ops/swatch and add the email address, then distribute the sw_actions.pl file to the remote servers running swatch.
Nightly output jobs:
Nightly, we run update statistics and sp_recompile jobs on each server and store the output in files in /export/syb_ops/nightly. These files sometimes contain errors, so we run a swatch job each morning (max:/export/syb_ops/nightly/nightly_swatch.sh) that runs through them looking for errors.
DBCCs (checkalloc, checktable, checkcatalog) are run over the weekend on all applicable databases. Their output files (which will contain details of any errors found) are washed through a swatch-invoking script Monday morning looking for errors (the swatch config knows to ignore spurious allocation errors).
The DBCC output files are stored in max:/export/syb_ops/dbcc/[date], where [date] is the day the dbcc job started, in yymmdd form. Each file in this dated subdirectory is named [server].[database].dbcc.[date].
max:/export/syb_ops/swatch/weekend_swatch.sh is the one-time swatch script that examines each dbcc output file.
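The file naming convention above lends itself to a small parser, for example when deciding which server and database a reported error belongs to. The following is only a sketch in Python, not part of the swatch install: the function and class names are made up for illustration, and it assumes that server and database names contain no dots.

```python
from typing import NamedTuple


class DbccOutput(NamedTuple):
    server: str
    database: str
    date: str  # yymmdd, the day the dbcc job started


def parse_dbcc_filename(name: str) -> DbccOutput:
    """Split a '[server].[database].dbcc.[date]' file name into its parts."""
    parts = name.split(".")
    # Assumption: neither the server nor the database name contains a dot.
    if len(parts) != 4 or parts[2] != "dbcc":
        raise ValueError(f"not a dbcc output file name: {name!r}")
    server, database, _, date = parts
    if len(date) != 6 or not date.isdigit():
        raise ValueError(f"date is not in yymmdd form: {date!r}")
    return DbccOutput(server, database, date)
```

With a hypothetical file name such as dali.master.dbcc.990104, the call returns the server dali, the database master, and the date 990104.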
Swatch monitors are NOT started automatically when a server boots, so there is a restart.sh script in the ~sybase/swatch directory of each server that must be run after a reboot. Swatch is a process table hog; it starts three jobs for each file it looks at.
Additionally, we rotate the backup server errorlogs and the datatools output logs via rotate_logs.sh on some servers (dali and degas). Because of this, we have to recycle the swatch monitors on these servers nightly; cron executes the ~sybase/swatch/restart.sh script.
Max's /export/syb_ops/swatch directory contains the man pages.
Q: "I want to add/remove someone from the email distribution list for these
A: Edit the max:/export/syb_ops/swatch/sw_actions.pl file, search down for the "$Addresses" variable, and add or remove the email addresses you wish. Make sure to follow the existing format (you must escape the @ sign). Once that is done, ftp the file to the supported machines, replace the existing sw_actions.pl file, and restart the swatch monitors on each server.
Q: "I just changed one of the three main perl scripts for swatch (swatch,
sw_actions.pl, sw_history.pl) and ftp'd them to degas and zeus. Now they
don't work on the remote servers. Why?"
A: AIX boxes have perl in a different directory than Solaris. This fact should be documented at the top of the scripts, but if it isn't, simply substitute /usr/local/bin/perl for all instances of /bin/perl.
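As an illustration of the fix, here is a hedged Python sketch that rewrites the interpreter path on a script's first line. It is not one of the swatch scripts, and the function name is hypothetical. It touches only the #! line (the one the system reads to find perl); any other references to the perl path inside the scripts would still need to be edited by hand.

```python
def fix_perl_shebang(script_text: str, perl_path: str) -> str:
    """Return script_text with the perl path on its '#!' line replaced."""
    lines = script_text.split("\n")
    first = lines[0]
    if first.startswith("#!"):
        # Separate the interpreter path from any flags (e.g. "-w").
        fields = first[2:].split(None, 1)
        if fields and fields[0].endswith("perl"):
            flags = " " + fields[1] if len(fields) > 1 else ""
            lines[0] = "#!" + perl_path + flags
    # Scripts whose first line does not point at perl are returned unchanged.
    return "\n".join(lines)
```

Reading the script into a string, passing it through this function with the path that is correct for the target machine, and writing it back out is enough to repair the first line.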
Q: "I want to read the man pages in max:/export/syb_ops/swatch" but i don't
A: nroff -man swatch.conf.man or nroff -man swatch.prog.man
Q: "I need to kill the swatch monitors temporarily because I'm doing some
testing and I don't want to flood the recipients' email."
A: On each server, edit the restart.sh script in ~sybase/swatch. This script contains two functions: killem and startem. Comment out the line that calls startem and run the script. When you are done testing, make sure to uncomment that line so restart.sh will work correctly out of cron.
Q: "I just found out about an error that's occurring that I want to start/stop
monitoring for. How do I modify the swatchrc file for this?"
A: Here are the steps you need to take:
1. Edit max:/export/syb_ops/swatch/swatchrc. Lines are in this format:
/infected with 11/	mail=Memory infected with 11
Two fields, tab delimited. The first field is the grep search string in regular expression format. The second field is the action to be taken, i.e. either ignore or mail=subject. Several messages can share the same action if you separate the alternatives with the "|" character inside the /.../ regular expression delimiters. Lines beginning with # are comments. Make your edits, increment the version number at the top, and save.
2. Execute the max:/export/syb_ops/swatch/distswatch.sh script, which uses ftp and .netrc to "push" the swatchrc file to the monitored servers.
3. Log into the remote servers and execute the ~sybase/swatch/restart.sh script to restart the monitors and have the new changes take effect. Note: dali and degas restart the monitors automatically each night regardless.
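Before distributing a changed swatchrc, it can help to check that a new line parses and matches the message you care about. The sketch below is a Python approximation of the two-field format described in step 1, not the swatch code itself. The function names, the sample "spurious|harmless" rule, and the first-match-wins ordering are assumptions for illustration, and Python's regular expressions are close to but not identical to Perl's.

```python
import re


def parse_swatchrc(text):
    """Return a list of (compiled_pattern, action) pairs from swatchrc-style text."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        pattern, tab, action = line.partition("\t")
        if not tab:
            raise ValueError("expected two tab-delimited fields: %r" % raw)
        pattern = pattern.strip()
        if len(pattern) >= 2 and pattern[0] == "/" and pattern[-1] == "/":
            pattern = pattern[1:-1]  # drop the /.../ delimiters
        rules.append((re.compile(pattern), action.strip()))
    return rules


def first_action(rules, log_line):
    """Return the action of the first rule matching log_line, or None.

    Assumption: rules are tried top to bottom and the first match wins.
    """
    for regex, action in rules:
        if regex.search(log_line):
            return action
    return None


# Hypothetical sample; only the "infected with 11" line comes from our swatchrc.
SAMPLE = (
    "# version 1\n"
    "/infected with 11/\tmail=Memory infected with 11\n"
    "/spurious|harmless/\tignore\n"
)
```

Calling first_action(parse_swatchrc(SAMPLE), line) on a log line containing "infected with 11" gives back the mail action with its subject, while a line matching neither rule gives None.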
Q: "All of a sudden swatch doesn't work at all, quickly exiting saying
'Command not found.' What happened?"
A: This message is obscurely telling you that the perl binary referenced in the #!/usr/bin/perl line at the top of swatch can no longer be found. (It is the shell that prints the message, not perl, since perl never gets started.) Fix the reference and you're back in business.
Q: "I rm'd a log file that swatch monitors that was getting
really big. Now swatch isn't picking up the new messages getting sent
A: If you rotate a log file that swatch is monitoring, or mv it, or rm it and let the owning program create a new logfile, swatch will stop seeing new messages. Swatch's tail -f follows the file by its inode and not by its filename (as does any Unix program that holds a file open), so it keeps watching the old file. You need to restart the swatch monitors.
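The inode behaviour is easy to demonstrate. The following Python sketch (not part of swatch; the function name is hypothetical) compares the file a process already has open with whatever file the path names now. If the two differ, the log has been rotated or recreated and a reader still holding the old file will never see the new messages.

```python
import os


def log_was_replaced(open_log, path: str) -> bool:
    """True if `path` no longer names the file that `open_log` has open."""
    held = os.fstat(open_log.fileno())  # the file we are actually reading
    try:
        current = os.stat(path)  # the file the name points at now
    except FileNotFoundError:
        return True  # removed and not yet recreated
    # A file is identified by its device and inode number, not by its name.
    return (held.st_dev, held.st_ino) != (current.st_dev, current.st_ino)
```

A monitor could call a check like this periodically and reopen the log when it returns True; our install simply restarts the monitors instead.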