The TSMmonitor source code has moved to GitHub:
https://github.com/thobiast/tsmmonitor
TSMmonitor is a script that provides an easy, customizable, and effective way to monitor TSM (Tivoli Storage Manager) servers.
It is composed of functions that each check a specific TSM resource. Each check returns the resource status, which is one of OK, Warning, or Critical.
The status returned is based on the thresholds defined for each check. For example, this is the help for the function that checks TSM database utilization:
    prompt> ./tsmmonitor db -h
    check tsm database utilization

    The default percentages are:
       warning..: 85
       critical.: 90

    Usage..: tsmmonitor db [options] [warning] [critical]
       -v6      check database utilization for TSM version 6

    Usage..: tsmmonitor db [warning] [critical]

    Example: tsmmonitor db
             tsmmonitor db 80 95
             tsmmonitor db -v6 80 90
The status returned by the db check depends on the warning and critical thresholds. These values can be customized with command-line arguments:
    prompt> ./tsmmonitor db
    db: database utilization 81%, OK

    prompt> ./tsmmonitor db 80 90
    db: database utilization 81%, Warning

    prompt> ./tsmmonitor db 60 80
    db: database utilization 81%, Critical
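The threshold logic shown above can be sketched as a small shell function. This is only an illustration, not the script's actual code: the function name classify_pct is hypothetical, and the sketch assumes a threshold is reached when the value is greater than or equal to it, which is consistent with the sample output (81% with a warning threshold of 80 gives Warning).

```shell
#!/bin/sh
# Hypothetical helper, not taken from tsmmonitor: map a utilization
# percentage to a status using warning and critical thresholds.
# Assumption: a threshold is reached when value >= threshold.
# Defaults (85 and 90) follow the db check help shown above.
classify_pct() {
    value=$1 warning=${2:-85} critical=${3:-90}
    if [ "$value" -ge "$critical" ]; then
        echo "Critical"
    elif [ "$value" -ge "$warning" ]; then
        echo "Warning"
    else
        echo "OK"
    fi
}

classify_pct 81          # prints OK
classify_pct 81 80 90    # prints Warning
classify_pct 81 60 80    # prints Critical
```

The critical threshold is tested first so that a value above both thresholds is reported as Critical rather than Warning.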
Some nice features:
This script should work on most *NIX variants. It has been tested successfully on many Linux distributions and on AIX (4.3, 5.2 and 5.3). If you run into any problems, please let me know.
Suggestions and bug reports are very welcome. Contact me at: <thobias (a) thobias org>
Version 2.2, released on 07/11/2012 (DD/MM/YYYY)
Version 2.1, released on 11/09/2009 (DD/MM/YYYY)
Version 2.0, released on 28/11/2008 (DD/MM/YYYY)
Version 1.0, released on 15/06/2007 (DD/MM/YYYY)
Just download this file: tsmmonitor
Using it is quite simple. It doesn't require installation. All you have to do is set three options in the source code. Edit the script and look for this section:
    ############### tsm server information
    #
    # dsmadmc command path
    DSMADMC='/usr/bin/dsmadmc'

    # tsm user to connect to the tsm server and perform the checks
    USER=''

    # tsm user password
    PASS=''
Only these variables must be changed. If everything is fine, you can now test the script:
tsmmonitor --help
tsmmonitor db
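Before running the first check, it can help to confirm that the three settings are usable. The function below is a hypothetical pre-flight check and is not part of tsmmonitor; its name and messages are assumptions, and its arguments simply mirror the DSMADMC, USER and PASS variables described above.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of tsmmonitor itself.
# Arguments mirror the DSMADMC, USER and PASS settings.
check_settings() {
    dsmadmc_path=$1 user=$2 pass=$3
    if [ ! -x "$dsmadmc_path" ]; then
        echo "dsmadmc not found or not executable: $dsmadmc_path" >&2
        return 1
    fi
    if [ -z "$user" ] || [ -z "$pass" ]; then
        echo "USER and PASS must not be empty" >&2
        return 1
    fi
    return 0
}

# Example (values are placeholders):
#   check_settings /usr/bin/dsmadmc admin secret && ./tsmmonitor db
```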
Take a look at the source code for more customizations.
For nerds, source code highlighting.
The complete list of the available checks.
No | Check | Description |
---|---|---|
1 | help | show all checks help |
2 | db | check tsm database utilization |
3 | dbbkp | check how many tsm db backups there are in the last N hours (default is 25h) |
4 | dbfrag | check tsm database fragmentation |
5 | dbvol | check number of database volumes not synchronized (copy status) |
6 | diskvol | check number of disk volumes without readwrite access |
7 | drive | check number of drives not online |
8 | drmvol | check number of DRM volumes |
9 | falseprivate | check false private tapes |
10 | lic | check server license compliance |
11 | log | check tsm recovery log utilization |
12 | logvol | check number of log volumes not synchronized (copy status) |
13 | nodeslocked | check number of nodes locked |
14 | numnodes | check number of nodes |
15 | numsess | check number of nodes sessions |
16 | path | check number of paths not online |
17 | sched | check the number of schedules not completed (only today's schedules) |
18 | scratch | check number of scratch tapes |
19 | searchanr | Search for a specific ANR in the last N hours (default is 1h) |
20 | stgpool | check a storage pool utilization |
21 | tapeslib | check how many tapes are in the library |
22 | tapesown | check how many tapes have a specific owner |
23 | tapesstgpool | check how many volumes are in a specific storage pool |
24 | unav | check number of unavailable volumes |
25 | volerr | check number of volumes with error (error_state) |
26 | volreclaim | check for volumes with percentage reclaimable space greater than |
Some samples to show tsmmonitor in action:
    prompt> tsmmonitor -h
    Usage: tsmmonitor [options] [check] [options_check]

    Options
      -u, --user          tsm user to connect to the tsm server
      -p, --pass          tsm user password to connect to the tsm server
      -s, --servername    specify tsm servername
      -m, --mail          mail addresses separated by blank space
      -q, --quiet         quiet mode, suppress all output (except errors)
      -S, --source        print the check source code
      -h, --help          print this help information and exit
      -V, --version       print program version and exit

    The following checks are available:
      help, db, log, scratch, drive, path, dbfrag, unav, stgpool, volerr,
      volreclaim, tapeslib, tapesown, tapesstgpool, dbbkp, numsess, numnodes,
      nodeslocked, diskvol, dbvol, logvol, searchanr, drmvol, sched, lic

    Try 'tsmmonitor <check> --help' for more information.

    Example:
      tsmmonitor db --help
      tsmmonitor db
      tsmmonitor db -v6
      tsmmonitor -m='user1@somewhere.com user2@somewhere.com' db
      tsmmonitor --servername=tsmsrv01 db
      tsmmonitor --servername=tsmsrv02 db 85 95
      tsmmonitor -u=user1 -p=xxx -s=tsmsrv02 db 85 95
Showing the help of db check:
    prompt> tsmmonitor db -h
    check tsm database utilization

    The default percentages are:
       warning..: 85
       critical.: 90

    Usage..: tsmmonitor db [warning] [critical]

    Example: tsmmonitor db
             tsmmonitor db 80 95
Checking the TSM database utilization:
    prompt> tsmmonitor db
    db: database utilization 79%, Ok
    prompt> echo $?
    0
Checking the TSM database utilization with different warning and critical percentages:
    prompt> tsmmonitor db 70 85
    db: database utilization 79%, Warning
    prompt> echo $?
    1

    prompt> tsmmonitor db 60 75
    db: database utilization 79%, Critical
    prompt> echo $?
    2
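Because each check reports its status through the exit code, a calling script can react to it without parsing the output. The wrapper below is a minimal sketch: the name report_status is hypothetical, the mapping of 0, 1 and 2 follows the samples above, and treating any other code as unknown is an assumption.

```shell
#!/bin/sh
# Hypothetical wrapper: run any command, discard its output, and print a
# status label derived from the exit code.
# 0, 1 and 2 follow the samples above; other codes are labeled Unknown,
# which is an assumption of this sketch.
report_status() {
    "$@" >/dev/null 2>&1
    rc=$?
    case $rc in
        0) echo "OK" ;;
        1) echo "Warning" ;;
        2) echo "Critical" ;;
        *) echo "Unknown ($rc)" ;;
    esac
    return "$rc"
}

# Example (requires tsmmonitor in PATH):
#   report_status tsmmonitor db 70 85
```

The wrapper returns the original exit code, so it can itself be used in an if statement or chained with && and ||.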
Checking the number of volumes with error:
    prompt> tsmmonitor volerr
    volerr: number of volumes with error 2, Critical
Checking the number of volumes unavailable:
    prompt> tsmmonitor unav
    unav: number of unavailable volumes 1, Warning.
Checking the number of volumes unavailable with verbose option:
    prompt> tsmmonitor unav -v
    unav: number of unavailable volumes 1, Warning.
    Volumes: R00043L3
Checking the number of drives not online:
    prompt> tsmmonitor drive
    drive: number of drives not online 0, OK
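TSMmonitor can be used transparently as a Nagios plugin. Nagios determines a plugin's status from the script's return code: 0 for OK, 1 for Warning, and 2 for Critical.
TSMmonitor uses these same return codes, as the echo $? output in the samples above shows.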
You can use alert notifications to receive an e-mail whenever a check's status changes. This feature is disabled by default. To turn it on, change the following options in the source code:
    ############### send notification
    #
    # at every time that a check changes the status,
    # an alert (notification) will be sent by mail. default is off
    SEND_ALERT=0 # 1 = on and 0 = off

    # e-mails which will receive the notifications. mail addresses are separated
    # by blank space. ex: MAILTO='xxx@yyy.zzz aaa@bbb.zzz ppp@qqq.lll'
    MAILTO=''

    # temp directory where tsmmonitor will record check status.
    # it is necessary to send mail when the check status changes
    TEMPDIR='/tmp'
Remember that you can also specify different mail addresses on the command line:
prompt> tsmmonitor -m=user2@somewhere.com db
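Sending a notification only when the status changes requires remembering the previous status of each check, which is why a temporary directory is needed. The function below is a rough sketch of that idea, not the script's actual implementation: the name status_changed and the state file name are hypothetical, and it assumes the first recorded status also counts as a change.

```shell
#!/bin/sh
# Hypothetical sketch of change detection; the real script may differ.
# Stores the last status of a check in a file under a state directory and
# returns 0 (success) only when the new status differs from the stored one.
# Assumption: the first status ever recorded counts as a change.
status_changed() {
    state_dir=$1 check=$2 status=$3
    state_file="$state_dir/tsmmonitor_sketch.$check"
    previous=""
    if [ -f "$state_file" ]; then
        previous=$(cat "$state_file")
    fi
    printf '%s\n' "$status" > "$state_file"
    [ "$status" != "$previous" ]
}

# Example (placeholder values): notify only on a change.
#   status_changed /tmp db Warning && echo "db status is now Warning"
```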
You can use cron to run checks on a schedule and receive the tsmmonitor alerts by e-mail.
    prompt> crontab -l
    */15 * * * * /PATH/tsmmonitor db > /dev/null
    */10 * * * * /PATH/tsmmonitor log > /dev/null
    */10 * * * * /PATH/tsmmonitor drive > /dev/null
    */10 * * * * /PATH/tsmmonitor path > /dev/null
    */15 * * * * /PATH/tsmmonitor scratch > /dev/null
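If several checks share the same interval, they could also be grouped behind a small wrapper that runs them in turn and keeps the worst exit code. This is an optional sketch, not something tsmmonitor provides: the function name worst_status is hypothetical, and it assumes that a higher exit code means a worse status, as in the 0, 1, 2 codes shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: run a command once per check name and return the
# highest exit code seen.
# Assumption: a higher exit code means a worse status (0 < 1 < 2).
worst_status() {
    cmd=$1
    shift
    worst=0
    for check in "$@"; do
        "$cmd" "$check" >/dev/null 2>&1
        rc=$?
        if [ "$rc" -gt "$worst" ]; then
            worst=$rc
        fi
    done
    return "$worst"
}

# Example (path is a placeholder):
#   worst_status /PATH/tsmmonitor log drive path
```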
Output of the tsmmonitor help command:
    ---------------------------------------------------------------------
    show all checks help

    Usage..: tsmmonitor help

    Example: tsmmonitor help
    ---------------------------------------------------------------------
    check tsm database utilization

    The default percentages are:
       warning..: 85
       critical.: 90

    Usage..: tsmmonitor db [options] [warning] [critical]
       -v6      check database utilization for TSM version 6

    Example: tsmmonitor db
             tsmmonitor db 80 95
             tsmmonitor db -v6 80 90
    ---------------------------------------------------------------------
    check tsm recovery log utilization

    The default percentages are:
       warning..: 60
       critical.: 80

    Usage..: tsmmonitor log [options] [warning] [critical]
       -v6      check active log utilization for TSM version 6

    Example: tsmmonitor log
             tsmmonitor log 80 95
             tsmmonitor log -v6 70 80
    ---------------------------------------------------------------------
    check number of scratch tapes

    The default numbers are:
       warning..: 10
       critical.: 6

    Usage..: tsmmonitor scratch [options] [warning] [critical]
       -l, --library=LIBRARY_NAME    check for scratch in the library only

    Example: tsmmonitor scratch
             tsmmonitor scratch 8 4
             tsmmonitor scratch -l=LTOLIB3 8 4
             tsmmonitor scratch -l=LTOLIB3
    ---------------------------------------------------------------------
    check number of drives not online

    The default numbers are:
       warning..: 1
       critical.: 3

    Usage..: tsmmonitor drive [options] [warning] [critical]
       -l, --library=LIBRARY_NAME    check in the specific library only

    Example: tsmmonitor drive
             tsmmonitor drive 2 3
             tsmmonitor drive -l=LTOLIB3 1 2
             tsmmonitor drive -l=LTOLIB3
    ---------------------------------------------------------------------
    check number of paths not online

    The default numbers are:
       warning..: 1
       critical.: 3

    Usage..: tsmmonitor path [options] [warning] [critical]
       -s, --source=SOURCE_NAME    check path with a specific source name

    Example: tsmmonitor path
             tsmmonitor path 2 4
             tsmmonitor path -s=LANFREE1 1 4
             tsmmonitor path -s=LANFREE1
    ---------------------------------------------------------------------
    check tsm database fragmentation

    The default numbers are:
       warning..: 60
       critical.: 80

    Usage..: tsmmonitor dbfrag [warning] [critical]

    Example: tsmmonitor dbfrag
             tsmmonitor dbfrag 50 75
    ---------------------------------------------------------------------
    check number of unavailable volumes

    The default numbers are:
       warning..: 1
       critical.: 5

    Usage..: tsmmonitor unav [options] [warning] [critical]
       -d, --deviceclass=DEVICE_CLASS    check only in a specific device class

    Example: tsmmonitor unav
             tsmmonitor unav 2 4
             tsmmonitor unav -d=LTOCLASS 2 4
    ---------------------------------------------------------------------
    check a storage pool utilization

    The default numbers are:
       warning..: 80
       critical.: 95

    Usage..: tsmmonitor stgpool <storage_pool_name> [warning] [critical]

    Example: tsmmonitor stgpool DISK_POOL
             tsmmonitor stgpool DISK_POOL 50 75
    ---------------------------------------------------------------------
    check for volumes with write error and/or read error
    Default, search for volumes with write or read errors

    The default numbers are:
       warning..: 1
       critical.: 5

    Usage..: tsmmonitor volerr [options] [warning] [critical]
       -r, --read                    test only read errors
       -w, --write                   test only write errors
       -l, --library=LIBRARY_NAME    check only volumes in the library

    Example: tsmmonitor volerr
             tsmmonitor volerr -r
             tsmmonitor volerr 3 5
             tsmmonitor volerr -l=LTOLIB
             tsmmonitor volerr -l=LTOLIB 3 5
             tsmmonitor volerr -w -l=LTOLIB 3 5
    ---------------------------------------------------------------------
    check for volumes with percentage reclaimable space greater than

    The default numbers are:
       warning..: 5
       critical.: 20

    Usage..: tsmmonitor volreclaim [options] [warning] [critical]
       -r, --reclaim=PCT_RECLAIM     pct reclaimable space (default: 80 pct)
       -l, --library=LIBRARY_NAME    check only volumes in the library
       -s, --stgpool=STGPOOL_NAME    check only volumes in the storage pool
       -V, --verbose                 list the volumes found

    Example: tsmmonitor volreclaim
             tsmmonitor volreclaim -r
             tsmmonitor volreclaim 3 5
             tsmmonitor volreclaim -l=LTOLIB
             tsmmonitor volreclaim -l=LTOLIB 3 5
             tsmmonitor volreclaim -w -l=LTOLIB 3 5
    ---------------------------------------------------------------------
    check how many tapes are in the library

    The default numbers are:
       warning..: 90
       critical.: 86

    Usage..: tsmmonitor tapeslib [options] [warning] [critical]
       -l, --library=LIBRARY_NAME    check only volumes in the library

    Example: tsmmonitor tapeslib
             tsmmonitor tapeslib 120 115
             tsmmonitor tapeslib -l=LTOLIB3 120 115
             tsmmonitor tapeslib -l=LTOLIB3
    ---------------------------------------------------------------------
    check how many tapes have a specific owner

    The default numbers are:
       warning..: 2
       critical.: 3

    Usage..: tsmmonitor tapesown <owner> [warning] [critical]

    Example: tsmmonitor tapesown tsmsrv01
             tsmmonitor tapesown tsmsrv01 4 5
    ---------------------------------------------------------------------
    check how many volumes are in a specific storage pool

    The default numbers are:
       warning..: 40
       critical.: 50

    Usage..: tsmmonitor tapesstgpool <storage_pool_name> [warning] [critical]

    Example: tsmmonitor tapesstgpool DAILY
             tsmmonitor tapesstgpool DAILY 30 45
    ---------------------------------------------------------------------
    check how many tsm db backup there are in the last N hours (default is 25h)

    The default numbers are:
       warning..: 0
       critical.: 0

    Usage..: tsmmonitor dbbkp [options] [warning] [critical]
       -t, --type=I,F,S         Specifies the type of backup to look for
                                Incremental,Full,dbSnapshot (default is full only)
       -H, --hours=NUM_HOURS    how many hours ago to search for db backup

    Example: tsmmonitor dbbkp
             tsmmonitor dbbkp 2 1
             tsmmonitor dbbkp -H=12
             tsmmonitor dbbkp -H=12 2 1
             tsmmonitor dbbkp -H=12 -t=S
             tsmmonitor dbbkp -H=12 -t=F,S 2 1
    ---------------------------------------------------------------------
    check number of nodes sessions

    The default numbers are:
       warning..: 15
       critical.: 20

    Usage..: tsmmonitor numsess [options] [warning] [critical] [session_state]
       -s, --state=SESSION_STATE    Count only nodes sessions with a specifc state

    Example: tsmmonitor numsess
             tsmmonitor numsess 100 150
             tsmmonitor numsess -s=MediaW 5 10
             tsmmonitor numsess -s=MediaW
    ---------------------------------------------------------------------
    check number of nodes

    The default numbers are:
       warning..: 80
       critical.: 90

    Usage..: tsmmonitor numnodes [options] [warning] [critical]
       -d, --domain=DOMAIN    Count nodes only in the DOMAIN

    Example: tsmmonitor numnodes
             tsmmonitor numnodes 20 30
             tsmmonitor numnodes -d=SAP 20 30
             tsmmonitor numnodes -d=SAP
    ---------------------------------------------------------------------
    check number of nodes locked

    The default numbers are:
       warning..: 1
       critical.: 4

    Usage..: tsmmonitor nodeslocked [options] [warning] [critical]
       -d, --domain=DOMAIN    Count nodes only in the DOMAIN

    Example: tsmmonitor nodeslocked
             tsmmonitor nodeslocked 2 4
             tsmmonitor nodeslocked -d=SAP 2 4
             tsmmonitor nodeslocked -d=SAP
    ---------------------------------------------------------------------
    check number of disk volumes without readwrite access

    The default numbers are:
       warning..: 1
       critical.: 4

    Usage..: tsmmonitor diskvol [warning] [critical]

    Example: tsmmonitor diskvol
             tsmmonitor diskvol 2 3
    ---------------------------------------------------------------------
    check number of database volumes not synchronized (copy status)

    The default numbers are:
       warning..: 1
       critical.: 2

    Usage..: tsmmonitor dbvol [warning] [critical]

    Example: tsmmonitor dbvol
             tsmmonitor dbvol 2 3
    ---------------------------------------------------------------------
    check number of log volumes not synchronized (copy status)

    The default numbers are:
       warning..: 1
       critical.: 2

    Usage..: tsmmonitor logvol [warning] [critical]

    Example: tsmmonitor logvol
             tsmmonitor logvol 2 3
    ---------------------------------------------------------------------
    Search for a specific ANR in the last N hours (default is 1h)

    The default numbers are:
       warning..: 1
       critical.: 3

    Usage..: tsmmonitor searchanr [options] <ANR> [warning] [critical]
       -H, --hours=NUM_HOURS_AGO    how many hours ago to search for

    Example: tsmmonitor searchanr ANR8446W
             tsmmonitor searchanr ANR8446W 2 4
             tsmmonitor searchanr -H=12 ANR8446W
    ---------------------------------------------------------------------
    check number of DRM volumes

    The default values are:
       warning..: 1
       critical.: 4

    Usage..: tsmmonitor drmvol [options] [warning] [critical]
       -l, --library=LIBRARY_NAME    search volumes only in the library
       -s, --state=DRM_STATE         DRM state of volumes (default: MOUNTABLE)
                                     VAULT,VAULTRETRIEVE,COURIERRETRIEVE
       -i, --invert                  Invert the sense of matching, to select
                                     non-matching volumes

    Example: tsmmonitor drmvol
             tsmmonitor drmvol -i -l=3584LIB  # DRM volumes with state different from MOUNTABLE in library
             tsmmonitor drmvol -s=COURIERRETRIEVE
             tsmmonitor drmvol -s=VAULT -l=3584LIB 1 8
             tsmmonitor drmvol 2 6
    ---------------------------------------------------------------------
    check the number of schedules not completed (only today's schedules)

    The default numbers are:
       warning..: 1
       critical.: 3

    Usage..: tsmmonitor sched [options] [warning] [critical]
       -a, --admin                     only administrative schedules.
       -s, --schedule=SCHEDULE_NAME    only a specific schedule

    Example: tsmmonitor sched
             tsmmonitor sched -a
             tsmmonitor sched -s=DAILY_BKP 4 15
    ---------------------------------------------------------------------
    check server license compliance

    Usage..: tsmmonitor lic

    Example: tsmmonitor lic
    ---------------------------------------------------------------------
    check false private tapes

    The default percentages are:
       warning..: 1
       critical.: 3

    Usage..: tsmmonitor falseprivate [warning] [critical]

    Example: tsmmonitor falseprivate
             tsmmonitor falseprivate 3 5
    ---------------------------------------------------------------------
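In the help output, the scratch and tapeslib checks list a warning value that is higher than the critical value (10 and 6, 90 and 86). This suggests that for these checks a lower count is worse. The function below sketches that reading and is not the script's actual code: the name classify_min is hypothetical, and it assumes a threshold is reached when the count is less than or equal to it.

```shell
#!/bin/sh
# Hypothetical helper for checks where a low count is bad (for example,
# the number of scratch tapes).
# Assumption: a threshold is reached when count <= threshold.
# Defaults (10 and 6) follow the scratch check help.
classify_min() {
    count=$1 warning=${2:-10} critical=${3:-6}
    if [ "$count" -le "$critical" ]; then
        echo "Critical"
    elif [ "$count" -le "$warning" ]; then
        echo "Warning"
    else
        echo "OK"
    fi
}

classify_min 12      # prints OK
classify_min 8       # prints Warning
classify_min 4       # prints Critical
classify_min 6 8 4   # prints Warning
```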