Tsmmonitor is a shell script that performs some TSM (Tivoli Storage Manager) checks. The main idea is to help monitor TSM Servers.
Some nice features:
This script aims to be UNIX compliant. It has been tested successfully on many Linux distributions and on AIX (4.3, 5.2 and 5.3). If you have any problems, please let me know.
Suggestions and bug reports are very welcome. Contact me at: <thobias (a) thobias org>
Version 1.0, released on 15/06/2007 (DD/MM/YYYY)
Just download this file: tsmmonitor
Using it is quite simple. It doesn't require installation. All you have to do is set three options in the source code. Edit the script and look for this section:
############### tsm server information
#
# dsmadmc command path
DSMADMC='/usr/bin/dsmadmc'
# tsm user to connect to the tsm server and perform the checks
USER=''
# tsm user password
PASS=''
Only these variables must be changed. If everything is fine, you can now test the script:
tsmmonitor --help
tsmmonitor db
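If the first test fails, a quick sanity check of the three settings can narrow down the cause. The following is a minimal sketch, not part of tsmmonitor: the function name `check_settings` is hypothetical, and returning 3 for a bad setting is an assumption borrowed from the "unknown" status convention.

```sh
#!/bin/sh
# Hypothetical helper, not part of tsmmonitor.
# check_settings DSMADMC_PATH TSM_USER TSM_PASSWORD
# Returns 0 when the settings look usable, 3 (unknown) otherwise.
check_settings() {
    if [ ! -x "$1" ]; then
        echo "dsmadmc not found or not executable: $1"
        return 3
    fi
    if [ -z "$2" ] || [ -z "$3" ]; then
        echo "USER and PASS must both be set"
        return 3
    fi
    echo "settings look usable"
    return 0
}
```

It could be called with the same values that were written into the script, for example `check_settings /usr/bin/dsmadmc admin secret`, where the user and password shown are placeholders.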
Take a look at the source code for more customizations.
For nerds, source code highlighting.
The complete list of the available checks.
| No | Check | Description |
|---|---|---|
| 1 | help | show all checks help |
| 2 | db | check tsm database utilization |
| 3 | dbbkp | check how many tsm db backup there are in the last 24 hours |
| 4 | dbfrag | check tsm database fragmentation |
| 5 | dbvol | check number of database volumes not synchronized (copy status) |
| 6 | diskvol | check number of disk volumes without readwrite access |
| 7 | drive | check number of drives not online |
| 8 | lic | check server license compliance |
| 9 | log | check tsm recovery log utilization |
| 10 | logvol | check number of log volumes not synchronized (copy status) |
| 11 | numnodes | check number of nodes |
| 12 | numsess | check number of nodes sessions |
| 13 | path | check number of paths not online |
| 14 | req | check number of pending requests |
| 15 | scratch | check number of scratch tapes |
| 16 | stgpool | check a storage pool utilization |
| 17 | tapeslib | check how many tapes are in the library |
| 18 | tapesown | check how many tapes have a specific owner |
| 19 | tapesstgpool | check how many volumes are in a specific storage pool |
| 20 | unav | check number of unavailable volumes |
| 21 | volerr | check number of volumes with error (error_state) |
Some samples to show tsmmonitor in action:
prompt> tsmmonitor -h
Usage: tsmmonitor [options] [check] [options_check]
  -s, -servername   specify tsm servername
  -m, -mail         mail addresses separated by comma
  -h, --help        print this help information and exit
  -V, --version     print program version and exit
These are global options. They can be used in all checks.
The following checks are available:
  help, db, log, scratch, drive, path, dbfrag, unav, req, stgpool, volerr, tapeslib, tapesown, tapesstgpool, dbbkp, numsess, numnodes, diskvol, dbvol, logvol, lic
Try 'tsmmonitor <check> --help' for more information.
Example: tsmmonitor db --help
         tsmmonitor db
         tsmmonitor -m=user1@somewhere.com,user2@somewhere.com db
         tsmmonitor -servername=tsmsrv01 db
         tsmmonitor -servername=tsmsrv02 db 85 95
Showing the help of db check:
prompt> tsmmonitor db -h
check tsm database utilization
The default percentages are:
warning..: 85
critical.: 90
Usage..: tsmmonitor db [warning] [critical]
Example: tsmmonitor db
tsmmonitor db 80 95
Checking the TSM database utilization:
prompt> tsmmonitor db
db: database utilization 79%, Ok
prompt> echo $?
0
Checking the TSM database utilization with different percentages for the warning and critical status:
prompt> tsmmonitor db 70 85
db: database utilization 79%, Warning
prompt> echo $?
1
prompt> tsmmonitor db 60 75
db: database utilization 79%, Critical
prompt> echo $?
2
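The three runs above show how the same value (79%) maps to a different status and return code depending on the thresholds. A minimal sketch of that mapping follows. The function name `classify` is hypothetical, and treating a value equal to a threshold as having reached it is an assumption; tsmmonitor's own comparison may differ.

```sh
#!/bin/sh
# Illustrative only: not taken from the tsmmonitor source.
# classify VALUE WARNING CRITICAL
# Prints the status name and returns the matching code (0, 1 or 2).
classify() {
    value=$1
    warning=$2
    critical=$3
    if [ "$value" -ge "$critical" ]; then
        echo "Critical"
        return 2
    elif [ "$value" -ge "$warning" ]; then
        echo "Warning"
        return 1
    fi
    echo "Ok"
    return 0
}

# classify 79 85 90   prints "Ok" and returns 0
# classify 79 70 85   prints "Warning" and returns 1
# classify 79 60 75   prints "Critical" and returns 2
```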
Checking the number of volumes with error:
prompt> tsmmonitor volerr
volerr: number of volumes with error 2, Critical
Checking the number of unavailable volumes:
prompt> tsmmonitor unav
unav: number of unavailable volumes 1, Warning.
Checking the number of unavailable volumes with the verbose option:
prompt> tsmmonitor unav -v
unav: number of unavailable volumes 1, Warning. Volumes: R00043L3
Checking the number of drives not online:
prompt> tsmmonitor drive
drive: number of drives not online 0, OK
Tsmmonitor can be used transparently as a Nagios plugin. Nagios plugins report their status through the script's return code, i.e., 0 - normal, 1 - warning, 2 - critical and 3 - unknown. These are the same return codes used by tsmmonitor.
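Because the status travels in the return code, any caller can interpret it the way Nagios does. The sketch below illustrates this; `status_name` and `run_check` are hypothetical helper names, not part of tsmmonitor or Nagios.

```sh
#!/bin/sh
# status_name CODE: translate a Nagios-style return code into a label.
status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# run_check COMMAND [ARGS...]
# Runs the command, prints its output followed by a status label,
# and passes the return code through unchanged.
run_check() {
    rc=0
    output=$("$@" 2>&1) || rc=$?
    echo "$output"
    echo "status: $(status_name "$rc")"
    return "$rc"
}
```

A call such as `run_check tsmmonitor db 80 95` would then print the check's own line plus the label, and still return the original code to the caller.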
You can use the alert notification to receive an e-mail when the status of a check changes. This feature is disabled by default. To turn it on, change the following options in the source code:
############### send notification
#
# at every time that a check changes the status,
# an alert (notification) will be sent by mail. default is off
SEND_ALERT=0 # 1 = on and 0 = off
# e-mails which will receive the notifications. mail addresses are separated
# by blank space. ex: MAILTO='xxx@yyy.zzz aaa@bbb.zzz ppp@qqq.lll'
MAILTO=''
# temp directory where tsmmonitor will record check status.
# it is necessary to send mail when the check status changes
TEMPDIR='/tmp'
Remember that you can specify different mail addresses on the command line:
prompt> tsmmonitor -m=user2@somewhere.com db
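The notification depends on remembering the previous status of each check, which is why a temp directory is needed. The sketch below shows one way this kind of change detection can work. It is not taken from the tsmmonitor source: the function name `status_changed`, the state file name, and the choice to stay silent on the very first run are all assumptions.

```sh
#!/bin/sh
# Hypothetical sketch; tsmmonitor's own change detection may work differently.
TEMPDIR=${TEMPDIR:-/tmp}

# status_changed CHECK STATUS
# Succeeds (returns 0) only when STATUS differs from the status recorded
# on the previous run. The new status is always recorded for the next run.
status_changed() {
    state_file="$TEMPDIR/tsmmonitor.$1.status"   # file name is an assumption
    if [ -f "$state_file" ]; then
        previous=$(cat "$state_file")
    else
        previous=$2    # first run: nothing to compare, so report no change
    fi
    echo "$2" > "$state_file"
    [ "$previous" != "$2" ]
}

# Example use, with MAILTO holding the recipient addresses:
# if status_changed db "Warning"; then
#     echo "db status is now Warning" | mail -s "tsmmonitor db" $MAILTO
# fi
```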
You can use cron to run scheduled checks and receive the tsmmonitor alerts by e-mail.
prompt> crontab -l
*/15 * * * * /PATH/tsmmonitor db > /dev/null
*/10 * * * * /PATH/tsmmonitor log > /dev/null
*/10 * * * * /PATH/tsmmonitor drive > /dev/null
*/10 * * * * /PATH/tsmmonitor path > /dev/null
*/15 * * * * /PATH/tsmmonitor scratch > /dev/null
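When several checks share the same schedule, one crontab entry can call a small wrapper instead of listing each check separately. The following is a sketch under stated assumptions: `run_all` is a hypothetical name, and taking the numerically highest return code as the overall result is a simplification (it ranks unknown above critical).

```sh
#!/bin/sh
# Hypothetical wrapper, not part of tsmmonitor.
# run_all COMMAND CHECK [CHECK...]
# Runs COMMAND once per check, discards the output, and returns the
# highest return code seen across all checks.
run_all() {
    cmd=$1
    shift
    worst=0
    for check in "$@"; do
        rc=0
        "$cmd" "$check" > /dev/null 2>&1 || rc=$?
        if [ "$rc" -gt "$worst" ]; then
            worst=$rc
        fi
    done
    return "$worst"
}

# Example: run_all /PATH/tsmmonitor log drive path
```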
HTML version of the output of the tsmmonitor help command.
---------------------------------------------------------------------
show all checks help
Usage..: tsmmonitor help
Example: tsmmonitor help
---------------------------------------------------------------------
check tsm database utilization
The default percentages are:
warning..: 85
critical.: 90
Usage..: tsmmonitor db [warning] [critical]
Example: tsmmonitor db
tsmmonitor db 80 95
---------------------------------------------------------------------
check tsm recovery log utilization
The default percentages are:
warning..: 60
critical.: 80
Usage..: tsmmonitor log [warning] [critical]
Example: tsmmonitor log
tsmmonitor log 80 95
---------------------------------------------------------------------
check scratch tapes minimum number
The default numbers are:
warning..: 10
critical.: 6
Usage..: tsmmonitor scratch [warning] [critical] [library_name]
Example: tsmmonitor scratch
tsmmonitor scratch 8 4
tsmmonitor scratch 8 4 LTOLIB3
tsmmonitor scratch LTOLIB3
---------------------------------------------------------------------
check number of drives not online
The default numbers are:
warning..: 1
critical.: 2
Usage..: tsmmonitor drive [warning] [critical] [library_name]
Example: tsmmonitor drive
tsmmonitor drive 2 3
tsmmonitor drive 1 2 LTOLIB3
tsmmonitor drive LTOLIB3
---------------------------------------------------------------------
check number of paths not online
The default numbers are:
warning..: 1
critical.: 2
Usage..: tsmmonitor path [warning] [critical]
Example: tsmmonitor path
tsmmonitor path 2 4
---------------------------------------------------------------------
check tsm database fragmentation
The default numbers are:
warning..: 60
critical.: 80
Usage..: tsmmonitor dbfrag [warning] [critical]
Example: tsmmonitor dbfrag
tsmmonitor dbfrag 50 75
---------------------------------------------------------------------
check number of unavailable volumes
The default numbers are:
warning..: 1
critical.: 2
Usage..: tsmmonitor unav [options] [warning] [critical] [device_class]
-v, show unavailable volumes
Example: tsmmonitor unav -v
tsmmonitor unav 2 4
tsmmonitor unav 2 4 LTOCLASS
---------------------------------------------------------------------
check number of pending requests (query request)
The default numbers are:
warning..: 1
critical.: 2
Usage..: tsmmonitor req [warning] [critical]
Example: tsmmonitor req
tsmmonitor req 2 3
---------------------------------------------------------------------
check a storage pool utilization
The default numbers are:
warning..: 80
critical.: 95
Usage..: tsmmonitor stgpool <storage_pool_name> [warning] [critical]
Example: tsmmonitor stgpool DISK_POOL
tsmmonitor stgpool DISK_POOL 50 75
---------------------------------------------------------------------
check for volumes with error (error_state)
The default numbers are:
warning..: 1
critical.: 2
Usage..: tsmmonitor volerr [options] [warning] [critical] [device_class]
-v, show volumes with error
Example: tsmmonitor volerr
tsmmonitor volerr -v 3 5
tsmmonitor volerr 3 5 LTOCLASS
---------------------------------------------------------------------
check how many tapes are in the library
The default numbers are:
warning..: 90
critical.: 86
Usage..: tsmmonitor tapeslib [warning] [critical] [library_name]
Example: tsmmonitor tapeslib
tsmmonitor tapeslib 120 115
tsmmonitor tapeslib 120 115 LTOLIB3
tsmmonitor tapeslib LTOLIB3
---------------------------------------------------------------------
check how many tapes have a specific owner
The default numbers are:
warning..: 2
critical.: 3
Usage..: tsmmonitor tapesown <owner> [warning] [critical]
Example: tsmmonitor tapesown tsmsrv01
tsmmonitor tapesown tsmsrv01 4 5
---------------------------------------------------------------------
check how many volumes are in a specific storage pool
The default numbers are:
warning..: 40
critical.: 50
Usage..: tsmmonitor tapesstgpool <storage_pool_name> [warning] [critical]
Example: tsmmonitor tapesstgpool DAILY
tsmmonitor tapesstgpool DAILY 30 45
---------------------------------------------------------------------
check how many tsm db backup there are in the last 24 hours
The default numbers are:
warning..: 1
critical.: 0
Usage..: tsmmonitor dbbkp [options] [warning] [critical]
-v, show some informations about database backup
Example: tsmmonitor dbbkp
tsmmonitor dbbkp -v
tsmmonitor dbbkp 2 1
---------------------------------------------------------------------
check number of nodes sessions
The default numbers are:
warning..: 15
critical.: 20
Usage..: tsmmonitor numsess [warning] [critical] [session_state]
Example: tsmmonitor numsess
tsmmonitor numsess 20 30
tsmmonitor numsess 20 30 MediaW
tsmmonitor numsess Run
---------------------------------------------------------------------
check number of nodes
The default numbers are:
warning..: 80
critical.: 90
Usage..: tsmmonitor numnodes [warning] [critical] [domain]
Example: tsmmonitor numnodes
tsmmonitor numnodes 20 30
tsmmonitor numnodes 20 30 SAP
tsmmonitor numnodes SAP
---------------------------------------------------------------------
check number of disk volumes without readwrite access
The default numbers are:
warning..: 1
critical.: 2
Usage..: tsmmonitor diskvol [options] [warning] [critical]
-v, show volumes without readwrite access
Example: tsmmonitor diskvol
tsmmonitor diskvol -v
tsmmonitor diskvol 2 3
tsmmonitor diskvol -v 2 3
---------------------------------------------------------------------
check number of database volumes not synchronized (copy status)
The default numbers are:
warning..: 1
critical.: 2
Usage..: tsmmonitor dbvol [warning] [critical]
Example: tsmmonitor dbvol
tsmmonitor dbvol 2 3
---------------------------------------------------------------------
check number of log volumes not synchronized (copy status)
The default numbers are:
warning..: 1
critical.: 2
Usage..: tsmmonitor logvol [warning] [critical]
Example: tsmmonitor logvol
tsmmonitor logvol 2 3
---------------------------------------------------------------------
check server license compliance
Usage..: tsmmonitor lic
Example: tsmmonitor lic
---------------------------------------------------------------------
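In the defaults listed above, the scratch, tapeslib and dbbkp checks have a warning value higher than the critical value (for example 10 and 6 for scratch), which suggests that these checks alert when a count falls too low rather than when it rises too high. A minimal sketch of such a minimum-based comparison follows. The function name `classify_min` is hypothetical, and the exact comparison used by tsmmonitor is an assumption.

```sh
#!/bin/sh
# Illustrative only: not taken from the tsmmonitor source.
# classify_min VALUE WARNING CRITICAL
# For counts where lower is worse (WARNING is expected to be >= CRITICAL).
classify_min() {
    value=$1
    warning=$2
    critical=$3
    if [ "$value" -le "$critical" ]; then
        echo "Critical"
        return 2
    elif [ "$value" -le "$warning" ]; then
        echo "Warning"
        return 1
    fi
    echo "Ok"
    return 0
}

# classify_min 12 10 6   prints "Ok" and returns 0
# classify_min 8 10 6    prints "Warning" and returns 1
# classify_min 5 10 6    prints "Critical" and returns 2
```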