Introduction

Tsmmonitor is a shell script that runs a set of checks against TSM (Tivoli Storage Manager) servers. Its main purpose is to help monitor TSM servers.

Some nice features:

This script aims to be UNIX compliant. It has been tested successfully on many Linux distributions and on AIX (4.3, 5.2 and 5.3). If you have any problems, please let me know.

Suggestions and bug reports are very welcome. Contact me at: <thobias (a) thobias org>

Changelog

Version 1.0, released on 15/06/2007 (DD/MM/YYYY)

Download

Just download this file: tsmmonitor

Using it is quite simple. It doesn't require installation. All you have to do is set three options in the source code. Edit the script and look for this section:

############### tsm server information
#
# dsmadmc command path
DSMADMC='/usr/bin/dsmadmc'

# tsm user to connect to the tsm server and perform the checks
USER=''

# tsm user password
PASS=''

Only these three variables need to be changed. Once they are set, you can test the script:

tsmmonitor --help
tsmmonitor db
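
Before running the first check, it can help to confirm that the three settings are usable. The sketch below is only an illustration: validate_settings is a hypothetical helper that is not part of tsmmonitor, and it only tests that the dsmadmc path is executable and that the user and password are not empty. It does not contact the TSM server.

```shell
#!/bin/sh
# validate_settings is a hypothetical pre-flight helper, not part of tsmmonitor.
# usage: validate_settings <dsmadmc_path> <user> <pass>
# Returns 0 when the settings look usable, 1 otherwise.
validate_settings() {
    # the dsmadmc binary must exist and be executable
    if [ ! -x "$1" ]; then
        echo "dsmadmc not found or not executable: $1"
        return 1
    fi
    # tsmmonitor ships with empty USER and PASS, so both must be filled in
    if [ -z "$2" ] || [ -z "$3" ]; then
        echo "USER and PASS must not be empty"
        return 1
    fi
    echo "settings look usable"
}

# Example call with the default path from the script and made-up credentials:
#   validate_settings /usr/bin/dsmadmc monitor secret
```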

Take a look at the source code for more customizations.

For nerds, source code highlighting.

Checks

The complete list of the available checks.

No  Check         Description
 1  help          show all checks help
 2  db            check tsm database utilization
 3  dbbkp         check how many tsm db backup there are in the last 24 hours
 4  dbfrag        check tsm database fragmentation
 5  dbvol         check number of database volumes not synchronized (copy status)
 6  diskvol       check number of disk volumes without readwrite access
 7  drive         check number of drives not online
 8  lic           check server license compliance
 9  log           check tsm recovery log utilization
10  logvol        check number of log volumes not synchronized (copy status)
11  numnodes      check number of nodes
12  numsess       check number of nodes sessions
13  path          check number of paths not online
14  req           check number of pending requests
15  scratch       check number of scratch tapes
16  stgpool       check a storage pool utilization
17  tapeslib      check how many tapes are in the library
18  tapesown      check how many tapes have a specific owner
19  tapesstgpool  check how many volumes are in a specific storage pool
20  unav          check number of unavailable volumes
21  volerr        check number of volumes with error (error_state)

TSMMonitor in Action

Some samples to show tsmmonitor in action:

prompt> tsmmonitor -h
Usage: tsmmonitor [options] [check] [options_check]

  -s, -servername     specify tsm servername
  -m, -mail           mail addresses separated by comma
  -h, --help          print this help information and exit
  -V, --version       print program version and exit

These are global options. They can be used in all checks.

The following checks are available:

help, db, log, scratch, drive, path, dbfrag, unav, req, stgpool, volerr, tapeslib,
tapesown, tapesstgpool, dbbkp, numsess, numnodes, diskvol, dbvol, logvol, lic

Try 'tsmmonitor <check> --help' for more information.

Example:
  tsmmonitor db --help
  tsmmonitor db
  tsmmonitor -m=user1@somewhere.com,user2@somewhere.com db
  tsmmonitor -servername=tsmsrv01 db
  tsmmonitor -servername=tsmsrv02 db 85 95

Showing the help for the db check:

prompt> tsmmonitor db -h

check tsm database utilization

The default percentages are:
   warning..: 85
   critical.: 90

Usage..: tsmmonitor db [warning] [critical]
Example: tsmmonitor db
         tsmmonitor db 80 95

Checking the TSM database utilization:

prompt> tsmmonitor db
db: database utilization 79%, Ok
prompt> echo $?
0

Checking the TSM database utilization with different percentages for the warning and critical statuses:

prompt> tsmmonitor db 70 85
db: database utilization 79%, Warning
prompt> echo $?
1

prompt> tsmmonitor db 60 75
db: database utilization 79%, Critical
prompt> echo $?
2
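
The three runs above can be reproduced with a simple threshold comparison. The sketch below is an assumption about how such a check could classify a value, not the code tsmmonitor actually uses: classify is a hypothetical helper, and the real script may compare with > instead of >=. It only covers checks where a higher value is worse. Checks such as scratch, where the critical default is lower than the warning default, would need the comparison reversed.

```shell
#!/bin/sh
# classify is a hypothetical helper; tsmmonitor's real comparison may differ.
# usage: classify <value> <warning> <critical>
# Echoes 0 (ok), 1 (warning) or 2 (critical), assuming higher values are worse.
classify() {
    if [ "$1" -ge "$3" ]; then
        echo 2      # at or above the critical threshold
    elif [ "$1" -ge "$2" ]; then
        echo 1      # at or above the warning threshold
    else
        echo 0      # below both thresholds
    fi
}

# The same utilization (79%) with the three threshold pairs shown above
classify 79 85 90   # prints 0
classify 79 70 85   # prints 1
classify 79 60 75   # prints 2
```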

Checking the number of volumes with error:

prompt> tsmmonitor volerr   
volerr: number of volumes with error 2, Critical

Checking the number of unavailable volumes:

prompt> tsmmonitor unav
unav: number of unavailable volumes 1, Warning.

Checking the number of unavailable volumes with the verbose option:

prompt> tsmmonitor unav -v 
unav: number of unavailable volumes 1, Warning. Volumes: R00043L3

Checking the number of drives not online:

prompt> tsmmonitor drive
drive: number of drives not online 0, OK

Nagios Plugin

Tsmmonitor can be used directly as a Nagios plugin. Nagios plugins report their status through the script's return code, i.e., 0 = normal, 1 = warning, 2 = critical and 3 = unknown. These are the same return codes used by tsmmonitor.
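
A calling script can act on these return codes in the same way. The sketch below is a minimal illustration: status_name and run_check are hypothetical helpers that are not part of tsmmonitor or Nagios, and any code other than 0, 1 or 2 is treated as unknown.

```shell
#!/bin/sh
# status_name is a hypothetical helper: it translates a return code
# into a status word using the convention described above.
status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3 and anything unexpected
    esac
}

# run_check is a hypothetical helper: it runs the given command, discards
# its output and prints the status word for its return code.
run_check() {
    if "$@" > /dev/null 2>&1; then
        rc=0
    else
        rc=$?
    fi
    status_name "$rc"
}

# Example (requires a configured tsmmonitor in PATH):
#   run_check tsmmonitor db 70 85
```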

Alert Notification

You can use alert notification to receive an e-mail whenever a check's status changes. This feature is disabled by default. To turn it on, change the following options in the source code:

############### send notification
#
# at every time that a check changes the status,
# an alert (notification) will be sent by mail. default is off
SEND_ALERT=0   # 1 = on and 0 = off

# e-mails which will receive the notifications. mail addresses are separated
# by blank space. ex: MAILTO='xxx@yyy.zzz aaa@bbb.zzz ppp@qqq.lll'
MAILTO=''

# temp directory where tsmmonitor will record check status.
# it is necessary to send mail when the check status changes
TEMPDIR='/tmp'
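
The comments above explain why a temp directory is needed: the previous status of each check has to be recorded so that a change can be detected. The sketch below shows one way this could work. It is an assumption, not tsmmonitor's actual code: status_changed is a hypothetical helper, and the state file name is made up for illustration.

```shell
#!/bin/sh
# status_changed is a hypothetical helper; tsmmonitor's own logic may differ.
# usage: status_changed <state_dir> <check_name> <new_status>
# Records the new status and returns 0 (true) when it differs from the
# status recorded by the previous call, 1 otherwise.
status_changed() {
    state_file="$1/tsmmonitor_$2.status"   # file name is an assumption
    old_status=''
    if [ -f "$state_file" ]; then
        old_status=$(cat "$state_file")
    fi
    echo "$3" > "$state_file"
    [ "$old_status" != "$3" ]
}

# Example: send a mail only when the db check changes status
#   tsmmonitor db > /dev/null; rc=$?
#   if status_changed /tmp db "$rc"; then
#       echo "db status is now $rc" | mail -s "tsmmonitor db" user1@somewhere.com
#   fi
```

In this sketch the first run of a check counts as a change, because no earlier status exists yet.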

Remember that you can specify different mail addresses on the command line:

prompt> tsmmonitor -m=user2@somewhere.com db

You can use cron to run scheduled checks and receive tsmmonitor alerts by e-mail:

prompt> crontab -l
*/15 * * * *    /PATH/tsmmonitor db      > /dev/null
*/10 * * * *    /PATH/tsmmonitor log     > /dev/null
*/10 * * * *    /PATH/tsmmonitor drive   > /dev/null
*/10 * * * *    /PATH/tsmmonitor path    > /dev/null
*/15 * * * *    /PATH/tsmmonitor scratch > /dev/null
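
If several checks share the same interval, they could also be grouped behind one crontab entry. The sketch below is only an illustration: run_checks is a hypothetical wrapper that is not shipped with tsmmonitor. It runs each check with the command given as its first argument and returns the highest return code seen.

```shell
#!/bin/sh
# run_checks is a hypothetical wrapper, not part of tsmmonitor.
# usage: run_checks <command> <check>...
# Runs "<command> <check>" for every check and returns the worst
# (highest) return code.
run_checks() {
    cmd="$1"
    shift
    worst=0
    for check in "$@"; do
        if "$cmd" "$check" > /dev/null 2>&1; then
            rc=0
        else
            rc=$?
        fi
        if [ "$rc" -gt "$worst" ]; then
            worst=$rc
        fi
    done
    return "$worst"
}

# Example for a script called from cron every 10 minutes:
#   run_checks /PATH/tsmmonitor log drive path
```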

Help

HTML version of the command tsmmonitor help.

---------------------------------------------------------------------
show all checks help

Usage..: tsmmonitor help
Example: tsmmonitor help
---------------------------------------------------------------------
check tsm database utilization

The default percentages are:
   warning..: 85
   critical.: 90

Usage..: tsmmonitor db [warning] [critical]
Example: tsmmonitor db
         tsmmonitor db 80 95
---------------------------------------------------------------------
check tsm recovery log utilization

The default percentages are:
   warning..: 60
   critical.: 80

Usage..: tsmmonitor log [warning] [critical]
Example: tsmmonitor log
         tsmmonitor log 80 95
---------------------------------------------------------------------
check scratch tapes minimum number

The default numbers are:
   warning..: 10
   critical.: 6

Usage..: tsmmonitor scratch [warning] [critical] [library_name]
Example: tsmmonitor scratch
         tsmmonitor scratch 8 4
         tsmmonitor scratch 8 4 LTOLIB3
         tsmmonitor scratch LTOLIB3
---------------------------------------------------------------------
check number of drives not online

The default numbers are:
   warning..: 1
   critical.: 2

Usage..: tsmmonitor drive [warning] [critical] [library_name]
Example: tsmmonitor drive
         tsmmonitor drive 2 3
         tsmmonitor drive 1 2 LTOLIB3
         tsmmonitor drive LTOLIB3
---------------------------------------------------------------------
check number of paths not online

The default numbers are:
   warning..: 1
   critical.: 2

Usage..: tsmmonitor path [warning] [critical]
Example: tsmmonitor path
         tsmmonitor path 2 4
---------------------------------------------------------------------
check tsm database fragmentation

The default numbers are:
   warning..: 60
   critical.: 80

Usage..: tsmmonitor dbfrag [warning] [critical]
Example: tsmmonitor dbfrag
         tsmmonitor dbfrag 50 75
---------------------------------------------------------------------
check number of unavailable volumes

The default numbers are:
   warning..: 1
   critical.: 2

Usage..: tsmmonitor unav [options] [warning] [critical] [device_class]
   -v,   show unavailable volumes
Example: tsmmonitor unav -v
         tsmmonitor unav 2 4
         tsmmonitor unav 2 4 LTOCLASS
---------------------------------------------------------------------
check number of pending requests (query request)

The default numbers are:
   warning..: 1
   critical.: 2

Usage..: tsmmonitor req [warning] [critical]
Example: tsmmonitor req
         tsmmonitor req 2 3
---------------------------------------------------------------------
check a storage pool utilization

The default numbers are:
   warning..: 80
   critical.: 95

Usage..: tsmmonitor stgpool <storage_pool_name> [warning] [critical]
Example: tsmmonitor stgpool DISK_POOL
         tsmmonitor stgpool DISK_POOL 50 75
---------------------------------------------------------------------
check for volumes with error (error_state)

The default numbers are:
   warning..: 1
   critical.: 2

Usage..: tsmmonitor volerr [options] [warning] [critical] [device_class]
   -v,   show volumes with error
Example: tsmmonitor volerr
         tsmmonitor volerr -v 3 5
         tsmmonitor volerr 3 5 LTOCLASS
---------------------------------------------------------------------
check how many tapes are in the library

The default numbers are:
   warning..: 90
   critical.: 86

Usage..: tsmmonitor tapeslib [warning] [critical] [library_name]
Example: tsmmonitor tapeslib
         tsmmonitor tapeslib 120 115
         tsmmonitor tapeslib 120 115 LTOLIB3
         tsmmonitor tapeslib LTOLIB3
---------------------------------------------------------------------
check how many tapes have a specific owner

The default numbers are:
   warning..: 2
   critical.: 3

Usage..: tsmmonitor tapesown <owner> [warning] [critical]
Example: tsmmonitor tapesown tsmsrv01
         tsmmonitor tapesown tsmsrv01 4 5
---------------------------------------------------------------------
check how many volumes are in a specific storage pool

The default numbers are:
   warning..: 40
   critical.: 50

Usage..: tsmmonitor tapesstgpool <storage_pool_name> [warning] [critical]
Example: tsmmonitor tapesstgpool DAILY
         tsmmonitor tapesstgpool DAILY 30 45
---------------------------------------------------------------------
check how many tsm db backup there are in the last 24 hours

The default numbers are:
   warning..: 1
   critical.: 0

Usage..: tsmmonitor dbbkp [options] [warning] [critical]
   -v,   show some informations about database backup
Example: tsmmonitor dbbkp
         tsmmonitor dbbkp -v
         tsmmonitor dbbkp 2 1
---------------------------------------------------------------------
check number of nodes sessions

The default numbers are:
   warning..: 15
   critical.: 20

Usage..: tsmmonitor numsess [warning] [critical] [session_state]
Example: tsmmonitor numsess
         tsmmonitor numsess 20 30
         tsmmonitor numsess 20 30 MediaW
         tsmmonitor numsess Run
---------------------------------------------------------------------
check number of nodes

The default numbers are:
   warning..: 80
   critical.: 90

Usage..: tsmmonitor numnodes [warning] [critical] [domain]
Example: tsmmonitor numnodes
         tsmmonitor numnodes 20 30
         tsmmonitor numnodes 20 30 SAP
         tsmmonitor numnodes SAP
---------------------------------------------------------------------
check number of disk volumes without readwrite access

The default numbers are:
   warning..: 1
   critical.: 2

Usage..: tsmmonitor diskvol [options] [warning] [critical]
   -v,   show volumes without readwrite access
Example: tsmmonitor diskvol
         tsmmonitor diskvol -v
         tsmmonitor diskvol 2 3
         tsmmonitor diskvol -v 2 3
---------------------------------------------------------------------
check number of database volumes not synchronized (copy status)

The default numbers are:
   warning..: 1
   critical.: 2

Usage..: tsmmonitor dbvol [warning] [critical]
Example: tsmmonitor dbvol
         tsmmonitor dbvol 2 3
---------------------------------------------------------------------
check number of log volumes not synchronized (copy status)

The default numbers are:
   warning..: 1
   critical.: 2

Usage..: tsmmonitor logvol [warning] [critical]
Example: tsmmonitor logvol
         tsmmonitor logvol 2 3
---------------------------------------------------------------------
check server license compliance

Usage..: tsmmonitor lic
Example: tsmmonitor lic
---------------------------------------------------------------------