tcl.syschk Verb: Access/TCL

Command tcl.syschk Verb: Access/TCL
Applicable release versions: AP 6.1
Category TCL (746)
Description checks a system running under Unix to detect 'abnormal' situations.
The "syschk" utility starts a phantom process that periodically checks that the system is behaving 'normally'. If a system parameter goes beyond a threshold defined by the System Administrator, a message is sent to one or more users and, optionally, an entry is written to the "errors" file. All elements are optional. The following elements are monitored:

Unix Swap: When "syschk" is started, it always ensures that the Unix swap (or paging) space is at least equal to twice the physical memory. It then periodically checks that the swap usage does not go beyond a predefined level (90% of the total available space by default).

Total System CPU: The percentage of the CPU spent in System mode (Kernel, drivers, etc...) must stay below a predetermined level (25% by default).

Runaway Processes: Each time a sample is taken, the CPU time of the active processes is checked. If a process has consumed more than a predefined percentage of the sampling period, a warning is issued. For example, if the sampling period is 10 minutes (600 seconds) and the process consumed more than 5% of this time (30 seconds, which is an enormous amount of CPU), the process is probably in an abnormal tight CPU loop. "syschk" displays the Unix status ("ps") and, if it is a Pick process, the result of a "where". If a process exceeds the limit more than three times in a row, the reporting of the error stops. This ensures that the System Administrator does not receive constant messages for a process that is, for example, running a large report. If the process is still running after nine samples, the reporting restarts three more times, and the reporting cycle restarts.

Unix File System Usage: The state of a predetermined list of Unix file systems is checked to make sure they do not fill up (more than 90% used). This prevents Unix crashes caused by critical file systems filling up. By default, only '/' is checked.

Overflow Usage: When the used Pick overflow exceeds a predetermined percentage of the total overflow space, a message is generated. The default level is 90%. Overflow is reported only once a day, at the first sample taken after noon.

Basic Usage: This monitors the usage of the FlashBASIC "basic" area. When the used "basic" space exceeds a predetermined percentage of the total "basic" space, a message is generated. The default level is 90%. To examine the "basic" space in more detail, use the "shpstat" command.
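
The runaway-process rule described above combines a threshold computation with a report/suppress cycle. The following Python sketch is one possible reading of that description, not the actual "syschk" implementation. The function names (cpu_limit_seconds, is_runaway, should_report) are hypothetical, and the assumed cycle (report over-limit samples 1 to 3, stay silent for samples 4 to 9, then repeat) is an interpretation of the text.

```python
def cpu_limit_seconds(sampling_seconds, percentage):
    """CPU seconds a process may consume during one sampling period."""
    return sampling_seconds * percentage / 100.0


def is_runaway(cpu_seconds_used, sampling_seconds, percentage):
    """True if the process used more than its allowed share of the period."""
    return cpu_seconds_used > cpu_limit_seconds(sampling_seconds, percentage)


def should_report(consecutive_exceeded, report_count=3, cycle_length=9):
    """Decide whether the Nth consecutive over-limit sample is reported.

    Assumed cycle: samples 1-3 are reported, samples 4-9 are silent,
    then the cycle starts again at sample 10.
    """
    if consecutive_exceeded < 1:
        return False
    return (consecutive_exceeded - 1) % cycle_length < report_count


if __name__ == "__main__":
    # Example from the text: 10-minute period with a 5% limit.
    print(cpu_limit_seconds(600, 5))                       # 30.0
    print([n for n in range(1, 13) if should_report(n)])   # [1, 2, 3, 10, 11, 12]
```

With these assumptions, a process that stays over the limit produces three warnings, then none for six samples, then three more.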


This command must be run on the "dm" account. Only one instance of a phantom process running "syschk" is supported at any given time.

The form "syschk start" starts the syschk command running as a phantom with the same argument as the last time it was started. If necessary, the process is stopped before it is restarted. If the process was never started, a set of defaults is applied.

The form "syschk edit" enters the Update processor to allow editing the arguments used by the "start" command.

The form "syschk now" instructs the "syschk" phantom process to take a sample immediately and send the anomaly messages, if any, to the terminal requesting the sample instead of using the normal notify mechanism.

The form "syschk stop" stops the syschk command running as a phantom.

The form "syschk", without any argument, simply reports whether syschk is currently running and displays when it was started, its parameters, and some of the current system parameter values.

With one or more arguments, "syschk" creates a phantom process and returns immediately to TCL. Arguments can be specified in any order.

"keyword{=value}" specifies which system parameters to control:

sampling=[ sec | hh:mm:ss ]
Specifies the sampling delay. The delay can be expressed either in seconds or in hh:mm:ss format. If "sampling" is not specified, the default is 30 minutes.

notify=[ user{,..},!n{,..},/dev/ttyXX{,...},* | OFF ]
Specifies the list of Pick users, Pick port numbers, or Unix devices to notify when an abnormal situation is detected. Targets can be given as explicit Pick user ids, Pick port numbers in decimal prefixed by an exclamation mark (!), Unix devices prefixed by a slash (/), or any combination of these. If '*' is used, all users logged on to the system are notified. If a period ('.') is used, it is replaced by the tty name. If 'OFF' is specified, notification is disabled. If "notify" is not specified, or if the specified users are not logged on at the time the anomaly occurs, a message is sent to "dm" or "sysprog". If a Unix device is specified, it must exist and be writable when "syschk" is started.

syscpu{= percentage {%}}
Specifies the maximum percentage of total CPU usage Unix is allowed to spend in System mode. If 'percentage' is not specified, the default is 25%.

proccpu{= percentage {%}}
Specifies the maximum percentage of CPU a process is allowed to take. If 'percentage' is not specified, the default is 25%. This trigger point may be a little difficult to evaluate. For example, on a system with only one active user running a CPU-intensive Pick/BASIC program, the process will naturally take 100% of the CPU, since there is no other running process. To avoid false alarms, select a sufficiently long sampling period. It is probably unusual for a process to use 100% of the CPU for 15 minutes.

swapusg{= percentage {%}}
Specifies the acceptable swap usage at any given time. 'percentage' is the amount of swap actually used. If 'percentage' is not specified, a swap usage above 90% of the total swap will be considered abnormal.

ovfusg{= percentage {%}}
Specifies the acceptable overflow usage at any given time. 'percentage' is the amount of overflow actually used. If 'percentage' is not specified, an overflow usage above 90% of the total Pick space will be considered abnormal. Errors are reported only once a day.

basicusg{= percentage {%}}
Specifies the acceptable 'basic' usage at any given time. 'percentage' is the amount of 'basic' space actually used. If 'percentage' is not specified, a 'basic' usage above 90% of the total 'basic' space will be considered abnormal.

diskusg= filesystem{,filesystem,..}
Specifies the list of Unix file systems that should never get full (over 90%). The list should always include '/'. Depending on the system, '/usr' and '/tmp' might have to be included. Some Unix systems will not be able to boot with a full '/'. If not specified, '/' is the only Unix file system to be checked.

log
Log messages in the "errors" file. This keyword is equivalent to the (L) option.

nolog
Do not log messages in the "errors" file. This keyword is equivalent to not having the (L) option. It supersedes the (L) option.
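
To make the argument formats above concrete, the following Python sketch shows one way such values could be interpreted. It is an illustration only and does not reflect how "syschk" itself parses its command line. The function names (parse_sampling, parse_percentage, classify_notify) and the kind labels ("user", "port", "device", "all", "tty") are hypothetical.

```python
def parse_sampling(value):
    """Convert a 'sec' or 'hh:mm:ss' sampling value into seconds."""
    if ":" in value:
        hours, minutes, seconds = (int(part) for part in value.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    return int(value)


def parse_percentage(value, default):
    """Parse '10' or '10%'; an empty value selects the documented default."""
    value = value.strip().rstrip("%").strip()
    return int(value) if value else default


def classify_notify(value):
    """Split a notify list into (kind, target) pairs.

    'OFF' disables notification and yields an empty list.
    """
    if value == "OFF":
        return []
    targets = []
    for item in value.split(","):
        if item == "*":
            targets.append(("all", item))        # every logged-on user
        elif item == ".":
            targets.append(("tty", item))        # replaced by the tty name
        elif item.startswith("!"):
            targets.append(("port", int(item[1:])))  # decimal Pick port
        elif item.startswith("/"):
            targets.append(("device", item))     # Unix device path
        else:
            targets.append(("user", item))       # Pick user id
    return targets


if __name__ == "__main__":
    print(parse_sampling("00:30:00"))            # 1800
    print(parse_percentage("10%", 25))           # 10
    print(classify_notify("/dev/tty0,bob,!0"))
```

Under these assumptions, "sampling=00:30:00" and "sampling=1800" describe the same period, and "syscpu" given without a value falls back to its default of 25.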

When started as a phantom, "syschk" runs forever, until the system is shut down. To stop it, use "syschk stop".
Syntax syschk keyword{=value} {...} {(options}
syschk edit
syschk now
syschk start
syschk stop
syschk
Options F Start syschk in foreground on the current process, as opposed to a phantom process. With this option, the only way to stop the process is to do a break / end, or a logoff.

L Log a short summary of the error messages to the "errors" file. The initial swap space control is always logged. Can be specified by using the "log" keyword.

Q Quiet. Suppresses some user messages ("started", "stopped").

V Verbose. This option can be used if "syschk" is run in foreground, instead of a phantom.
Example
syschk sampling=00:30:00 syscpu=10% 
         notify=/dev/tty0,bob,!0 
         swapusg=60% proccpu=10% log
  Start a phantom process to check the system every 30 minutes for a system CPU 
usage above 10%, a swap usage above 60% and a process runaway limit of 10%. If 
an anomaly is detected, a message is sent to the Unix terminal '/dev/tty0', 
the Pick user 'bob' if he is logged on, and to the line 0 whether it 
is logged on or not, and a short message is logged in the 'errors' 
file. The other parameters are left at their default values.

syschk
  Check whether syschk is running. This would display, for example:
    syschk is running on port 132
    Started on 03/11/94 at 08:20:21

    Current running parameters:
      Sampling period           00:30:00
      Notify list               /dev/tty0  bob  !0
      Maximum system CPU %      10
      Maximum CPU % per process 10
      Maximum % of swap         60
      Maximum % of overflow     90
      Unix file systems         /
      Log messages (0=no;1=yes) 1

    Current System Status:
      User CPU usage            3%
      System CPU usage         11%
      Waiting for IO           82%
      Idle CPU                  4%
      Total swap space         128 Mb
      Used swap space           76 Mb (59%)


syschk stop (q
  Stop the phantom running "syschk" as a background process, 
suppressing the message "stopped". This command could be included in 
the "user-shutdown" macro.

syschk start
  Restart the "syschk" phantom with the same parameters.

syschk edit
  Edit the syschk command line to change the arguments. Use Update 
processor commands to edit the command line.
Purpose
Related tcl.shpstat
tcl.buffers
perf
tcl.system-coldstart