
System Monitor Health State

System Monitor maintains a single value, stored in a register in shared memory, that summarizes overall system health. This value is based on notifications posted to the messages log (see Monitoring Log Files), including both system alerts generated directly by the InterSystems IRIS instance and alerts and warnings generated by System Monitor and its Health Monitor component.

At startup, the system health state is set based on the number of system (not System Monitor) alerts posted to the messages log during the startup process. Once System Monitor is running, the health state can be raised by either system alerts or System Monitor alerts or warnings. The state is cleared to the next lower level when 30 minutes have elapsed since the last system alert or System Monitor alert or warning was posted; a RED state therefore returns to GREEN after 60 minutes without new notifications. The following table shows how the system health state is determined.

System Monitor Health State

GREEN (0)
  • Set at startup when: no system alerts are posted during startup.
  • Set following startup when: 30 minutes (if the state was YELLOW) or 60 minutes (if the state was RED) have elapsed since the last system alert or System Monitor alert or warning was posted.
  • Cleared to: n/a

YELLOW (1)
  • Set at startup when: one to four system alerts are posted during startup.
  • Set following startup when: the state is GREEN and either
      • one system alert is posted, OR
      • one or more System Monitor alerts and/or warnings are posted, but not enough alerts to set RED (see below).
  • Cleared to: GREEN when 30 minutes have elapsed since the last system alert or System Monitor alert or warning was posted.

RED (2)
  • Set at startup when: five or more system alerts are posted during startup.
  • Set following startup when either
      • the state is YELLOW and one system alert is posted, OR
      • the state is GREEN or YELLOW and, within a 30-minute period, System Monitor alerts from at least five different sensors, or three System Monitor alerts from a single sensor, are posted.
  • Cleared to: YELLOW when 30 minutes have elapsed since the last system alert or System Monitor alert or warning was posted.

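The rules above behave like a small state machine. The following Python sketch is a hypothetical model for reasoning about those rules, not InterSystems code: the names HealthModel, startup_state, system_alert, sysmon_alert, and sysmon_warning are invented, times are plain seconds, and it assumes that each system alert raises the state by exactly one level and that clearing is applied lazily (when the state is read or a new notification is posted).

```python
from collections import Counter, deque
from dataclasses import dataclass, field
from typing import Deque, Optional, Tuple

GREEN, YELLOW, RED = 0, 1, 2
WINDOW = 30 * 60  # seconds: both the clearing interval and the RED-escalation window


def startup_state(system_alerts: int) -> int:
    """State implied by the number of system alerts posted during startup."""
    if system_alerts == 0:
        return GREEN
    return YELLOW if system_alerts <= 4 else RED


@dataclass
class HealthModel:
    """Hypothetical model of the GREEN/YELLOW/RED rules; times are in seconds."""
    level: int = GREEN
    # Time of the last system alert or System Monitor alert/warning, if any.
    last_post: Optional[float] = None
    # (time, sensor) pairs for System Monitor alerts inside the 30-minute window.
    recent: Deque[Tuple[float, str]] = field(default_factory=deque)

    def state(self, now: float) -> int:
        """Drop one level for every full 30 minutes since the last post."""
        if self.last_post is None:
            return self.level
        steps = int((now - self.last_post) // WINDOW)
        return max(GREEN, self.level - steps)

    def _post(self, now: float) -> None:
        # Apply any pending clearing first, then restart the 30-minute clock.
        self.level = self.state(now)
        self.last_post = now

    def system_alert(self, now: float) -> None:
        self._post(now)
        self.level = min(RED, self.level + 1)  # GREEN -> YELLOW, YELLOW -> RED

    def sysmon_warning(self, now: float) -> None:
        self._post(now)
        self.level = max(self.level, YELLOW)

    def sysmon_alert(self, now: float, sensor: str) -> None:
        self._post(now)
        self.recent.append((now, sensor))
        # Forget alerts posted more than 30 minutes ago (the new one always stays).
        while now - self.recent[0][0] > WINDOW:
            self.recent.popleft()
        counts = Counter(s for _, s in self.recent)
        if len(counts) >= 5 or max(counts.values()) >= 3:
            self.level = RED
        else:
            self.level = max(self.level, YELLOW)


m = HealthModel(level=startup_state(0))
m.system_alert(0)   # GREEN -> YELLOW
m.system_alert(60)  # YELLOW -> RED
print(m.state(60 + 1800), m.state(60 + 3600))  # prints "1 0"
```

To model a nonzero startup state, construct the model with level=startup_state(n) and last_post set to the time of the last startup alert; otherwise the sketch never clears that initial state.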
Note:

A fourth state, HUNG, can occur when global updates are blocked. Specifically, the following events change the state to HUNG:

  • The journal daemon is paused for more than 5 seconds or frozen (see Journal I/O Errors).

  • Any of switches 10, 11, 13, or 14 is set (see Using Switches).

  • The write daemon is stopped for any reason or sets the updates locked flag for more than 3 seconds.

  • The number of available global buffers (in the database cache) falls into the critical region and remains there for more than 5 seconds.

When the health state changes to HUNG, the reason is written to the messages log.
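The four HUNG triggers amount to a simple predicate over a few observations. The Python sketch below is a hypothetical illustration only: the Snapshot fields and the hung_reasons function are invented names, not values or calls that InterSystems IRIS exposes, and it assumes the elapsed times have already been measured. Returning a list of reasons loosely mirrors the fact that the reason for a HUNG state is written to the messages log.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

HUNG_SWITCHES = frozenset({10, 11, 13, 14})


@dataclass
class Snapshot:
    """Hypothetical observations; these field names are not an IRIS API."""
    journal_paused_secs: float = 0.0
    journal_frozen: bool = False
    switches_set: FrozenSet[int] = frozenset()
    write_daemon_stopped: bool = False
    updates_locked_secs: float = 0.0    # time the updates locked flag has been set
    buffers_critical_secs: float = 0.0  # time free global buffers stayed critical


def hung_reasons(s: Snapshot) -> List[str]:
    """Return why the state would be HUNG; an empty list means not HUNG."""
    reasons = []
    if s.journal_frozen or s.journal_paused_secs > 5:
        reasons.append("journal daemon paused or frozen")
    switches = sorted(HUNG_SWITCHES & s.switches_set)
    if switches:
        reasons.append(f"switches set: {switches}")
    if s.write_daemon_stopped or s.updates_locked_secs > 3:
        reasons.append("write daemon stopped or updates locked")
    if s.buffers_critical_secs > 5:
        reasons.append("global buffers in critical region")
    return reasons


print(hung_reasons(Snapshot(journal_paused_secs=6, switches_set=frozenset({11, 12}))))
# prints ['journal daemon paused or frozen', 'switches set: [11]']
```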

You can view the System Monitor health state using:

  • The View System Health option on the View System Data menu of ^%SYSMONMGR (which does not report HUNG).

  • The $SYSTEM.Monitor API, which lets you access the health state directly. Use $SYSTEM.Monitor.State() to return the current health state; see also the SetState, Clear, Alert, GetAlerts, and ClearAlerts methods.

  • The iris list and iris qlist commands (which do not include health state on Windows).

Note:

When System Monitor is not running, the System Monitor health state is always GREEN.
