Introducing ITM 6 Dynamic Thresholds
Dynamic Situation Thresholds were introduced to ITM with version 6.2.1. The feature simplifies the maintenance of situations where thresholds differ depending on the resource instance, managed system and/or time, and can reduce the system overhead when compared to previous methods for implementing such logic.
For example, the expected CPU usage of Windows servers may differ between weekends and weekdays. Implementing this in earlier versions of ITM 6 would require multiple situations: either two calendar-based situations and two threshold situations with the calendar situations embedded, or two situations and a policy to stop and start the situations at the relevant times.
Implementing the solution with ITM 6.2.1 requires a single situation. The base formula is configured to monitor against the weekday CPU usage threshold. Additionally, a threshold override is defined to modify the base threshold using the pre-defined “Weekend” calendar.
This basic example reduces the number of situations from 4 to 1, which in turn reduces the required maintenance and the processing cycles spent on formula evaluation.
Another example of the usefulness of the new dynamic threshold feature is where different managed systems have subtly different threshold requirements for different instances of the same resource. For example, consider a situation to monitor the free space available on Windows logical disks. Invariably, a single threshold does not apply to all the drives on all the managed systems. A situation with defined overrides can apply a default threshold to all drives but, where appropriate, apply a Managed System or Logical Drive specific threshold.
Situation overrides can be managed either centrally, through the TEP Client, or locally on the agent using an XML file, the latter being the first step in off-loading the threshold maintenance from the ITM administrators to the system owners. This article will look at examples of centrally defined overrides.
Example 1: Calendar based Override Configuration
For the first example, where the required monitoring threshold for the CPU usage differs between weekdays and the weekend, the override is defined from the TEP Client Situation Editor. A standard situation is defined, using the attribute group Processor and the attribute % Processor Time to set a threshold of 90%, as demonstrated in the figure below:
The Override Formula Editor is accessed from the Situation Editor Distribution tab. The administrator must highlight one of the assigned managed systems or managed system groups before clicking the button Override Formula…. This is demonstrated in the figure below.
Note: The button will be disabled unless a managed system or managed system group is highlighted.
The override expression is defined simply by selecting the Weekend calendar from the Situation Override Schedule window, which is displayed by clicking the clock icon, and entering the new threshold, as shown in the figure below.
Note: The green check mark over the clock indicates that a calendar schedule has been selected.
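The effect of a calendar based override can be sketched in a few lines of Python. This is only an illustrative model, not how ITM evaluates situations internally. The 90% base threshold comes from the example above; the weekend value, the >= comparison and the assumption that the Weekend calendar covers Saturday and Sunday are all hypothetical.

```python
from datetime import datetime

BASE_THRESHOLD = 90.0     # weekday threshold from the base situation formula
WEEKEND_THRESHOLD = 98.0  # hypothetical override value, not taken from the article


def effective_threshold(when: datetime) -> float:
    """Return the threshold that applies at the given time."""
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    # Assumption: the "Weekend" calendar means Saturday and Sunday.
    if when.weekday() >= 5:
        return WEEKEND_THRESHOLD
    return BASE_THRESHOLD


def situation_is_true(cpu_percent: float, when: datetime) -> bool:
    """Model a single situation whose threshold is swapped by the override."""
    # Assumption: the situation fires when usage reaches the threshold.
    return cpu_percent >= effective_threshold(when)
```

With these assumed values, a reading of 95% would raise the situation on a Monday but not on a Saturday, which is the behaviour that previously needed several situations.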
Example 2: Attribute based and Managed System Specific Override Configuration
The second example alters the threshold for a logical disk space monitor based on the logical disk name and managed system name. In a similar fashion to the first example, a basic situation is defined, including the formula:
% Free < 10
However, the following override definition recognises that the Logical Drive “C:” on the Managed System “WIN01” is considered to be running normally when below this threshold, until the disk has less than 5% free space (possibly a legacy system that nobody dares touch?!?). This definition is demonstrated in the figure below:
Note: This override has been defined by highlighting the Managed System “WIN01” in the navigator on the left-hand side of the Add Expression Overrides window.
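The logic of this second example can be modelled as a lookup that falls back to the base formula. The sketch below is illustrative only: the 10% and 5% values and the names “WIN01” and “C:” come from the example, while the function and variable names are hypothetical and do not correspond to any ITM interface.

```python
from typing import Dict, Tuple

DEFAULT_FREE_PCT = 10.0  # base formula: % Free < 10

# (managed system, logical disk) -> override threshold for % Free
OVERRIDES: Dict[Tuple[str, str], float] = {
    ("WIN01", "C:"): 5.0,
}


def threshold_for(managed_system: str, disk: str) -> float:
    """Return the override if one is defined, otherwise the base threshold."""
    return OVERRIDES.get((managed_system, disk), DEFAULT_FREE_PCT)


def disk_alert(managed_system: str, disk: str, free_pct: float) -> bool:
    """Evaluate '% Free < threshold' for one logical disk."""
    return free_pct < threshold_for(managed_system, disk)
```

In this model a disk with 7% free space alerts everywhere except on C: of WIN01, where the alert is held back until free space drops below 5%.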
Creating More Complex Overrides
The above examples demonstrate the basic functionality of the Dynamic Threshold feature. However, overrides can support much more complex logic.
For example, multiple override expressions can be defined for a single managed system or managed system group (click the Add Expression button from the Add Expression Overrides window), and these can be based on a combination of calendar and attribute overrides. The figure below demonstrates a logical disk situation where calendar based overrides are defined for multiple different logical disk instances.
A Priority, a number from 1 to 200, is assigned to each override. This is used to resolve conflicts where multiple overrides apply to the same managed system, with the lower number taking precedence. The default priority for an override defined for a Managed System, 100, therefore takes precedence over the default priority for a Managed System Group override, 200.
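As a rough illustration of the conflict resolution, the sketch below picks the override with the lowest priority number, which matches the statement that the Managed System default of 100 wins over the Managed System Group default of 200. The class, the field names and the group name are hypothetical, and how ITM handles overrides with equal priority is not covered here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Override:
    target: str       # managed system or managed system group name
    threshold: float  # threshold applied when this override wins
    priority: int     # 1 to 200; a lower number takes precedence


def resolve(overrides: List[Override]) -> Optional[Override]:
    """Return the winning override, or None if none apply."""
    if not overrides:
        return None
    return min(overrides, key=lambda o: o.priority)


# Hypothetical overrides that both apply to the managed system WIN01.
candidates = [
    Override(target="WINDOWS_GROUP", threshold=8.0, priority=200),
    Override(target="WIN01", threshold=5.0, priority=100),
]
winner = resolve(candidates)
```

Here the winner is the WIN01 override, because its priority of 100 is lower than the group's 200.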
Finally, the default set of calendars (Prime, NonPrime, Weekdays and Weekend) can be augmented from the command line, using the tacmd addCalendarEntry command. For example, to create a calendar entry pertinent to backups that run between 01:00 and 01:59, run the command:
tacmd addCalendarEntry -n "Daily_0100to0159" -h 1 -d "Daily from 0100 to 0159"
The definition can subsequently be displayed with the viewCalendarEntry command:
tacmd viewCalendarEntry -n Daily_0100to0159
Name : Daily_0100to0159
Type : CRON
Description: Daily from 0100 to 0159
Data : * 1 * * *
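To see why the data “* 1 * * *” covers 01:00 to 01:59, the following sketch evaluates a five-field expression against a timestamp. It assumes the conventional cron field order (minute, hour, day of month, month, day of week, with Sunday as 0) and supports only “*” and single numbers; ITM's own handling of CRON calendar data may differ in detail.

```python
from datetime import datetime


def cron_matches(expr: str, when: datetime) -> bool:
    """Check a simplified five-field cron expression against a timestamp."""
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five fields: minute hour day month weekday")
    actual = [
        when.minute,
        when.hour,
        when.day,
        when.month,
        (when.weekday() + 1) % 7,  # convert Monday=0 to cron's Sunday=0
    ]
    # A field matches if it is the wildcard or equals the actual value.
    return all(f == "*" or int(f) == v for f, v in zip(fields, actual))
```

Under these assumptions, any minute within hour 1 matches on any day, while 00:59 and 02:00 do not.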
Full information on the syntax of these options, and other calendar-based options, is available in the document IBM Tivoli Monitoring Command Reference version 6.2.1.
If this feature had been available in previous versions of ITM, I’d have already utilised it on numerous occasions! Despite its restrictions (for example, it cannot be used with situations analysing multiple attribute groups), it is a first step towards greatly increasing the flexibility of ITM situations and improving the performance of the solution.