Orb Data – The Chapel Grenville Court, Britwell Road, Burnham, Bucks SL1 8DF, UK.
Tel: 01628 550450 email: info@orb-data.com
This article is taken from Message Broker.
By Ben Fawcett
False assurances are a real danger in Systems Management. For example, we know that taking backups without a proven restore process is one of the classic pitfalls. There's a similar case with Availability and Performance Monitoring. Comprehensive plans for measuring, reporting and alerting on service metrics amount to nothing unless you know the monitoring is continuous and active. Indeed, the comforting green glow of a healthy service dashboard inspires dangerous complacency if the monitoring you think you have put in place is not really running at all.
Enterprise-level software is complex by its very nature. It is usually distributed across a heterogeneous environment, with plenty of exceptions and edge cases to accommodate. The sheer volume of coverage means that errors and faults become a statistical inevitability.
The first step is to implement a reliable and proven monitoring platform, and then to put some level of trust in the built-in assurances it offers. But those of us whose job it is to certify the monitoring service to our business and our customers would like any extra reassurance we can get, particularly anything that can provide real-time, end-to-end verification.
This problem is nothing new, but the tools at our disposal have changed. In the past you couldn't work on a Tivoli Monitoring project without encountering some custom heartbeat mechanism or other; eventually heart-beating was built into the agent infrastructure. For listing monitors we had command line utilities such as wlseng or wdmlseng.
With ITM 6 we have a different architecture and a much richer GUI, but no simple replacements for the old command line utilities. To determine which Situations are actually running, we need to gather and process information from the ITM 6 environment using a number of SOAP queries.
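As a rough illustration of what one of those SOAP queries involves, the sketch below builds a CT_Get request wrapped in a SOAP envelope and shows how it might be posted. It is a minimal sketch under several assumptions that should be checked against IBM's SOAP documentation for your ITM release: the hub TEMS endpoint form (`http://<host>:1920///cms/soap`), the object name `ManagedSystem`, and the request layout are all assumptions, and the helper names `build_ct_get` and `send_ct_get` are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("SOAP-ENV", SOAP_ENV)

# Assumption: hub TEMS SOAP endpoint; 1920 is a common default HTTP port.
EXAMPLE_URL = "http://hub-tems.example.com:1920///cms/soap"


def build_ct_get(userid: str, password: str, obj: str) -> bytes:
    """Build a SOAP envelope wrapping a CT_Get request for one ITM object.

    The CT_Get element layout (userid/password/object) is an assumption
    modelled on IBM's SOAP examples; verify it for your ITM version.
    """
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    ct_get = ET.SubElement(body, "CT_Get")
    ET.SubElement(ct_get, "userid").text = userid
    ET.SubElement(ct_get, "password").text = password
    ET.SubElement(ct_get, "object").text = obj
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def send_ct_get(url: str, payload: bytes, timeout: float = 30.0) -> ET.Element:
    """POST the request to the SOAP server and return the parsed XML reply."""
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "text/xml; charset=utf-8"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return ET.fromstring(response.read())
```

A full verification run would issue several such queries (managed systems, situation definitions, distributions, situation status) and join the replies; only the request construction is shown here.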
There are a couple of solutions available that automatically collect this data and perform Situation verification for you. The first is the ITM Super Tool, a deep-dive troubleshooting tool available from IBM via the ISM Library (NavCode: 1TW10TM6L). The second is the Self-Service Portal (SSP) available from Orb Data. The SSP also maintains a list of expected and authorised Situations for each agent, which provides a further cross-reference check for the verification process.

Orb Data Self-Service Portal (SSP)
Approaching the problem starts with understanding how monitors are defined and deployed, and how and where they are evaluated.
A monitor in ITM 6 is defined in a Situation. This includes a formula to determine whether a set of data metrics at a particular time constitutes an alert condition. The Situation is activated on relevant agents via a membership hierarchy of Situation Groups and Managed System Lists. But even then it will only run if it is switched on globally and, in most cases, has successfully started up locally at the agent.
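So we have an initial checklist for Situation verification: the Situation must be distributed to the agent, it must be started globally, and it must have started successfully at the agent.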
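The checklist can be expressed as a simple per-agent check. This is a minimal sketch, assuming the distribution, global-start and local-status data have already been gathered (for example via SOAP queries) into a hypothetical `SituationState` record; the status string `"Started"` is an assumed label, and the "in most cases" exceptions for local start-up are not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SituationState:
    """Hypothetical record of one Situation's state on one agent."""
    name: str
    distributed: bool       # reached the agent via the distribution hierarchy
    started_globally: bool  # switched on at the monitoring server
    local_status: str       # status reported at the agent, e.g. "Started"


def failed_checks(state: SituationState) -> List[str]:
    """Return the checklist items this Situation fails (empty list = verified)."""
    failures = []
    if not state.distributed:
        failures.append("not distributed to agent")
    if not state.started_globally:
        failures.append("not started globally")
    if state.local_status != "Started":
        failures.append(f"not running at agent (status: {state.local_status})")
    return failures
```

Returning the list of failed items, rather than a single boolean, makes it easier to report why a Situation that looks configured is not actually running.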
The Situation distribution hierarchy is built on relationships between four types of resource in ITM: Situations, Situation Groups, Managed System Lists (MSLs) and Agents.
Situation Groups were introduced in ITM 6.2.2 and provide the recommended way of organising Situation distribution. To support legacy implementations, IBM preserved the existing relationships between Situations, MSLs and Agents. The complete set of possible relationships is shown in the ITM Situation Distribution Hierarchy diagram.
Any path in the diagram that traces a route from Situation to Agent is a valid way of setting up a distribution. There is probably too much flexibility in the way you can arrange things. In any single environment I would recommend choosing a single path and configuring everything that way. I would also recommend against direct Situation to Agent distributions and nested Situation Groups, as both add unnecessary complexity.
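Because any Situation-to-Agent path counts as a distribution, working out which Situations an agent should be running means following every path through the hierarchy. The sketch below is a minimal illustration, assuming a hypothetical edge list in which each pair means "distribution flows from the first node to the second" and nodes are tagged `SIT:`, `GRP:`, `MSL:` or `AGT:`. The exact edges allowed in a real ITM environment follow the diagram, not this code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical edge: (source, target), where distribution flows from source
# towards the agent, e.g. SIT -> GRP, GRP -> GRP (nested), GRP -> MSL, MSL -> AGT.
Edge = Tuple[str, str]


def situations_per_agent(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    """Follow every Situation -> ... -> Agent path and collect Situations per agent."""
    targets: Dict[str, Set[str]] = defaultdict(set)
    for source, target in edges:
        targets[source].add(target)

    result: Dict[str, Set[str]] = defaultdict(set)
    situations = {node for node in targets if node.startswith("SIT:")}
    for sit in situations:
        # Depth-first walk; `seen` guards against cycles in nested Situation Groups.
        stack, seen = [sit], {sit}
        while stack:
            node = stack.pop()
            for nxt in targets.get(node, set()):
                if nxt in seen:
                    continue
                seen.add(nxt)
                if nxt.startswith("AGT:"):
                    result[nxt].add(sit)
                else:
                    stack.append(nxt)
    return dict(result)
```

Collecting results into sets means a Situation reached by two different paths is only counted once per agent, which is exactly the kind of overlap that mixing distribution styles tends to create.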

ITM Situation Distribution Hierarchy
The Self-Service Portal (SSP) from Orb Data can be used to manage Situation distributions. It uses best-practice standards and an approval process to build a list of authorised Situations for each agent. It then activates the authorised Situations by automatically building the distribution hierarchy.
This means that SSP can provide an additional step to the Situation verification process. By referencing the list of expected Situations for an agent it can provide an extra check that the Situations running are the authorised ones and only the authorised ones.
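The cross-reference itself comes down to two set differences per agent. This is a minimal sketch, not SSP's actual implementation; the function name `compare_with_authorised` and the result keys are hypothetical.

```python
from typing import Dict, Set


def compare_with_authorised(running: Set[str], authorised: Set[str]) -> Dict[str, Set[str]]:
    """Cross-reference the Situations found running on an agent against the
    authorised list for that agent."""
    return {
        "missing": authorised - running,       # expected, but not running
        "unauthorised": running - authorised,  # running, but never approved
    }
```

An agent passes this check only when both sets are empty: every authorised Situation is running, and nothing else is.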