A few weeks back I wrote a blog post regarding Tivoli Analytics for Service Performance. The test system that I set up to analyse data from an ITM Tivoli Data Warehouse implementation has now been running for over a month. With the initial learning phase completed, TASP has now started detecting metric anomalies and generating alerts.
When viewing the resulting alerts in the Netcool/OMNIbus Web GUI, the alert tool “ServiceDiagnosis…” can be used to launch in context into the Service Diagnosis TIP portlet. The portlet displays a chart of the metrics in question, highlighting the point at which the anomaly was detected, and provides both summary and detailed information explaining why the anomaly alert was generated.
The following screenshots show some of the anomalies that have been detected:
Taking the charts at face value, it certainly appears that the underlying algorithms employed by TASP work as intended and do identify metric values that are not consistent with previously observed behaviour.
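TASP's actual detection algorithms aren't published, but the general idea of flagging values inconsistent with previously observed behaviour can be illustrated with a simple rolling baseline band. Everything in this sketch is an assumption for illustration only: the mean ± 3 standard deviations band, the function name and the sample values bear no relation to what TASP really does internally.

```python
# Illustrative only: TASP's real anomaly-detection algorithms are not public.
# This sketch flags a metric sample as anomalous when it falls outside a
# baseline band of mean +/- k standard deviations learned from history.
from statistics import mean, stdev

def is_anomalous(history, value, k=3.0):
    """Return True if `value` lies outside mean +/- k*stdev of `history`."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu  # flat history: any deviation is anomalous
    return abs(value - mu) > k * sigma

# Hypothetical %TotalProcessorTime samples hovering around 20%
baseline = [18.0, 21.5, 19.8, 20.2, 22.1, 19.4, 20.9, 21.0]
print(is_anomalous(baseline, 20.5))  # within the baseline band
print(is_anomalous(baseline, 95.0))  # well outside: would raise an alert
```

A real product would of course maintain baselines per metric and per time-of-day, and use far more sophisticated statistics, but the principle of comparing new samples against learned history is the same.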
I suspect the real challenge, however, comes when trying to interpret the significance and impact of these anomalies and how they should be presented to an Operations team. In isolation, an alert indicating that the %TotalProcessorTime metric for a server is outside its baseline doesn’t hold a great deal of value unless a direct relationship with the actual or potential degradation of a service can be established.
I suppose one way of approaching this would be to focus on business activity monitoring or customer experience metric anomalies that are easily interpretable, have obvious context, or are a direct indication of the availability or performance of a service. Even then, it seems likely that a successful TASP implementation will require the additional business-logic processing capability of products such as Netcool/Impact or TBSM to truly draw out the value it can offer.