Orb Data is currently helping customers review their options for migration to AIOps solutions. Often this is the start of a journey and customers wish to accelerate adoption of some of the AIOps features with the existing infrastructure.
Amongst the many features of IBM Watson AIOps, two that could significantly benefit an existing Netcool Operations Insight (NOI) solution are Cloud Native Event Analytics (CNEA) and ChatOps integration. NOI can exploit both these features.
What is Cloud Native Event Analytics?
IBM Watson AIOps CNEA continually analyses events to identify seasonal trends, correlation and probable cause.
Seasonal alerts occur consistently within a regular time window, for example the same hour of the day, day of the week, day of the month, day of the week at a specific hour or day of the month at a specific hour. This may indicate a problem that requires resolution or, at minimum, it informs the SRE/DevOps/ITOps team of the history so that previous actions can be reviewed and future alerts suppressed.
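To make the idea concrete, here is a minimal Python sketch of an hour-of-day seasonality check. It is an illustration only and not the algorithm used by CNEA; the function name, the thresholds and the sample history are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime


def seasonal_hours(timestamps, min_occurrences=5, min_share=0.8):
    """Return the hours of day (0-23) that each hold at least `min_share` of the occurrences.

    Returns an empty list when there are too few occurrences to judge
    or when no single hour dominates.
    """
    if len(timestamps) < min_occurrences:
        return []
    by_hour = Counter(ts.hour for ts in timestamps)
    total = len(timestamps)
    return sorted(h for h, n in by_hour.items() if n / total >= min_share)


# Hypothetical history: the same alert seen at about 02:00 on six days,
# plus one stray occurrence at 14:30.
history = [datetime(2021, 3, day, 2, 5) for day in range(1, 7)]
history.append(datetime(2021, 3, 4, 14, 30))

print(seasonal_hours(history))
```

The same counting approach could be repeated for day of the week or day of the month by changing the key used in the counter.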
Temporal and topological correlation reduces noise by grouping alerts into Incidents.
Temporal correlation identifies relationships based on the historic co-occurrence of alerts. For example, an alert for “Slow Check-out transactions” may regularly occur in the same timeframe as an alert for “High number of messages in queue”.
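A simple way to picture co-occurrence is to split the event history into fixed time windows and count which alert types land in the same window. The Python sketch below does exactly that. It is a simplification and not the CNEA implementation; the five-minute window, the function name and the sample events are assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_occurrence_counts(events, window_seconds=300):
    """Count how often each pair of alert types appears in the same time window.

    `events` is an iterable of (epoch_seconds, alert_name) tuples.
    """
    windows = defaultdict(set)
    for epoch, name in events:
        windows[epoch // window_seconds].add(name)

    pairs = Counter()
    for names in windows.values():
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(names), 2):
            pairs[pair] += 1
    return pairs


# Hypothetical event history, timestamps in seconds.
events = [
    (0, "Slow Check-out transactions"), (60, "High number of messages in queue"),
    (3600, "Slow Check-out transactions"), (3700, "High number of messages in queue"),
    (7200, "Slow Check-out transactions"), (7230, "High number of messages in queue"),
    (9000, "Disk space low"),
]

for pair, count in co_occurrence_counts(events).most_common():
    print(count, pair)
```

Fixed windows can split two related alerts that fall either side of a boundary, so a production-grade approach would need something more robust than this.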
Topological correlation groups alerts into an Incident based on relationships between resources. For example, application topology may identify relationships between alerts from distinct components.
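As an illustration, topological grouping can be thought of as finding alerts whose resources are connected in the topology graph. The Python sketch below groups alerts by connected component. It is a deliberately simple model and not how Watson AIOps works internally; the resource names, the edge list and the function name are hypothetical.

```python
from collections import defaultdict


def group_alerts_by_topology(alerts, edges):
    """Group alerts whose resources are connected in the topology.

    `alerts` is a list of (alert_id, resource) tuples and `edges` is a list
    of (resource_a, resource_b) relationships.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    component = {}  # resource -> label of the component it belongs to
    for _, start in alerts:
        if start in component:
            continue
        component[start] = start
        stack = [start]
        while stack:  # depth-first walk of everything reachable from `start`
            node = stack.pop()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component[nxt] = start
                    stack.append(nxt)

    groups = defaultdict(list)
    for alert_id, resource in alerts:
        groups[component[resource]].append(alert_id)
    return list(groups.values())


# Hypothetical topology and alerts.
edges = [
    ("web-frontend", "checkout-service"),
    ("checkout-service", "message-queue"),
    ("reporting-app", "reporting-db"),
]
alerts = [(1, "web-frontend"), (2, "message-queue"), (3, "reporting-db")]

print(group_alerts_by_topology(alerts, edges))
```

Here alerts 1 and 2 end up in one group because their resources are linked through the checkout service, while alert 3 stands alone.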
Grouped alerts are presented as an Incident to the SRE/DevOps/ITOps team, reducing the manual filtering and correlation that would historically have been required and so reducing the issue resolution time.
Probable cause alerts further enhance the Incident by identifying the top three likely causes of an Incident, ranked in order of probability. The probable cause is identified based on the topology associated with the Incident and the classification of the alerts.
Event Analytics for Netcool Operations Insight (NOI) can be enabled in one of two ways: either through an on-premises Netcool/Impact based analytics engine, or by using a hybrid-NOI solution to integrate CNEA with the on-premises NOI infrastructure. There are significant benefits to the latter, predominantly the maturity of the algorithms and the fact that event data is streamed to the analytics engine, resulting in continuous re-training. The Netcool/Impact solution is dependent on periodic analysis and re-training from the historical event database. A hybrid-NOI solution can enhance the on-premises solution further with topology visualisations and runbook automations, helping the support team understand the state of the application and resolve issues.
What is ChatOps?
Chat Operations (ChatOps) is a collaboration model that uses standard chat clients, for example Slack or Microsoft Teams, to aid software development and operations. ChatOps is not just about connecting people, but also tools, processes and automations. Most AIOps solutions can escalate issues via ChatOps channels, informing the people that need to know using their preferred tools.
Escalating Netcool alerts via ChatOps streamlines the operational processes, ensuring the support teams have early visibility of issues and incidents. The sooner the support teams are aware of issues and the more information they have available, the sooner they can resolve those issues and so minimise the disruption to customers.
IBM Watson AIOps integrates with both Slack and Microsoft Teams, enabling incidents to be escalated via ChatOps. The message posted includes a rich set of data pertaining to an incident, including grouped alerts, probable cause, topological information, blast radius and suggested resolution options, with drill-down options to aid further investigation.
Although ChatOps integration is not delivered by a hybrid-NOI set-up, it is relatively straightforward to engineer a solution for posting messages to ChatOps channels for Netcool alerts. The basics of a proof-of-concept integration to Microsoft Teams are described below.
NOI ChatOps Integration POC
Although NOI and a hybrid-NOI solution do not support a ChatOps integration “out-of-the-box”, a solution can be developed to enable automated or semi-automated posts to a chat tool. The proof-of-concept solution enables messages to be posted to Microsoft Teams Channels from Netcool alerts or incidents. The foundations of this solution are generic; only the last step was specific to Microsoft Teams, so it is adaptable to other chat products.
The basics of the integration were:
- New alert columns and conversions in the ObjectServer to identify alerts for a ChatOps-post and track the processing-state.
- A Web GUI tool to flag alerts to post to Microsoft Teams and identify the Channel.
- An Impact policy to format and post the message using the Microsoft Teams Incoming Webhook API.
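The POC used an Impact policy for the final step, but the HTTP request itself is simple enough to show in a few lines of Python. The sketch below builds a message in the MessageCard format accepted by Teams Incoming Webhooks and posts it as JSON. The card layout and the choice of alert fields (Node, Summary, Severity and Serial from the ObjectServer alerts.status table) are assumptions for illustration and not necessarily what the POC sent; the webhook URL is a placeholder.

```python
import json
import urllib.request

# ObjectServer severity values 0-5 mapped to their standard names.
SEVERITY_NAMES = {0: "Clear", 1: "Indeterminate", 2: "Warning",
                  3: "Minor", 4: "Major", 5: "Critical"}


def build_teams_message(alert):
    """Build a MessageCard payload from a dict of ObjectServer alert fields."""
    severity = SEVERITY_NAMES.get(alert["Severity"], str(alert["Severity"]))
    return {
        "@type": "MessageCard",
        "@context": "http://schema.org/extensions",
        "summary": alert["Summary"],
        "title": f"{severity} alert on {alert['Node']}",
        "sections": [{
            "facts": [
                {"name": "Summary", "value": alert["Summary"]},
                {"name": "Severity", "value": severity},
                {"name": "Serial", "value": str(alert["Serial"])},
            ],
        }],
    }


def post_to_teams(webhook_url, payload, timeout=10):
    """POST the payload as JSON to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical alert, shaped like a row from alerts.status.
example_alert = {"Node": "shop-db-01", "Summary": "High number of messages in queue",
                 "Severity": 5, "Serial": 123456}
payload = build_teams_message(example_alert)
print(json.dumps(payload, indent=2))

# To send it, pass the URL generated when the Incoming Webhook is added to the Channel:
# post_to_teams("<your-incoming-webhook-url>", payload)
```

Keeping the message formatting separate from the HTTP call means only the formatting function would need to change to target a different chat product.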
The ChatOps solution was a POC, so there are a number of improvements that would likely be made for a production environment, for example:
- Automated message-posts for a subset of alerts identified through event enrichment, so eliminating delays waiting for the Operators to escalate the alert.
- Posting of data from Netcool Incidents generated by CNEA including probable cause details.
- Throttling of the rate of message posts based on service limitations.
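As an example of the last point, throttling could be handled with a small rolling-window limiter placed in front of the posting step. The Python sketch below is one possible approach and was not part of the POC. The class name and the limit of two posts per second are hypothetical; real limits should be taken from the chat service's published documentation.

```python
import time
from collections import deque


class PostThrottle:
    """Allow at most `max_posts` posts in any rolling `period_seconds` window."""

    def __init__(self, max_posts, period_seconds, clock=time.monotonic):
        self.max_posts = max_posts
        self.period_seconds = period_seconds
        self.clock = clock   # injectable so the limiter can be tested without sleeping
        self.sent = deque()  # timestamps of recent posts, oldest first

    def try_acquire(self):
        """Return True and record the post if it is allowed now, otherwise False."""
        now = self.clock()
        # Forget posts that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.period_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_posts:
            self.sent.append(now)
            return True
        return False


# Demonstration with a fake clock: two posts are allowed, the third is refused,
# and posting is allowed again once the window has passed.
fake_now = [0.0]
throttle = PostThrottle(max_posts=2, period_seconds=1.0, clock=lambda: fake_now[0])
results = [throttle.try_acquire() for _ in range(3)]
fake_now[0] = 1.0
results.append(throttle.try_acquire())
print(results)
```

Alerts that are refused would stay flagged as pending in the ObjectServer and be retried on the next pass, so nothing is lost while the limiter is holding posts back.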
Note also that other RESTful API integrations are available for Microsoft Teams that may be more appropriate for a production solution.
Netcool/Impact is not essential for such an integration. Netcool/Impact does make the integration to a RESTful API very simple, but an ObjectServer External Procedure could be used to replace that communication. Overall, the POC demonstrated the power of NOI as a development platform and how straightforward such integrations are to develop.
If you have any questions on this Blog, AIOps or NOI development do contact Orb Data at firstname.lastname@example.org.