Automatically Fixing Issues using Netcool and Ansible

Netcool is a well-recognised and trusted event management solution, consuming events from all types of technologies, including cloud, network devices, servers and applications. The Netcool solution has always been able to execute an automated response to an event, using process control, aiding the swift resolution of issues. IBM Runbook Automation (RBA) is now licensed with Netcool Operations Insight, enabling SREs, SMEs and operational staff to build and execute runbooks to help solve common operational problems.

Runbook Automation

RBA can automate procedures that do not require human interaction, thereby increasing the efficiency of IT operations processes and reducing the workload on the operations team. RBA can execute automations using IBM BigFix, Ansible Tower, HTTP or script.

Many Netcool users already have mature runbook automation or configuration management products, and users in smaller environments may not be running RBA at all. So how can Netcool integrate into an existing configuration management or runbook automation system to exploit the existing automations?

As a proof of concept (POC), Orb Data designed and built an automation system using the standard core Netcool components (Netcool/OMNIbus and Netcool/Impact) and the open-source version of Ansible, rather than the Ansible Tower product used by RBA. The solution extended a basic enrichment solution to use the event details to identify an Ansible playbook written to resolve the issue. Playbook execution could be automated or semi-automated, i.e. requiring submission by an operator, depending on the enrichment data. Automation execution requests are sent to a queue on a dedicated Ansible control node. The basic design is shown in the figure below.
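The request flow described above can be sketched roughly as follows. This is an illustration only, not the POC implementation: the article does not say which queue technology was used, so Python's in-process queue.Queue stands in for it, and the AutomationRequest fields, the JSON message format and the drain helper are all hypothetical.

```python
import json
import queue
from dataclasses import dataclass, asdict


@dataclass
class AutomationRequest:
    """One playbook execution request; every field name here is illustrative."""
    event_serial: int   # identifier of the originating event
    node: str           # managed host the playbook should target
    alert_key: str      # e.g. the service name carried by the event
    playbook: str       # playbook selected by the enrichment step


def encode(request: AutomationRequest) -> str:
    """Serialise a request as the event-management side might before queueing it."""
    return json.dumps(asdict(request), sort_keys=True)


def decode(message: str) -> AutomationRequest:
    """Rebuild the request on the control node."""
    return AutomationRequest(**json.loads(message))


def drain(requests: "queue.Queue[str]", run_playbook) -> list:
    """Process every queued request with the supplied runner and collect results."""
    results = []
    while True:
        try:
            message = requests.get_nowait()
        except queue.Empty:
            return results
        request = decode(message)
        results.append((request.event_serial, run_playbook(request)))


if __name__ == "__main__":
    pending = queue.Queue()
    pending.put(encode(AutomationRequest(1001, "dmz-web01", "nfs-server", "stop_service.yml")))
    # The runner is injected so the sketch works without Ansible installed.
    print(drain(pending, lambda r: f"would run {r.playbook} on {r.node}"))
```

Passing the runner in as a callable keeps the queue handling separate from the Ansible invocation, so either side could be swapped without touching the other.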

Proof of Concept Architecture

In this example, a security event has been generated to indicate the Linux service “nfs-server” is active on a DMZ server. The Event Viewer screenshot shows that the enrichment process has identified a related Ansible playbook for this event. At this point the playbook has not been executed, indicated by the AutomationState value of “Not requested”.

The Orb Data Operator Assistance is a Netcool/Impact Operator View designed to collate data useful to the operators. The operator view provides context for the event: details about the node, application and support model are available from the various tabs. The Impact server builds the HTML page from data retrieved from the CMDB. A Ticket tab is dynamically added to the Operator View if a ticket is associated with the event. AJAX is used to periodically refresh the displayed data. In this example, the Automation tab is generated and details the Ansible playbook information for the event. The operator can request that the playbook is executed directly from the operator assistant using the “Submit Automation” button.

The automation uses a very simple playbook that takes supplied variables to identify the target host and the service (alert_key):

Ansible Playbook
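A playbook of this kind might look like the sketch below, held as a string in a small Python helper so it can be written to the control node. This is not the POC's actual playbook: the variable name target_host, the file name stop_service.yml and the choice to stop and disable the service are assumptions; only alert_key is taken from the text. The ansible.builtin.service module and its name, state and enabled parameters are standard Ansible.

```python
from pathlib import Path

# Hypothetical playbook text. "target_host" and "alert_key" are expected to be
# supplied as extra variables when the playbook is launched.
PLAYBOOK = """\
---
- name: Stop and disable an unwanted service
  hosts: "{{ target_host }}"
  become: true
  tasks:
    - name: Ensure the service named in the event is stopped and disabled
      ansible.builtin.service:
        name: "{{ alert_key }}"
        state: stopped
        enabled: false
"""


def write_playbook(directory: str, name: str = "stop_service.yml") -> Path:
    """Write the sketch playbook into the given directory and return its path."""
    path = Path(directory) / name
    path.write_text(PLAYBOOK, encoding="utf-8")
    return path
```

Because the host and service are passed in as variables, one playbook can serve every event of this type rather than needing a copy per server.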

Once the Ansible playbook has been successfully executed, the operator view automatically refreshes to display the playbook output, enabling the operator to confirm the success or otherwise of the automated fix.

In this example, the monitoring tool generates a resolution event that automatically clears the original problem event, the issue having been swiftly resolved by the operations team. Specific automations can also be configured to execute automatically on receipt of an event, eliminating the need for operator intervention.
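The choice between automatic and operator-submitted execution could be expressed as a small decision function. The sketch below is illustrative: AutomationState and its “Not requested” value come from the example above, while the AnsiblePlaybook and AutomationMode fields and the returned labels are hypothetical names for the enrichment data.

```python
def next_step(event: dict) -> str:
    """Decide what to do with an enriched event.

    Returns "none" when there is nothing to run, "queue" when the playbook
    should be submitted immediately, and "await_operator" when an operator
    must submit it from the operator view.
    """
    if not event.get("AnsiblePlaybook"):
        return "none"  # enrichment found no playbook for this event
    if event.get("AutomationState", "Not requested") != "Not requested":
        return "none"  # already requested or executed; do not submit twice
    if event.get("AutomationMode") == "automatic":
        return "queue"
    return "await_operator"


if __name__ == "__main__":
    event = {"AnsiblePlaybook": "stop_service.yml", "AutomationState": "Not requested"}
    print(next_step(event))
```

Checking the state before queueing guards against the same playbook being submitted twice for one event.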

The design of this solution could equally be applied to other configuration management or automation technologies, for example Puppet or Rundeck, which demonstrates the flexibility of Netcool. The POC solution was designed using the open-source version of Ansible, a powerful provisioning, configuration management and application deployment tool. Ansible is a CLI tool, and the POC used shared SSH keys to secure connections from the controller node to the managed nodes, with the managed nodes defined in a locally maintained inventory file. For a production environment those files would likely be managed using source code management, for example Git.
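On the control node, a CLI invocation along these lines could launch a playbook against the local inventory. This is a minimal sketch, not the POC's code: the file paths and the extra variable names target_host and alert_key are assumptions, while --inventory, --private-key and --extra-vars are standard ansible-playbook options.

```python
import json
import subprocess


def build_command(playbook: str, inventory: str, private_key: str,
                  target_host: str, alert_key: str) -> list:
    """Build an ansible-playbook command line as an argument list."""
    # Extra variables are passed as a JSON string, which --extra-vars accepts.
    extra_vars = json.dumps({"target_host": target_host, "alert_key": alert_key})
    return [
        "ansible-playbook", playbook,
        "--inventory", inventory,
        "--private-key", private_key,
        "--extra-vars", extra_vars,
    ]


def run(command: list) -> tuple:
    """Run a command and return (exit code, combined output) for display."""
    completed = subprocess.run(command, capture_output=True, text=True, check=False)
    return completed.returncode, completed.stdout + completed.stderr


if __name__ == "__main__":
    # Paths and host name are placeholders for illustration.
    print(build_command("stop_service.yml", "inventory.ini",
                        "/home/ansible/.ssh/id_rsa", "dmz-web01", "nfs-server"))
```

Building the command as a list rather than a shell string avoids quoting problems when event fields contain spaces or shell metacharacters.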

RBA integrates with Ansible Tower, a Red Hat product that can be licensed as part of “Cloud Pak for Multicloud Management”. Ansible Tower wraps the power and simplicity of Ansible within an enterprise solution. It builds on the open-source solution, adding a UI and RESTful API that includes credential management, inventory management, job scheduling, workflow and role-based access control features within a highly available, scalable clustered solution. It is a powerful tool, but one that requires a licence, and hence is not available to all Netcool users. The POC proved the simplicity of integrating Netcool into other automation tools.

If you have any questions on how you could develop your Netcool solution, please contact Orb Data at info@orb-data.com.
