Runbook Automation – The Basics

This blog started life as a demonstration of how easy it is to set up automatic issue resolution using Runbook Automation, a component of Cloud Pak for Watson AIOps Event Manager. That demonstration, however, assumed everyone already understood the concepts of Runbook Automation, so first, a bit of background.

Across the “too-many-to-mention” years I have been working with event management tools, I have seen numerous runbook solutions designed to help the Operations Team resolve issues. Their complexity varies widely, from the most basic documentation that requires manual searching to enrichment-based solutions that automate event escalation and let integrated tools launch directly into the appropriate runbook. All of these solutions have had two things in common. First, they were custom-developed. Second, they gave operators instructions to execute manually; they did not facilitate automated or semi-automated resolution of issues.

Runbook Automation (RBA) is a feature of Event Manager, integrated into the event management engine based on Netcool Operations Insight. There is no additional installation and no additional set-up: RBA is ready to go.

A Runbook details the resolution steps for a specific issue. RBA enables multi-step Runbooks to be defined. A Runbook step can be manual instructions or an automation to be executed against a remote system. An automation can:

  • Execute a script via SSH
  • Send an HTTP request, for example to a RESTful API
  • Execute a job or job workflow template via an Ansible Controller (formerly Ansible Tower)
  • Execute a BigFix fixlet or task

A given Runbook may include any combination of the above automations plus manual instructions for the operator. RBA can automatically link a Runbook to an event and, if appropriate, automatically execute the Runbook, i.e. execute the Runbook immediately on receipt of an event instance to resolve the issue without requiring operator intervention.
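To make the idea of mixed Runbooks concrete, here is a minimal sketch in Python of a Runbook as an ordered list of steps. This is not RBA's data model or API; the names `StepKind`, `Step`, `Runbook` and `handle_event` are hypothetical, and the rule that only a Runbook with no manual steps is run unattended is my assumption for the illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class StepKind(Enum):
    """Hypothetical step types, mirroring the list of automations above."""
    MANUAL = "manual"              # instructions for the operator
    SSH_SCRIPT = "ssh_script"      # script executed via SSH
    HTTP_REQUEST = "http_request"  # e.g. a call to a RESTful API
    ANSIBLE = "ansible"            # job or workflow template
    BIGFIX = "bigfix"              # fixlet or task


@dataclass
class Step:
    kind: StepKind
    description: str


@dataclass
class Runbook:
    name: str
    steps: list[Step] = field(default_factory=list)

    def is_fully_automated(self) -> bool:
        """True only if there is at least one step and none needs an operator."""
        return bool(self.steps) and all(
            step.kind is not StepKind.MANUAL for step in self.steps
        )


def handle_event(runbook: Runbook, auto_execute: bool) -> str:
    """Decide what happens when an event linked to this Runbook arrives.

    Assumption for this sketch: unattended execution is only allowed when
    the Runbook contains no manual steps. Otherwise it is offered to the
    operator alongside the event.
    """
    if auto_execute and runbook.is_fully_automated():
        return "execute"
    return "suggest"
```

For example, a Runbook with a single SSH step and `auto_execute=True` would be executed immediately, while adding one manual step would cause the same Runbook to be suggested to the operator instead.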

To reach the utopia of auto-issue resolution, the Site Reliability Engineers and Operations Team need confidence in the Runbook being executed. RBA tracks the success rate of all Runbooks and the underlying automations. These statistics can be used to identify Runbooks that are candidates for promotion for automatic execution.
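As a rough illustration of how such statistics could drive the promotion decision, the sketch below filters Runbooks by success rate. The `RunbookStats` class, the `promotion_candidates` function and both thresholds are hypothetical choices of mine; RBA presents its own statistics in the UI and this is not how the product computes or exposes them.

```python
from dataclasses import dataclass


@dataclass
class RunbookStats:
    """Hypothetical per-Runbook execution counters."""
    name: str
    executions: int
    successes: int

    @property
    def success_rate(self) -> float:
        # Avoid division by zero for Runbooks that have never been run.
        return self.successes / self.executions if self.executions else 0.0


def promotion_candidates(stats, min_runs=20, min_rate=0.95):
    """Return names of Runbooks run often enough, and successfully enough,
    to be considered for automatic execution. Thresholds are illustrative."""
    return [
        s.name
        for s in stats
        if s.executions >= min_runs and s.success_rate >= min_rate
    ]
```

Requiring a minimum number of runs as well as a minimum success rate keeps a Runbook that has succeeded only a handful of times from being promoted on thin evidence.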

Overall, the Runbook solution saves Operators/SREs time and reduces the mean-time-to-resolution for issues. Operators and SREs have a single UI for viewing events, viewing Runbooks, executing those Runbooks and reviewing the results of executed automations. Runbook statistics collected by the solution accelerate the adoption of fully automated issue resolution, further saving the SREs and Operators time.

Executing a Runbook from an alert

There are also significant benefits delivered to the solution administrators. No longer do they have to manage custom code and integrations, as Runbook Automation is a standard feature of CP4WAIOps Event Manager. Neither do the Operators need to manage the deployment of automation scripts to the distributed systems for execution via SSH. The scripts and their deployment are managed by Runbook Automation, so the most up-to-date script will always be executed.

Runbook execution feedback

So, those are the basics of Runbook Automation and the main benefits of the solution. My next blog will demonstrate how simple it is to set up automated issue resolution.
