Runbook Automation – Automated Issue Resolution

In this blog post I want to demonstrate how easy it is to configure Runbook Automation to resolve an issue automatically. Runbook Automation (RBA) is a component of Cloud Pak for Watson AIOps Event Manager; its concepts were explained in my previous blog, “Runbook Automation – The Basics”.

So, how do you set up an automated Runbook? The following sections walk through creating a Runbook that restarts a Linux service via SSH. They describe how to:

  1. Define an automation connection for SSH so RBA can connect to the target servers
  2. Create and test a script automation (a template script) that restarts the service
  3. Include the automation in a Runbook and test the Runbook
  4. Add an RBA trigger to associate the Runbook with the relevant events
  5. Enable automatic execution of the Runbook when a matching event is received

SSH Connection

The RBA script automation type uses an SSH connection to execute a script on a target server. Authentication is managed by SSH keys. The RBA SSH key is generated from the UI option Administration > Integration with other systems > Automation type.

RBA Automation Type Configuration

Options are available to use group public keys and to establish the SSH connection via a jump server. The example below uses a standard public key, which must be added to the file “~/.ssh/authorized_keys” of the appropriate user on each target server.

RBA SSH Connection Set-up
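If you prefer to script the key installation on each target, the following is a minimal sketch. It assumes you have saved the public key shown in the RBA UI to a local file; the file name and the function name install_rba_key are hypothetical. It creates the .ssh directory with the conventional restrictive permissions and appends the key only if that exact line is not already present, so it is safe to run more than once.

```shell
# Hypothetical helper: install the RBA public key for the account that the
# automation's "user" parameter will name.
install_rba_key() {
    keyfile=$1   # file holding the RBA public key (a single line)
    home=$2      # home directory of the target user
    key=$(cat "$keyfile") || return 1
    [ -n "$key" ] || { echo "empty key file: $keyfile" >&2; return 1; }
    # Conventional restrictive permissions for .ssh and authorized_keys
    mkdir -p "$home/.ssh" && chmod 700 "$home/.ssh" || return 1
    auth="$home/.ssh/authorized_keys"
    touch "$auth" && chmod 600 "$auth" || return 1
    # Append the key only if that exact line is not already present
    grep -qxF -e "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
}
```

For example, run as the netcool user: install_rba_key ./rba_key.pub "$(getent passwd netcool | cut -d: -f6)". If you run it as root on behalf of another account, chown the directory and file to that user afterwards so that the user can read them.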

Create and Test the Automation

Automations are added via the menu Automations > Runbooks > Automations. The automation created will later be added as a step in a Runbook.

NOI Automations menu

The definition for a script automation includes the script and any parameters that need to be passed to it. By default, “target” and “user” parameters are defined to identify the hostname of the target server and the user for the server connection and script execution. The value assigned to “user” must match the account to which the RBA public SSH key was added in the previous section.

Custom parameters make the automation more flexible, reducing the number of automations that need to be defined. For example, in the screenshot below, parameters are defined for both the Linux service name and the “systemctl” command. The service name parameter means a single Runbook can be used to manage any service on a target server. The command parameter means that the same automation can be used for multiple service management tasks, such as querying the status of a service, restarting it or stopping it, either within one Runbook or across different Runbooks.

RBA Script automation set-up

In this instance the script is very basic: it runs “systemctl” with the specified command and service and then queries the active state of the service. The script can be as complex as necessary; for example, you may wish to restrict execution to specific values of the “$command” parameter.

#!/bin/sh
# "command" and "servicename" are supplied by RBA as automation parameters
LOGFILE=/tmp/rba.linuxservicecontrol.log
# HOSTNAME is a bash variable and may be unset under /bin/sh, so use hostname(1)
log() { echo "$(date +'%Y-%m-%dT%H:%M:%S')::$$::$(whoami)::Execute::$(hostname)::$*" | tee -a "$LOGFILE"; }
log "systemctl ${command} ${servicename}"
# Capture output and status so the exit code returned to RBA reflects systemctl, not tee
out=$(systemctl "${command}" "${servicename}" 2>&1); rc=$?
printf '%s\n' "$out" | tee -a "$LOGFILE"
log "systemctl is-active ${servicename}"
systemctl is-active "${servicename}" 2>&1 | tee -a "$LOGFILE"
exit $rc
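One way to restrict execution is to validate both parameters before systemctl runs. The sketch below is illustrative only: the function names and the list of permitted commands are assumptions to adapt. Checking the service name is also worthwhile because, once a trigger is configured, its value comes from event data; the pattern accepts the characters commonly used in systemd unit names and rejects anything else, including a leading “-” that systemctl would read as an option.

```shell
# Hypothetical guards for the script's "command" and "servicename" parameters.
validate_command() {
    # Example allow-list; adjust to the tasks this automation should perform
    case "$1" in
        status|is-active|start|stop|restart) return 0 ;;
        *) echo "systemctl command '$1' is not permitted" >&2; return 1 ;;
    esac
}

validate_servicename() {
    # Reject empty names, option-like names and unexpected characters
    case "$1" in
        ''|-*|*[!A-Za-z0-9@._:-]*) echo "invalid service name '$1'" >&2; return 1 ;;
        *) return 0 ;;
    esac
}
```

To use them, paste the functions near the top of the script and add a guard such as validate_command "${command}" && validate_servicename "${servicename}" || exit 2, so that a rejected value shows up in RBA as a non-zero exit code.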

Once saved, the automation can be verified using the “Test” option in the Automations table: click the ellipsis at the end of the row to open the menu.

RBA automation test

In the example below, the four parameters are entered manually on the right-hand side to execute “systemctl status chronyd” on the target server as the “netcool” user. The script exit code, “stdout” and “stderr” are returned to RBA.

RBA Automation test
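Besides the exit code, “stdout” and “stderr” returned to RBA, each run of the example script appends an audit trail to /tmp/rba.linuxservicecontrol.log on the target server. As a sketch (the function name is hypothetical), the “Execute” records can be listed with awk by splitting on the “::” separator the script writes, skipping the systemctl output lines that are logged in between.

```shell
# Hypothetical helper: list the commands recorded by the automation script.
# Each audit record has the form timestamp::pid::user::Execute::host::command
rba_log_summary() {
    logfile=${1:-/tmp/rba.linuxservicecontrol.log}
    # Print timestamp, host, user and command for Execute records only
    awk -F'::' '$4 == "Execute" { print $1, $5, $3, $6 }' "$logfile"
}
```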

Runbook

An Operator or SRE will step through a Runbook in an effort to resolve an issue. A Runbook is defined from the menu Automations > Runbooks > Library.

RBA Library

A Runbook can contain many steps, a combination of manual steps and automations. For this example, a single step will be included to restart a Linux service.

RBA add an automation to a Runbook

The “LinuxServiceControl” automation defined earlier requires four parameters. The “user” and “$command” parameters are assigned the static values “netcool” and “restart”, respectively. The mapping for “target” and “$servicename” is “Runbook parameter”, which means their values must be supplied when the Runbook is executed. We will shortly assign values to those parameters from an associated event.

RBA Runbook parameters

Once saved, the Runbook can be tested manually from the “Run” link or by selecting the “Preview” option from the ellipsis menu.

RBA Runbook Preview

 

For the Runbook test, only the “targethostname” and “linuxservicename” parameters need to be set, because “user” and “$command” have fixed values. The script output, run time and exit status are returned to the UI. If the test is successful, the Runbook can be associated with events using a trigger.

Runbook Test output

RBA Trigger

An RBA Trigger, not to be confused with an ObjectServer trigger, associates a Runbook with live events. This enables the Runbook to be launched directly from the event or to be executed automatically. To be a candidate for automatic execution, the Runbook must not contain manual steps.

The test environment is monitoring Linux services and generating events with “AlertGroup=LinuxServiceStatus”. The “AlertKey” is set to the name of the service.

Linux Service Status event

An RBA trigger is created from the menu Automations > Runbooks > Triggers. The trigger definition has multiple sections.

The Trigger details section specifies a unique name and a description for the trigger.

RBA Trigger Details

The Trigger conditions section is effectively an SQL “where” clause that identifies the events to be associated with a Runbook. The administrator can execute the trigger condition from the editor to confirm that the correct events are matched.

RBA Trigger Condition

 

The Assign a Runbook section identifies the Runbook to associate with a matched event and can assign values to the Runbook parameters from the event. In the example, the parameter values are taken from event data.

RBA Assign a Runbook details
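RBA evaluates the trigger condition and the parameter assignments itself; the following is only a hypothetical shell illustration of their combined effect. It assumes a condition of AlertGroup = 'LinuxServiceStatus' and a mapping of the event's Node field to “targethostname” and AlertKey to “linuxservicename”, which fits the events described above but may differ from the configuration in the screenshot. The function name is made up for this example.

```shell
# Hypothetical illustration of a trigger match plus parameter assignment.
# Assumed mapping: Node -> targethostname, AlertKey -> linuxservicename
match_linux_service_event() {
    alertgroup=$1 node=$2 alertkey=$3
    # Equivalent of the condition: AlertGroup = 'LinuxServiceStatus'
    [ "$alertgroup" = "LinuxServiceStatus" ] || return 1
    echo "targethostname=$node"
    echo "linuxservicename=$alertkey"
}
```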

 

And finally, Trigger priority is used when multiple triggers match the same event. The Runbook from the trigger with the highest priority (lowest number) is associated with the event.

RBA Trigger Priority
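The priority rule can be sketched in a few lines of shell. Given one “priority name” line for each trigger whose condition matches the event, the lowest number wins. The input format and trigger names are hypothetical, and ties are broken here by name, which is not necessarily how RBA behaves.

```shell
# Hypothetical sketch: read "priority trigger-name" lines on stdin and print
# the name of the trigger with the lowest priority number.
select_trigger() {
    sort -n -k1,1 | head -n 1 | cut -d' ' -f2-
}
```

For example, printf '%s\n' '10 GenericLinuxTrigger' '1 LinuxServiceRestartTrigger' | select_trigger selects LinuxServiceRestartTrigger.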

 

Once the trigger has been saved and enabled, the Runbook reference is associated with new events matching the trigger condition. This is immediately visible to operators in the “Runbook” column of the alerts list, and the Runbook details can be viewed and executed from the pop-up on the right-hand side when an alert is selected.

Executing a Runbook from an alert

 

Clicking “Execute” opens the “Run Runbook” dialog with the “targethostname” and “linuxservicename” parameters populated from the event. The operator then clicks “Start runbook” followed by the “Run” button for the step.

Runbook execution feedback

 

Clicking “Complete” for the step lets the operator give feedback on whether the Runbook was successful; this data helps identify Runbooks suitable for automatic execution.

Runbook Feedback

 

Assuming the monitoring tool samples the service state periodically, the event should close automatically once the service is reported as running again.

Alert closure

Automatically executing a Runbook

At what point does the solution mature enough for Runbooks to be executed automatically? The Runbook Library (Automations > Runbooks > Library) includes summary statistics for Runbook executions, based on operator feedback.

Runbook Library

The Execution tab gives more details about each execution. Each column header is also a filter.

Runbook Execution details
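The success rate behind that decision is the proportion of executions rated successful. As a hypothetical sketch, assuming the outcomes have been copied out one per line as “success” or “failure” (an invented format, not an RBA export), the following computes an integer percentage and checks it against a threshold of your choosing. RBA's own statistics are calculated in the UI, and no particular threshold is implied here.

```shell
# Hypothetical sketch: success percentage from "success"/"failure" lines on stdin.
success_rate() {
    awk '
        $1 == "success" { ok++ }
        $1 == "success" || $1 == "failure" { total++ }
        END {
            if (total == 0) { print 0; exit }
            printf "%d\n", ok * 100 / total
        }'
}

# Succeeds if the success rate read from stdin is at least THRESHOLD percent.
is_candidate() {
    rate=$(success_rate)
    [ "$rate" -ge "$1" ]
}
```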

 

The success rate of the Runbook “LinuxServiceRestart” suggests it is a candidate for automatic execution, which is enabled directly in the Assign a Runbook section of the trigger editor.

Trigger configuration to automatically execute a Runbook

 

The next time a matching event is generated, the Runbook is executed automatically. This is captured in the event timeline and can also be viewed from the page Automations > Runbooks > Execution, where the script output is available.

Alert timeline including the automated Runbook execution information

Summary

Automated issue resolution is the ideal, but a few steps are required to reach that stage. To help start the journey, RBA includes an HTTP REST interface to assist with importing existing instructions into the tool. The tribal knowledge of Operators and SREs will likely identify the first set of Runbooks suitable for semi-automation. As the semi-automated Runbooks mature, the execution statistics will help identify suitable candidates for full automation.

For further information on Runbook Automation or any of the Cloud Pak for Watson AIOps features, please contact info@orb-data.com.

 
