Integrate Solarwinds events with IBM Tivoli Netcool OMNIbus

Over the last few years I have come across more and more network management teams using the Solarwinds suite to manage their estate. The Solarwinds products are competitively priced and easy to install, and this ease of installation removes the need for a company to pay a consultant to install and build the Solarwinds architecture. The Solarwinds brand has clearly caught on, as I keep bumping into companies that have selected Solarwinds as their tool of choice for managing the network.

System management teams, meanwhile, are still using Element Management Systems (EMS) such as IBM Tivoli Monitoring (ITM) and Microsoft System Center Operations Manager (SCOM) to monitor applications and the servers they run on. There is therefore still a definite requirement for the data from these various sources (Solarwinds, SCOM and ITM) to be presented to the operators in one consolidated data store. As a “manager of managers”, IBM Tivoli Netcool OMNIbus is an excellent choice for integrating these disparate EMS into that single data store. This blog provides a brief overview of integrating Solarwinds with IBM Tivoli Netcool OMNIbus.

Please note that there are licensing requirements around the use of IBM Tivoli Netcool OMNIbus that must be adhered to when performing this suggested integration.

Solution Architecture

[Figure: Solarwinds architecture]
The method I used to integrate Solarwinds with OMNIbus was to use SNMP as the delivery mechanism into OMNIbus. In simple terms (for the MttrapD newbie), when a node or interface goes down in Solarwinds, Solarwinds raises a trap and sends it to the IP address and port on which the MttrapD probe is listening. The probe then interprets each trap according to a set of rules: it breaks the trap (event data) down into tokens, parses them into elements, and uses the elements to assign values to ObjectServer fields.
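For readers new to MttrapD, the following Python sketch mimics that pipeline: a received trap is broken into named elements, and a rules step then copies element values into ObjectServer fields. It is an illustration under assumptions, not the probe's implementation. The element names only imitate MttrapD conventions ($specific-trap, numbered varbinds $1, $2 and so on), and the sample trap, its enterprise OID and the choice of varbind 1 as the message text are hypothetical.

```python
"""Minimal sketch of the MttrapD flow: trap -> elements -> ObjectServer fields.

Element names imitate MttrapD conventions; the sample trap is hypothetical.
"""


def trap_to_elements(trap: dict) -> dict:
    """Break a received trap into named elements, as the probe does before the rules run."""
    elements = {
        "$IPaddress": trap["agent_address"],
        "$enterprise": trap["enterprise"],
        "$generic-trap": str(trap["generic_trap"]),
        "$specific-trap": str(trap["specific_trap"]),
    }
    # Each varbind becomes a numbered element: $1, $2, ...
    for position, value in enumerate(trap["varbinds"], start=1):
        elements[f"${position}"] = value
    return elements


def elements_to_fields(elements: dict) -> dict:
    """Assign element values to ObjectServer fields, which is the job of the rules file."""
    return {
        "Node": elements["$IPaddress"],
        "Summary": elements.get("$1", ""),  # assumption: varbind 1 carries the message text
        "Manager": "MttrapD",               # illustrative value only
    }


if __name__ == "__main__":
    sample_trap = {
        "agent_address": "192.0.2.10",      # documentation-range address
        "enterprise": "1.3.6.1.4.1.99999",  # hypothetical enterprise OID
        "generic_trap": 6,                  # 6 = enterpriseSpecific in SNMPv1
        "specific_trap": 10,
        "varbinds": ["CoreSwitch01 is Down."],
    }
    print(elements_to_fields(trap_to_elements(sample_trap)))
```

In the real probe, the second step is written in the rules language rather than Python; the sketch only separates the two stages so the terminology (tokens, elements, fields) is easier to follow.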

Creating the Rules

The Solarwinds administrator had configured Solarwinds to send SNMP traps to the MttrapD probe, and had also forwarded the Solarwinds MIB (Management Information Base) to me. Using this MIB and the IBM Tivoli Netcool MIB Manager tool, I generated the event formatting and processing logic required to integrate Solarwinds with OMNIbus.
[Figure: Netcool MIB Manager]
The following files were output from the Netcool MIB Manager tool:
  • SolarWinds.m2r.master.include.lookup
  • SolarWinds.m2r.master.include.rules
  • SolarWinds-preclass.include.snmptrap.rules
  • SolarWinds-preclass.snmptrap.lookup
  • SolarWinds-SOLARWINDS-TRAPS.adv.include.snmptrap.rules
  • SolarWinds-SOLARWINDS-TRAPS.include.snmptrap.rules
  • SolarWinds-SOLARWINDS-TRAPS.sev.snmptrap.lookup
  • SolarWinds-SOLARWINDS-TRAPS.user.include.snmptrap.rules
After adjusting the running MttrapD rules files to include the newly generated files (both rules and lookups), I performed the much-loved “kill -HUP <PID of nco_p_mttrapd>” command to force the probe to re-read its rules, and a problem was uncovered: the supplied MIB produced a rule set that was far too generic to be of any practical use. Here is a snippet of the code from the file SolarWinds-SOLARWINDS-TRAPS.include.snmptrap.rules:

Not enough logic?

The Solarwinds MIB had created a rule set with processing logic for three types of trap: case “1”, a generic alert; case “2”, a detailed alert; and case “10”, a device down trap. Every trap displayed in the ObjectServer Active Event List (AEL) was specific trap 10 (device down). However, looking at a selection of events in the Details tab (the alerts.details table) made it clear that “device up” events were also being received as specific trap 10, while the generated rules assumed that a specific trap of 10 could only be a device down event. The knock-on consequence is that the rules generated by MIB Manager perform no event resolution, because every trap is interpreted as the same type (1-Problem). This is best illustrated in a diagram.
[Figure: device down trap processed by the generated rules]
The diagram above outlines what happens when Solarwinds sends a “Device Down” trap: the trap is received by the MttrapD probe and the data is parsed into fields that can be inserted into the ObjectServer. The problem with the rules generated by MIB Manager is that they use a case statement to build the processing logic. This would be fine if each specific trap had its own unique use case (a one-to-one relationship), but Solarwinds sends specific trap 10 for both device up and device down events.
[Figure: device up trap processed by the generated rules]
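To make the flaw concrete, here is a Python sketch of the shape of the generated logic for case “10” (cases “1” and “2” are omitted). The element names $Node and $AlertMessage, the AlertKey value and the way the Identifier is assembled are assumptions for illustration. The point is only that nothing in the branch looks at the message text, so an “is Up.” trap and an “is Down.” trap end up with the same Type and the same Identifier.

```python
"""Sketch of the generated case "10": no test on the message text."""

TYPE_PROBLEM = 1  # ObjectServer Type 1 = Problem


def generated_rules(elements: dict) -> dict:
    """Mimic the generated switch on $specific-trap (cases "1" and "2" omitted)."""
    fields = {
        "Node": elements["$Node"],
        "Summary": elements["$AlertMessage"],
    }
    if elements["$specific-trap"] == "10":
        # Every trap 10 is treated as "device down"; the message is never inspected.
        fields.update(AlertKey="DeviceDown", Type=TYPE_PROBLEM)
    # Hypothetical Identifier layout built only from values shared by up and down traps.
    fields["Identifier"] = f"{fields['Node']} {fields.get('AlertKey', '')} {elements['$specific-trap']}"
    return fields


if __name__ == "__main__":
    down = {"$Node": "CoreSwitch01", "$specific-trap": "10", "$AlertMessage": "CoreSwitch01 is Down."}
    up = {"$Node": "CoreSwitch01", "$specific-trap": "10", "$AlertMessage": "CoreSwitch01 is Up."}
    same = generated_rules(down)["Identifier"] == generated_rules(up)["Identifier"]
    print(same)  # True
```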
As there is no further conditional statement within case 10, the device up alert is created using the same logic as a device down alert, so the alert identifier (used by the ObjectServer de-duplication trigger) is identical for both. For example, when network device A goes down, a trap is sent and an alert is created in alerts.status with a count (the Tally field) of 1. When device A comes back up, a second trap is sent stating that it is up; because the identifier is the same, this alert is de-duplicated by the de-duplication trigger and the count increases to 2, leaving the original problem alert open.
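The de-duplication behaviour can be modelled in a few lines of Python. This is a simplified sketch, not the real trigger: a dict keyed by Identifier stands in for alerts.status, and only a few of the columns the standard deduplication trigger updates are shown. The identifier string and timestamps are hypothetical.

```python
"""Sketch of ObjectServer de-duplication: one row per Identifier, Tally counts repeats."""


def insert_event(alerts_status: dict, event: dict) -> None:
    """Insert a new row, or update the existing row that has the same Identifier."""
    row = alerts_status.get(event["Identifier"])
    if row is None:
        alerts_status[event["Identifier"]] = {**event, "Tally": 1}
    else:
        # Simplified version of what the deduplication trigger does to the old row.
        row["Tally"] += 1
        row["Summary"] = event["Summary"]
        row["LastOccurrence"] = event["LastOccurrence"]


if __name__ == "__main__":
    table: dict = {}
    identifier = "CoreSwitch01 DeviceDown 10"  # same for both traps, as generated
    insert_event(table, {"Identifier": identifier, "Summary": "CoreSwitch01 is Down.",
                         "Type": 1, "LastOccurrence": 100})
    insert_event(table, {"Identifier": identifier, "Summary": "CoreSwitch01 is Up.",
                         "Type": 1, "LastOccurrence": 160})
    row = table[identifier]
    print(len(table), row["Tally"], row["Type"])  # 1 2 1
```

Running it leaves a single row with a Tally of 2 whose Type is still 1 (Problem): the device is back up, but nothing has resolved the alert.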
It is clear as day that a second conditional statement has to be inserted to differentiate between a “device down” and a “device up” alert. In this instance an if statement was created to test whether the $AlertMessage field contained ‘Up’ or ‘Down’ text, causing the rules to branch and create the correct alert. A device down alert has its severity set to “Minor” and Type to “Problem”, while a device up alert has its severity set to “Indeterminate” and Type to “Resolution”. The new logic for a device down alert is displayed in the following diagram.
[Figure: device down trap processed by the corrected rules]
An if statement within the rules tests $AlertMessage against the regular expression “^.* is Down\.$”; a match creates event details exclusive to a device down trap. The new logic for the device up alert is displayed in the following diagram.
[Figure: device up trap processed by the corrected rules]
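The corrected branch can be sketched in Python as follows. Only the “^.* is Down\.$” pattern and the Minor/Problem and Indeterminate/Resolution mappings come from the rules described above. The matching “is Up.” pattern, the AlertGroup value, the fallback for unrecognised text and the Identifier layout are assumptions. Including Type in the Identifier follows the common OMNIbus convention of keeping problem and resolution events on separate rows, so that they are paired for resolution rather than de-duplicated together.

```python
"""Sketch of the corrected case "10": branch on the message text before setting fields."""

import re

DOWN_PATTERN = re.compile(r"^.* is Down\.$")  # pattern used in the corrected rules
UP_PATTERN = re.compile(r"^.* is Up\.$")      # assumption: mirror image of the Down text

SEVERITY_INDETERMINATE, SEVERITY_MINOR = 1, 3
TYPE_PROBLEM, TYPE_RESOLUTION = 1, 2


def fixed_case_10(elements: dict) -> dict:
    """Return Problem fields for "is Down." traps and Resolution fields for "is Up." traps."""
    message = elements["$AlertMessage"]
    fields = {
        "Node": elements["$Node"],
        "Summary": message,
        "AlertGroup": "DeviceStatus",  # hypothetical; must be the same for up and down
        "AlertKey": elements["$Node"],
    }
    if DOWN_PATTERN.match(message):
        fields.update(Severity=SEVERITY_MINOR, Type=TYPE_PROBLEM)
    elif UP_PATTERN.match(message):
        fields.update(Severity=SEVERITY_INDETERMINATE, Type=TYPE_RESOLUTION)
    else:
        # Assumption: unrecognised text keeps the original "device down" treatment.
        fields.update(Severity=SEVERITY_MINOR, Type=TYPE_PROBLEM)
    # Including Type keeps the up and down events on separate rows in alerts.status.
    fields["Identifier"] = " ".join(
        [fields["Node"], fields["AlertKey"], fields["AlertGroup"], str(fields["Type"])]
    )
    return fields


if __name__ == "__main__":
    for text in ("CoreSwitch01 is Down.", "CoreSwitch01 is Up."):
        event = fixed_case_10({"$Node": "CoreSwitch01", "$AlertMessage": text})
        print(event["Type"], event["Severity"], event["Identifier"])
```

The two events now share Node, AlertKey and AlertGroup but differ in Type and Identifier, which is what a resolution automation needs in order to pair them.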
With the rules now correctly interpreting device down and device up alerts, the ObjectServer is able to perform event resolution on Solarwinds alerts. These rules can of course be refined further for more granularity: Solarwinds differentiates between a Node (a network device) and an Interface (yes, you guessed it, an interface on a network device), so you could add logic to the rules to check whether an alert concerns a node or an interface.
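As a sketch of that extra granularity, the Python below classifies each message as a Node or Interface alert and puts the entity into AlertKey, then applies a simplified model of the standard OMNIbus generic_clear trigger, which clears a Problem when a later Resolution arrives with the same Node, AlertKey, AlertGroup and Manager. The “Node <name> is Down.” and “Interface <name> is Down.” message shapes are hypothetical and should be checked against real Solarwinds traps; the AlertGroup and Manager values are illustrative.

```python
"""Sketch: tag Node vs Interface alerts, then clear problems with a later resolution."""

import re
from typing import Optional

# Hypothetical message shapes; verify against real Solarwinds traps before relying on them.
ENTITY_PATTERN = re.compile(r"^(?P<entity>Node|Interface) (?P<name>.+) is (?P<state>Up|Down)\.$")
MATCH_FIELDS = ("Node", "AlertKey", "AlertGroup", "Manager")


def classify(message: str) -> Optional[dict]:
    """Return entity type, name and state, or None if the text is not recognised."""
    match = ENTITY_PATTERN.match(message)
    return match.groupdict() if match else None


def to_event(node: str, message: str, timestamp: int) -> Optional[dict]:
    """Build ObjectServer-style fields; the entity goes into AlertKey."""
    parsed = classify(message)
    if parsed is None:
        return None
    is_down = parsed["state"] == "Down"
    return {
        "Node": node,
        "AlertKey": f"{parsed['entity']} {parsed['name']}",  # keeps node and interface apart
        "AlertGroup": "DeviceStatus",  # hypothetical
        "Manager": "MttrapD",          # illustrative
        "Type": 1 if is_down else 2,   # 1 = Problem, 2 = Resolution
        "Severity": 3 if is_down else 1,
        "LastOccurrence": timestamp,
    }


def generic_clear(rows: list) -> None:
    """Simplified model of generic_clear: a resolution clears older matching problems."""
    for resolution in rows:
        if resolution["Type"] != 2 or resolution["Severity"] == 0:
            continue
        resolution["Severity"] = 0  # the resolution event is cleared as well
        for problem in rows:
            if (problem["Type"] == 1 and problem["Severity"] > 0
                    and problem["LastOccurrence"] < resolution["LastOccurrence"]
                    and all(problem[f] == resolution[f] for f in MATCH_FIELDS)):
                problem["Severity"] = 0  # 0 = Clear


if __name__ == "__main__":
    rows = [
        to_event("CoreSwitch01", "Interface Gi0/1 is Down.", 100),
        to_event("CoreSwitch01", "Node CoreSwitch01 is Down.", 110),
        to_event("CoreSwitch01", "Node CoreSwitch01 is Up.", 200),
    ]
    generic_clear(rows)
    print([(r["AlertKey"], r["Type"], r["Severity"]) for r in rows])
```

With these sample events, the node recovery clears the node problem (and the resolution itself), while the interface problem stays open because its AlertKey differs.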
That brings to a close this blog outlining one possible method of integrating Solarwinds with IBM Tivoli Netcool OMNIbus. I sincerely hope it proves helpful. If you would like to discuss anything in it, give me a call at Orb Data on +44 (0) 1628 550450 or email me at neil.richards@orb-data.com.
