Visualising Netcool/OMNIbus Data with Grafana

IBM’s Dashboard Application Services Hub (DASH) gives us the ability to view data stored in the Netcool/OMNIbus ObjectServer through customisable views and filters of varying complexity. The ability to precisely design how alerts are presented to an operator is invaluable for an efficient workflow, but sometimes all you need is a high-level overview. When it comes to a clean ‘single pane of glass’ view of data, alternatives are worth considering. Grafana dashboards can be changed quickly and easily, and Grafana includes a library of visualisations out of the box.

The latest Cloud Netcool Operations Insight (NOI) release integrates with Grafana with minimal setup. A set of Python scripts paired with the ObjectServer’s REST API allows Grafana to display alerts in near real-time. The direct method of communication keeps latency low but requires careful consideration of what data to parse, as each modification requires changes to the Python scripts. An alternative approach, available to on-prem solutions, is to leverage the Reporter database to act as a gateway between the ObjectServer and Grafana.
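To give a feel for what the REST route involves, here is a minimal sketch. It is not the script shipped with NOI. The host name, port, and the helper names `build_alerts_url` and `count_by_severity` are hypothetical, and the endpoint path, the `collist` and `filter` parameters, and the `rowset`/`rows` response shape reflect my understanding of the ObjectServer HTTP interface, so check them against the documentation for your version. The response is a hand-written sample so that the code runs without a live ObjectServer.

```python
import json
from collections import Counter
from urllib.parse import urlencode

# Assumed endpoint path for the alerts.status table.
ALERTS_PATH = "/objectserver/restapi/alerts/status"


def build_alerts_url(base_url, columns, sql_filter):
    """Build a GET URL asking for selected columns of the matching rows."""
    query = urlencode({"collist": ",".join(columns), "filter": sql_filter})
    return f"{base_url.rstrip('/')}{ALERTS_PATH}?{query}"


def count_by_severity(payload):
    """Count rows per Severity in a decoded response body.

    Assumes the body looks like {"rowset": {"rows": [{...}, ...]}}.
    """
    rows = payload.get("rowset", {}).get("rows", [])
    return dict(Counter(row["Severity"] for row in rows))


# Hand-written sample standing in for a real response (not real data).
SAMPLE_RESPONSE = json.loads("""
{"rowset": {"rows": [
  {"Node": "web01", "Severity": 5, "Summary": "Disk full"},
  {"Node": "web02", "Severity": 3, "Summary": "High load"},
  {"Node": "db01",  "Severity": 5, "Summary": "Replication stopped"}
]}}
""")

# Placeholder host and port; a real call would also need credentials.
url = build_alerts_url(
    "http://objectserver.example:8080",
    ["Node", "Severity", "Summary"],
    "Severity >= 3",
)
severity_counts = count_by_severity(SAMPLE_RESPONSE)
```

Every extra field or aggregation means another change to code like this, which is the maintenance cost mentioned above.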

Many of the NOI installations our customers use today run alongside a Reporter database to satisfy the requirements needed to store long-term historical data.  Our own monitoring platform is no different. An ObjectServer on our monitoring server processes and stores alerts from customer environments, while a Reporter database keeps a record of all past alerts for analysis at a future point. Our monitoring server is integrated with JIRA for ticketing, OpsGenie for call-routing, and Slack for Instant Messaging (IM) notifications.  Together they ensure that alerts are assigned to the appropriate engineer the moment an incident is raised.  But there’s also value in a big picture view, one that creates distance from the isolated alert to reveal a perspective of the wider environment.  For that we have a Grafana dashboard presenting the at-a-glance key metrics to the Managed Services team.

Grafana Setup

MySQL is supported by both the JDBC gateway and Grafana, so I chose it for the Reporter database. To start, I installed the MySQL packages on a monitoring server and created a database with a schema that reflected the ObjectServer fields I wanted to capture.
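As an illustration of how small such a schema can be, the sketch below creates a cut-down table. The table name `reporter_status` and the choice of columns are my own assumptions rather than the schema supplied with the gateway, and `Customer` stands in for whatever custom field identifies the customer in your ObjectServer. It uses Python's built-in SQLite module as a stand-in so that it runs anywhere; on MySQL the column types would need adjusting.

```python
import sqlite3

# Hypothetical cut-down schema: only the fields the dashboard needs.
SCHEMA = """
CREATE TABLE reporter_status (
    ServerName      VARCHAR(64)  NOT NULL,
    ServerSerial    INTEGER      NOT NULL,
    Node            VARCHAR(64),
    Customer        VARCHAR(64),
    Severity        INTEGER,
    Summary         VARCHAR(255),
    FirstOccurrence TIMESTAMP,
    LastOccurrence  TIMESTAMP,
    PRIMARY KEY (ServerName, ServerSerial)
)
"""

# An in-memory SQLite database stands in for the MySQL server here.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# List the column names that were created (second item of each PRAGMA row).
columns = [row[1] for row in conn.execute("PRAGMA table_info(reporter_status)")]
```

Keying on ServerName and ServerSerial is one way to identify an alert uniquely across ObjectServers; a different key may suit your setup better.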

Next, I added the JDBC gateway component to OMNIbus to feed information from the ObjectServer into the MySQL database. Configuring the JDBC gateway involves modifying the mapping file, which lets you select which ObjectServer fields are passed through to the Reporter database. The server I used for this proof of concept had limited storage space, so I was careful to map only the fields I needed for my Grafana visualisations.

After verifying that data was making it through to the Reporter database, I added the MySQL database to Grafana as a data source and ran a couple of test queries in Grafana to check data was flowing from the ObjectServer, through the JDBC gateway to the Reporter database, and into Grafana.

Once I had enough queries set up, I selected visualisations for them, and before long the dashboard was showing counts of open alerts for each customer, a graph of historical alerts, and a table of open events not too dissimilar to the event viewer in DASH. The table can even be configured to include links that allow an operator to drill down into events.
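The per-customer count is a good example of how simple these queries can be. The sketch below is an assumption-laden illustration rather than the exact query behind my panel: the table name `reporter_status` and the `Customer` column are hypothetical, the sample rows are made up, and "open" is approximated as Severity greater than 0 (0 being the clear severity in OMNIbus). SQLite stands in for MySQL so the example is runnable as-is, and the SQL itself is plain enough to work on either.

```python
import sqlite3

# Hypothetical query of the kind a Grafana stat or table panel could run.
OPEN_ALERTS_PER_CUSTOMER = """
SELECT Customer, COUNT(*) AS open_alerts
FROM reporter_status
WHERE Severity > 0
GROUP BY Customer
ORDER BY Customer
"""

# In-memory stand-in for the Reporter database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reporter_status (Customer TEXT, Node TEXT, Severity INTEGER)"
)
conn.executemany(
    "INSERT INTO reporter_status VALUES (?, ?, ?)",
    [
        ("Acme", "web01", 5),
        ("Acme", "web02", 3),
        ("Acme", "db01", 0),    # cleared alert, excluded from the count
        ("Globex", "app01", 4),
    ],
)

open_counts = conn.execute(OPEN_ALERTS_PER_CUSTOMER).fetchall()
```

In Grafana, only the SELECT statement would go into the panel's query editor; the Python around it is there purely to make the example testable.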

In addition to the data provided by the Reporter database, I added InfluxDB as another data source in Grafana.

I had Telegraf installed on the monitoring server to keep track of system resource utilisation and failed SSH login attempts captured by Fail2Ban. This data is stored in InfluxDB, so with InfluxDB available as a data source in Grafana it can be visualised alongside data ingested from the ObjectServer. I only needed to monitor one server, so Telegraf provided a lightweight monitoring solution. For larger-scale monitoring of multiple servers, IBM Application Performance Management (APM) would be my preferred choice, as its agents provide much more detailed data and are easier to manage than Telegraf’s monolithic configuration file. IBM APM also has a supported Grafana data source plugin, so the setup would be similar to that of Telegraf.

The Network Operations Center (NOC)-style wallboard is permanently displayed on a large monitor, showing server information and high-level customer alert data served by our monitoring platform and refreshed every minute.
