Integrating IBM Process Mining with Jira

I recently spent a week or so studying for (and passing) an IBM certification for Cloud Pak for Business Automation. All the components of this product have business value; however, the one that interested me the most was IBM Process Mining. IBM recently gained this product by acquiring an Italian company called myInvenio, and it enables IBM to help its customers make faster, more informed decisions for process improvement using their own data.

A few weeks later I was at an IBM event in London and had a discussion with the IBM Process Mining lead, who suggested that I look at integrating IBM Process Mining with Jira. Orb Data uses the Atlassian Jira software suite for our Managed Service Desk, so this seemed like a good project and one for which I could easily use real-world data. This blog gives an overview of the process I used and a few of my findings.

To analyse a process, IBM Process Mining requires you to upload a log file (CSV or XES) into the Data source. The starting point was therefore to look at Jira and decide how I could get a CSV file containing the data I needed, and which fields were needed to feed into IBM Process Mining. For any data source, IBM Process Mining requires 3 mandatory fields to automatically derive a process model:

Process ID, a distinctive identifier of every new instance. (For Jira this would be the ticket ID.)
A list of activities that are related to the process IDs. (This is essentially the flow of the Jira ticket and all changes in its status.)
Time information for each activity (for example, the date/time when each activity is executed).
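As a rough illustration of those three fields, here is a minimal Python sketch of an event log for one ticket, using values from the ORBMS-32140 sample rows shown later in this post. The column names (process_id, activity, start_time), the ISO-style timestamp and the helper functions are my own assumptions for the example; IBM Process Mining lets you map whatever headers you have during import.

```python
import csv
import io

# Hypothetical header names chosen for this sketch only.
REQUIRED = ("process_id", "activity", "start_time")

# One row per event: the ticket ID, the status it was in, and when.
events = [
    {"process_id": "ORBMS-32140", "activity": "WAITING FOR SUPPORT", "start_time": "2023-06-15 19:47:04"},
    {"process_id": "ORBMS-32140", "activity": "WAITING FOR SUPPORT", "start_time": "2023-06-15 19:48:34"},
    {"process_id": "ORBMS-32140", "activity": "RESOLVED", "start_time": "2023-06-15 20:00:54"},
]


def missing_fields(row):
    """Return the mandatory fields that are absent or empty in a row."""
    return [name for name in REQUIRED if not row.get(name)]


def to_csv(rows):
    """Serialise the event rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=REQUIRED, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(events))
```

The point is simply that each status change becomes its own row, which is the shape the rest of this post works towards.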

Exporting status changes in Jira

This was the most difficult part of the process for me. The standard reports that we provide for our customers did not have the required data. A command line process that we use to populate alerts in Netcool returned each ticket as one large row of data, whereas I needed to track status changes as individual rows, i.e. each ticket has multiple rows, one for each status change. In my investigation I read an Atlassian article which gave 3 methods, and the one that proved easiest for me was the last one, which used a 3rd party app called Issue History. This app allowed me to examine the data as separate records for each status change, as required, and then export it to CSV or Excel format.

To do this you need to create a filter that shows the ticket list that you want to work on. In my case, I created 2: one to show all tickets and one to show all resolved tickets. You can then simply select Issue History from the app menu and choose a date range. I chose a year’s worth of data.

By default, the app displays many more columns than are needed, including all the comments, so I chose this subset of active fields: Assignee, Issue Type, Priority and Status. The app automatically adds Date of Change, Issue Key and Updated By. The Priority field is not needed; however, the app seems to require it, so I was unable to remove it. It doesn’t really matter though, as it can be removed later anyway.

I then exported the data to a CSV file by pressing the Export button. However, as you can see from the sample exported data below, the exported columns use different names from those displayed in the dashboard. We have Issue URL (not needed), Status [new] and Assignee [new], plus Priority and Priority [new] columns which we can forget about. The status columns contain the data we want for the purposes of Process Mining, so we need to capture the Status value and the time it changed. For example, for Incident 32140 we need to know that it arrived on 15th June at 7:47 PM, was assigned to John Smith at 7:48:34 PM (WAITING FOR SUPPORT) and was resolved (RESOLVED) at 8:00:54 PM. To capture this, we need to choose either the Status or the Status [new] column depending on which holds the latest value, and similarly either Assignee or Assignee [new] depending on the change that has been made.

The Date of Change column is also an issue. As you can see, this column always contains the string “at” and, although Process Mining allows custom date formats, it was too much effort to get this working. For both reasons I realised that to get the exact data I needed I would first need to parse the exported file.

Date of change	Updated by	Issue Key	Issue URL	Issue Type	Status	Status [new]	Assignee	Assignee [new]
Jun/16/23 at 4:32:53 PM	John Smith	ORBMS-32141	https://orb-data.atlassian.net/browse/ORBMS-32141	Service Request	WAITING FOR SUPPORT	WAITING FOR CUSTOMER	Anne Stephens
Jun/16/23 at 3:16:23 PM	John Smith	ORBMS-32141	https://orb-data.atlassian.net/browse/ORBMS-32141	Service Request	WAITING FOR SUPPORT		Unassigned	Anne Stephens
Jun/16/23 at 12:39:36 AM	Customer1	ORBMS-32141	https://orb-data.atlassian.net/browse/ORBMS-32141	Service Request	WAITING FOR SUPPORT		Unassigned
Jun/15/23 at 8:00:54 PM	Managed ServicesImpactBot	ORBMS-32140	https://orb-data.atlassian.net/browse/ORBMS-32140	Incident	WAITING FOR SUPPORT	RESOLVED	John Smith
Jun/15/23 at 7:48:34 PM	John Smith	ORBMS-32140	https://orb-data.atlassian.net/browse/ORBMS-32140	Incident	WAITING FOR SUPPORT		Unassigned	John Smith
Jun/15/23 at 7:47:4 PM	Managed ServicesImpactBot	ORBMS-32140	https://orb-data.atlassian.net/browse/ORBMS-32140	Incident	WAITING FOR SUPPORT		Unassigned
Jun/16/23 at 4:05:7 PM	Anne Stephens	ORBMS-32139	https://orb-data.atlassian.net/browse/ORBMS-32139	Service Request	WAITING FOR CUSTOMER	RESOLVED	Anne Stephens
Jun/15/23 at 8:08:7 PM	Anne Stephens	ORBMS-32139	https://orb-data.atlassian.net/browse/ORBMS-32139	Service Request	WAITING FOR SUPPORT	WAITING FOR CUSTOMER	Anne Stephens
Jun/15/23 at 8:07:22 PM	Customer2	ORBMS-32139	https://orb-data.atlassian.net/browse/ORBMS-32139	Service Request	WAITING FOR CUSTOMER	WAITING FOR SUPPORT	Anne Stephens
Jun/15/23 at 8:04:9 PM	Anne Stephens	ORBMS-32139	https://orb-data.atlassian.net/browse/ORBMS-32139	Service Request	WAITING FOR SUPPORT	WAITING FOR CUSTOMER	Anne Stephens
Jun/15/23 at 7:57:21 PM	Customer2	ORBMS-32139	https://orb-data.atlassian.net/browse/ORBMS-32139	Service Request	WAITING FOR CUSTOMER	WAITING FOR SUPPORT	Anne Stephens
Jun/15/23 at 7:52:56 PM	Anne Stephens	ORBMS-32139	https://orb-data.atlassian.net/browse/ORBMS-32139	Service Request	WAITING FOR SUPPORT	WAITING FOR CUSTOMER	Anne Stephens
Jun/15/23 at 7:48:21 PM	Customer2	ORBMS-32139	https://orb-data.atlassian.net/browse/ORBMS-32139	Service Request	WAITING FOR CUSTOMER	WAITING FOR SUPPORT	Anne Stephens
Jun/15/23 at 7:37:53 PM	Anne Stephens	ORBMS-32139	https://orb-data.atlassian.net/browse/ORBMS-32139	Service Request	WAITING FOR SUPPORT	WAITING FOR CUSTOMER	Anne Stephens
Jun/15/23 at 7:37:23 PM	Anne Stephens	ORBMS-32139	https://orb-data.atlassian.net/browse/ORBMS-32139	Service Request	WAITING FOR SUPPORT		Unassigned	Anne Stephens
Jun/15/23 at 7:33:20 PM	Customer2	ORBMS-32139	https://orb-data.atlassian.net/browse/ORBMS-32139	Service Request	WAITING FOR SUPPORT		Unassigned
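To show what reformatting the Date of change column might involve, here is a small Python sketch. It assumes the export always follows the pattern seen in the sample rows above, and the output format is just one I picked for illustration, not necessarily the one my script produces. Note that %b matches English month abbreviations only when the locale is English.

```python
from datetime import datetime

# Pattern seen in the sample export, e.g. "Jun/15/23 at 7:47:4 PM".
EXPORT_FORMAT = "%b/%d/%y at %I:%M:%S %p"
# Assumed target format for this sketch.
OUTPUT_FORMAT = "%Y-%m-%d %H:%M:%S"


def reformat_date(value):
    """Convert an Issue History date string to a zero-padded 24-hour timestamp."""
    parsed = datetime.strptime(value.strip(), EXPORT_FORMAT)
    return parsed.strftime(OUTPUT_FORMAT)


print(reformat_date("Jun/15/23 at 7:47:4 PM"))    # 2023-06-15 19:47:04
print(reformat_date("Jun/16/23 at 12:39:36 AM"))  # 2023-06-16 00:39:36
```

Parsing also copes with the single-digit seconds that appear in some rows, which come out zero-padded.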

Parsing the Data

The simplest way to do this is to use a custom script that will perform the following actions:

Format the date.
Create individual files for each issue type e.g., Service Request, Change etc.
Remove unused columns.
Use the Assignee [new] column where the assignee has changed.
Use the Status [new] column where the Status has changed.
Discover the column order from the export so that the exported columns can be in any order without breaking the script. The only exception to this is the comments column, which needs to be deselected in the Issue History app.

The script takes the raw Issue History data and creates formatted CSV files, as shown below. In Jira each issue type represents a different process, so it makes sense to split the data into one file per issue type.

% perl jira2PM.pl downloads/Resolved.csv
Creating file Incident.csv
Creating file Problem.csv
Creating file Service_Request.csv
Creating file Change.csv
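For readers who want to try something similar, the steps listed above can be sketched in Python. This is not the jira2PM.pl script itself, only an illustration under some assumptions: the export is comma-separated with the header names shown in the sample, and the output column names and date format are ones I chose for the example. Reading rows by header name is what makes the column order irrelevant.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Two rows from the sample export, written here as comma-separated text.
SAMPLE = (
    "Date of change,Updated by,Issue Key,Issue URL,Issue Type,"
    "Status,Status [new],Assignee,Assignee [new]\n"
    "Jun/15/23 at 8:00:54 PM,Managed ServicesImpactBot,ORBMS-32140,"
    "https://orb-data.atlassian.net/browse/ORBMS-32140,Incident,"
    "WAITING FOR SUPPORT,RESOLVED,John Smith,\n"
    "Jun/15/23 at 7:48:34 PM,John Smith,ORBMS-32140,"
    "https://orb-data.atlassian.net/browse/ORBMS-32140,Incident,"
    "WAITING FOR SUPPORT,,Unassigned,John Smith\n"
)


def pick(row, column):
    """Prefer the '[new]' value when the export recorded a change."""
    new_value = (row.get(column + " [new]") or "").strip()
    return new_value or (row.get(column) or "").strip()


def file_name(issue_type):
    """Build an output file name such as Service_Request.csv."""
    return issue_type.replace(" ", "_") + ".csv"


def transform(export_text):
    """Group rows by issue type as (key, status, time, assignee) tuples."""
    by_type = defaultdict(list)
    for row in csv.DictReader(io.StringIO(export_text)):
        when = datetime.strptime(
            row["Date of change"].strip(), "%b/%d/%y at %I:%M:%S %p"
        )
        by_type[row["Issue Type"]].append((
            row["Issue Key"],
            pick(row, "Status"),
            when.strftime("%Y-%m-%d %H:%M:%S"),
            pick(row, "Assignee"),
        ))
    return dict(by_type)


def write_files(by_type):
    """Write one CSV per issue type; unused columns are simply not written."""
    for issue_type, rows in by_type.items():
        print("Creating file", file_name(issue_type))
        with open(file_name(issue_type), "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["Issue Key", "Status", "Date", "Assignee"])
            writer.writerows(rows)


for issue_type, rows in transform(SAMPLE).items():
    print(file_name(issue_type), rows)
```

Calling write_files(transform(text)) on the full export text would then produce one file per issue type in the current directory.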

Importing the Data File

If the data is in the correct format, importing it into IBM Process Mining is relatively simple. Once you have logged in, click on Custom Process, drag the CSV data file you want to create the process from onto the Upload Data Source box and press Next.

When the file has uploaded you will see the first few rows of the import. For each column, choose the field name that matches the data it contains, ensuring that you have a selection for Start time, Process ID and Activity. Once this is done, press Next.

The time format should be detected automatically because the custom script writes the date/time in the required format. You can also choose another format or define a custom one. Press Create Process to create the process.

The process will automatically be displayed.

The Model automatically displays the frequency analysis. The dark blue colour highlights the most frequent activities, while the bold arrows highlight the most frequent transitions. In this way, the most frequent paths between activities of the process can be identified.

The numbers next to the lines show how many times that specific process flow was followed.
The numbers within the activity boxes show the number of times that the activity is carried out.
The description in the activity boxes indicates the name of the activity and the roles by which the activity is carried out. There might be more than one role (multiple dots are displayed).
The percentage indicates how many possible relationships you are currently visualizing.
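To make the two sets of numbers more concrete, here is a Python sketch that counts activities and the transitions between consecutive activities for each ticket. It is only my illustration of the idea, not how IBM Process Mining calculates its model. The events are the status values and times from the ORBMS-32140 and ORBMS-32141 sample rows, and treating every exported row as an activity is an assumption of the sketch.

```python
from collections import Counter, defaultdict

# (ticket, activity, timestamp) taken from the sample export rows.
EVENTS = [
    ("ORBMS-32140", "WAITING FOR SUPPORT", "2023-06-15 19:47:04"),
    ("ORBMS-32140", "WAITING FOR SUPPORT", "2023-06-15 19:48:34"),
    ("ORBMS-32140", "RESOLVED", "2023-06-15 20:00:54"),
    ("ORBMS-32141", "WAITING FOR SUPPORT", "2023-06-16 00:39:36"),
    ("ORBMS-32141", "WAITING FOR SUPPORT", "2023-06-16 15:16:23"),
    ("ORBMS-32141", "WAITING FOR CUSTOMER", "2023-06-16 16:32:53"),
]


def frequencies(events):
    """Count how often each activity and each activity-to-activity step occurs."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp in events:
        by_case[case_id].append((timestamp, activity))

    activity_counts = Counter()
    transition_counts = Counter()
    for steps in by_case.values():
        # Order each ticket's events by time before pairing neighbours.
        ordered = [activity for _, activity in sorted(steps)]
        activity_counts.update(ordered)
        transition_counts.update(zip(ordered, ordered[1:]))
    return activity_counts, transition_counts


activities, transitions = frequencies(EVENTS)
for name, count in activities.most_common():
    print(count, name)
for (source, target), count in transitions.most_common():
    print(count, source, "->", target)
```

The activity counts correspond to the numbers inside the boxes, and the transition counts to the numbers next to the lines.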

Process Conformance

As an experiment, I exported the derived model from IBM Process Mining as a BPMN file and imported it into IBM Blueworks Live, which is cloud-based software that provides a collaborative environment to build and improve business processes through process mapping. In Blueworks Live I then added a new activity called Document Solution.

I then exported the new BPMN file and reimported it into IBM Process Mining.

Once you have a model imported you can see the conformance. With the Model Conformance feature, it is possible to do a visual conformance check between the Data-derived model and the Reference model. You can also click Compare to have a look at the similarities and differences between the two of them.

You can see in the example below that Document Solution is shown in orange to indicate that the activity or transition is only present in the Reference model.
There is also a red box on the “To Do” activity to show that this activity is not present in the Derived model and was not imported into Blueworks Live.
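At its very simplest, the comparison idea can be pictured as a set comparison of activity names. The Python sketch below does only that; it is far cruder than the conformance check in IBM Process Mining, which also looks at transitions, and the two activity lists are illustrative assumptions rather than my actual models.

```python
def compare_activities(derived, reference):
    """Split activity names into those unique to each model and those shared."""
    derived, reference = set(derived), set(reference)
    return {
        "only_in_reference": sorted(reference - derived),
        "only_in_derived": sorted(derived - reference),
        "in_both": sorted(derived & reference),
    }


# Illustrative activity lists: the reference model has one extra activity.
derived_model = ["WAITING FOR SUPPORT", "WAITING FOR CUSTOMER", "RESOLVED"]
reference_model = derived_model + ["Document Solution"]

for group, names in compare_activities(derived_model, reference_model).items():
    print(group, names)
```

Here Document Solution lands in the only_in_reference group, which is the situation highlighted in orange above.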

KPIs

I’m not going to go through all the settings and analytics available in IBM Process Mining in this blog, but I will touch on KPIs. By defining the KPI settings, you can visualise which activities are risky or critical for the process. For the performance analysis you can set KPIs such as the acceptable time that can elapse between the start and the end of the process, or the acceptable time that can elapse when it passes through an activity (amongst other things). As you would expect, you can then visualise the KPIs in the model by selecting a time-based performance view (average, maximum, minimum, or median) and turning the KPI palette on. For instance, I created an artificial KPI breach on FOLLOW UP, and you can see in the view below that the time taken from a service request being in FOLLOW UP to moving back to WAITING FOR SUPPORT has breached the 8-day threshold.
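The kind of check behind a time-based KPI can be sketched in a few lines of Python. This is my own illustration rather than anything taken from the product: the ticket IDs and timestamps are invented for the example, and only the 8-day limit and the two status names come from the breach described above.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Invented example events: (ticket, activity, time).
EVENTS = [
    ("DEMO-1", "FOLLOW UP", datetime(2023, 6, 1, 9, 0)),
    ("DEMO-1", "WAITING FOR SUPPORT", datetime(2023, 6, 12, 9, 0)),
    ("DEMO-2", "FOLLOW UP", datetime(2023, 6, 1, 9, 0)),
    ("DEMO-2", "WAITING FOR SUPPORT", datetime(2023, 6, 3, 9, 0)),
]


def breaches(events, source, target, limit):
    """Return (ticket, elapsed) for each source-to-target step slower than limit."""
    by_case = defaultdict(list)
    for case_id, activity, when in events:
        by_case[case_id].append((when, activity))

    found = []
    for case_id in sorted(by_case):
        steps = sorted(by_case[case_id])
        # Compare each event with the one that directly follows it.
        for (start, first), (end, second) in zip(steps, steps[1:]):
            if first == source and second == target and end - start > limit:
                found.append((case_id, end - start))
    return found


late = breaches(EVENTS, "FOLLOW UP", "WAITING FOR SUPPORT", timedelta(days=8))
for case_id, elapsed in late:
    print(case_id, "took", elapsed.days, "days")
```

Only the first invented ticket is reported, because it spent 11 days in FOLLOW UP while the second spent 2.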

And Lastly…

There are too many features to detail in this blog, but I might get to those in future blogs. One particularly interesting feature is simulation, which allows you to predict the Return on Investment (ROI) before you implement any process improvement initiative, such as increasing staffing levels, changing working time or using Robotic Process Automation (RPA). For example, you can simulate using robotic agents to take over some of the process.

There are also detailed analytics for each IBM Process Mining process that allow you to analyse specific instances of the process from the imported data. For each process, it is possible to create one or more Analytics dashboards to visualize process data. These can be customized, but by default they can contain the following widgets:

Process cases
Case details
Case variants
Process model
Average lead time
Lead time influencers’ information
KPI summary

If you would like to know more about any of these features in IBM Process Mining or managing Jira data then don’t hesitate to get in touch with me at simon.barnes@orb-data.com or leave a comment below.