Some of the most visible artifacts of organizational silos in engineering are tools. Visibility into dashboards, workflows, or documentation is cordoned off in separate and often redundant systems. Conway’s Law manifests in tool choices and integrations as much as in application development. As a team matures an Incident Management or DevOps practice, breaking these tool walls is necessary.
In this post I’ll explore a low-effort, high-value way to extend the integration between VictorOps and JIRA, break down those silos, and give your on-call team direct access to information across systems.
Atlassian JIRA and VO
The JIRA/VO integration enables bidirectional communication between both systems. New issues, updates, or other workflow triggers in JIRA can flow to the VictorOps timeline as incidents, or informational items in the timeline. Similarly, changes to those incidents in VictorOps can flow back to JIRA as updates to the originating issue.
When you Read the Fine Manual, you’ll get a sense of the basic setup. Make sure you configure both parts of the integration – inbound webhooks from JIRA, and outbound webhooks, to complete the full workflow cycle.
Getting real value out of this integration depends on some understanding of the JIRA Query Language (JQL). JQL will be immediately familiar to anyone who has worked with SQL, search operators, or most advanced search interfaces.
Without limiting the data sent between the two systems, you’ll quickly overwhelm your timeline with every update you make in JIRA. Your on-call team will be grabbing pitchforks and heating tar thanks to the flood you caused. In the first hour after I set it up, I was inundated with 20+ *incidents*. Hooray developer velocity, boo flooding the on-call team with noise.
Before you crush your timeline with noise, take a moment and consider which JIRA projects you want talking to VO, and under which conditions. I’d suggest starting with a low-volume project or a test project to keep things manageable as you dial in your JQL.
First approximations in JQL are pretty straightforward. The basic search allows you to build your conditions, then click over to “Advanced” to see the generated JQL. This is easy in a static sense and will help get your feet under you, but gets a bit trickier as you’re thinking about changes to existing incidents as triggers for VO integration. To stop the noise in my timeline, I settled on a simple filter:
project = VI1 AND issuetype = Bug AND status in (Open, Resolved) AND priority = "Sev 1" AND assignee in (mboeckman)
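While you dial in the filter, it can help to assemble the JQL from its parts so you can vary one condition at a time. Here is a minimal Python sketch that reproduces the filter above. The helper names (`quote`, `build_jql`) are hypothetical, and the quoting rule is a simplification: it does not handle JQL reserved words or every special character.

```python
import re


def quote(value: str) -> str:
    """Wrap a JQL value in double quotes unless it is a single plain word.

    Simplified assumption: plain words (letters, digits, underscore) are
    left bare; anything else, such as "Sev 1", is quoted.
    """
    if re.fullmatch(r"\w+", value):
        return value
    return '"' + value.replace('"', '\\"') + '"'


def build_jql(project, issuetype, statuses, priority, assignees):
    """Join the individual conditions with AND, mirroring the filter above."""
    clauses = [
        f"project = {quote(project)}",
        f"issuetype = {quote(issuetype)}",
        f"status in ({', '.join(quote(s) for s in statuses)})",
        f"priority = {quote(priority)}",
        f"assignee in ({', '.join(quote(a) for a in assignees)})",
    ]
    return " AND ".join(clauses)


jql = build_jql("VI1", "Bug", ["Open", "Resolved"], "Sev 1", ["mboeckman"])
print(jql)
```

Paste the printed string into the “Advanced” search box to check what it matches before wiring it into the webhook.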
Enter the Transmogrifier
Filtering JIRA output is a good start, but the real power of this integration comes to light once we dig into the Incident Automation Engine. It’s all well and good to send a subset of JIRA to VO, but how about we route that specifically to the teams, individuals, or policies for whom they are most relevant?
During your development work on this, I’d strongly recommend setting anything from JIRA to a Warning* status – this ensures that no matter how bad you are at JQL, or Transmogrifier, you don’t accidentally send incidents to your on-call team. Here I’ve used a blunt technique to send anything associated with the api_key for JIRA in my instance to WARNING.
* – In your VO instance, Settings->Alert Behavior->Configure Incidents allows you to determine whether Warning or Critical states actually create Incidents.
Let’s say I want anything with Sev 1 in JIRA (matching my JQL filter above) to be routed specifically to my devTeam in VictorOps, and handled as an incident. The Transmog allows you to do this with an additional rule.
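To reason about how the two rules interact, here is a toy model in Python. It is not how VictorOps implements the Transmogrifier, just a sketch of "match a field, then set other fields". The api_key value `JIRA_API_KEY`, the `CRITICAL` message type for the second rule, and top-to-bottom rule ordering are my assumptions for illustration.

```python
from fnmatch import fnmatchcase

# Each rule: (field to test, wildcard pattern, fields to set on a match).
# Values below are placeholders, not taken from a real instance.
RULES = [
    # Blunt rule: everything arriving with the JIRA api_key becomes a WARNING.
    ("api_key", "JIRA_API_KEY", {"message_type": "WARNING"}),
    # Focused rule: Sev 1 issues are escalated and routed to devTeam.
    (
        "jira.issue.fields.priority.name",
        "Sev 1",
        {"message_type": "CRITICAL", "routing_key": "devTeam"},
    ),
]


def apply_rules(alert, rules):
    """Apply rules in order; later rules see changes made by earlier ones."""
    result = dict(alert)
    for field, pattern, updates in rules:
        value = result.get(field)
        if value is not None and fnmatchcase(str(value), pattern):
            result.update(updates)
    return result


sev1 = apply_rules(
    {"api_key": "JIRA_API_KEY", "jira.issue.fields.priority.name": "Sev 1"},
    RULES,
)
minor = apply_rules(
    {"api_key": "JIRA_API_KEY", "jira.issue.fields.priority.name": "Sev 3"},
    RULES,
)
print(sev1["message_type"], sev1["routing_key"])
print(minor["message_type"], minor.get("routing_key"))
```

In this model the Sev 1 issue ends up routed to devTeam, while everything else from JIRA stays a WARNING with no routing change. Check how your own instance orders rules before relying on the same behavior.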
From here you have a sense of how you can vary both output from JIRA, and processing within VictorOps to enable highly focused routing of incidents from JIRA to your on-call teams. Of course, you can also annotate incidents, pulling other information from the JIRA integration and sending it along with the incident to your team.
Some additional time should be spent investigating and exploring the meaningful fields from both systems. Once you have the basic filtering in place, take a look at one of your JIRA incidents in the timeline.
Clicking the “More Info” button on an incident will expand all fields being shipped from JIRA. You may regret clicking this button as your screen real estate will be consumed by several dozen lines of jira.issue.fields.customfield_12107!
A full rundown on JIRA API fields can be found here. Let’s look at a few generic, and likely relevant fields:
jira.issue.fields.creator.displayName, jira.issue.fields.priority.id, jira.issue.fields.priority.name, jira.issue.fields.status.name, jira.issue.fields.status.id, jira.issue.id, jira.issue.key
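The dotted names above read like paths into the nested JSON that JIRA sends. A small Python sketch makes the relationship concrete; the sample payload is made up, the `flatten` helper is hypothetical, and the zero-based list indexing is my assumption rather than something I have confirmed about how VictorOps numbers list items.

```python
def flatten(value, prefix):
    """Turn nested dicts and lists into a flat dict with dotted keys."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}.{key}"))
    elif isinstance(value, list):
        # Assumption: list positions become numeric path segments from 0.
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}.{index}"))
    else:
        flat[prefix] = value
    return flat


# Made-up sample shaped like a JIRA webhook body, for illustration only.
payload = {
    "issue": {
        "id": "10001",
        "key": "VI1-42",
        "fields": {
            "creator": {"displayName": "Example User"},
            "priority": {"id": "1", "name": "Sev 1"},
            "status": {"id": "1", "name": "Open"},
        },
    }
}

fields = flatten(payload, "jira")
for name in sorted(fields):
    print(name, "=", fields[name])
```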
Similarly, the rundown on all VictorOps Incident fields can be found here. Some of the more interesting fields for this integration are:
routing_key, api_key, message_type, entity_display_name
It’s hard for me to overstate the value of focused experiments here – start simple, trigger an alert by creating a JIRA issue, then dive into fields you see on the Incident in VO. That “More Info” button will be data rich, and hopefully trigger ideas for your integration.
A tale of two alerts
Sweet! Now we can send information back and forth that’s relevant for both Ops and Dev teams. Let’s take it a step beyond basic “An issue was created, go beep someone”. I’ve previously discussed the value of teams sharing plans with each other in on-call handoffs.
Broadcasting planned software releases, system maintenance, or other changes that may be triggers for incidents is important. Certainly, a team benefits from the insight into these changes in a tactical sense. Perhaps as important is the cultural benefit of teams talking about what they’re working on with each other. Sharing plans is important, but is only part of the picture. Let’s see what happens when that sharing happens in real time.
Consider an incident arriving on your on-call team’s phone late in the evening. Something is down! Somebody, do something!
Whoever responds to this is starting from scratch. A webhost is not responding. Ok. The host itself seems up (PING OK), but that’s all we have to go on. What’s wrong? What changed?
Once more, this time with context
In this example, I’ve built a workflow to drop an INFO notice in the timeline any time a JIRA issue moves to “DONE”. Specifically, through JQL and Transmogrifier, I’ve limited this to issues associated with JIRA projects that result in code deploys.
We have the same alert condition, but in this example we’ve provided the on-call team with some meaningful, close-at-hand context: a frontend push just happened. Perhaps the two are related, perhaps not. The first responder here has a head start on our poor friend in the first scenario – this time we know a change was introduced, and the responder can focus initial efforts on that push. It’s only a breadcrumb, but when you’re focused on MTTR every second counts.
The above INFO level notice in the timeline was generated with a slightly advanced Transmogrifier rule below, pulling the release branch (jira.changelog.items.1.toString) out of a custom field in JIRA:
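The idea behind that rule is simple template expansion: a placeholder naming a field is replaced with that field's value from the alert. Here is a rough Python sketch of the mechanism. I'm assuming a `${{field_name}}` placeholder syntax, so confirm it against the Transmogrifier documentation; the issue key and branch name are invented sample data.

```python
import re

# Matches placeholders of the assumed form ${{field.name}}.
PLACEHOLDER = re.compile(r"\$\{\{([^}]+)\}\}")


def expand(template, fields):
    """Replace each placeholder with the matching field value.

    Unknown fields are left untouched so a typo stays visible in the output.
    """
    def substitute(match):
        name = match.group(1).strip()
        return str(fields.get(name, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)


# Invented alert fields for illustration.
alert = {
    "message_type": "INFO",
    "jira.issue.key": "WEB-7",
    "jira.changelog.items.1.toString": "release/frontend-example",
}

note = expand(
    "Deploy finished: ${{jira.issue.key}} on branch "
    "${{jira.changelog.items.1.toString}}",
    alert,
)
print(note)
```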
Now our on-call team has direct line of sight to any changes made to our production environment. Similar integrations could be built from other job tracking systems like ServiceNow or Salesforce. You could also push notifications directly from CI/CD systems using the VictorOps API.
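As a starting point for the CI/CD route, here is a hedged Python sketch that builds an INFO-level deploy notice as a JSON POST. The endpoint URL is a placeholder: copy the real one from your VictorOps REST integration settings. `message_type` and `entity_display_name` are the fields mentioned earlier; `entity_id` and `state_message` are additional fields I'm assuming the endpoint accepts, and `build_deploy_notice` is a hypothetical helper name.

```python
import json
import urllib.request


def build_deploy_notice(endpoint_url, service, version):
    """Build (but do not send) a POST request describing a finished deploy."""
    payload = {
        "message_type": "INFO",  # informational only, should not page anyone
        "entity_id": f"deploy/{service}",  # assumed field
        "entity_display_name": f"Deployed {service} {version}",
        "state_message": f"CI/CD pipeline pushed {service} {version}.",  # assumed field
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; replace with the endpoint from your own instance.
request = build_deploy_notice(
    "https://alerts.example.invalid/your-rest-endpoint", "frontend", "1.2.3"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it from a pipeline step:
# urllib.request.urlopen(request, timeout=10)
```

Keeping the build step separate from the send makes it easy to inspect the payload in a dry run before anything lands in the timeline.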
Context and iterations
Your first efforts here will be imperfect, however you implement the integration. Stay focused on what kind of information you’d like to share between the two systems, and which workflows, fields, or changes are relevant. Add two minutes to your postmortem or handoff meetings to discuss the integration – did we empower a team to better respond? Or blind them with a storm of JIRA trivia? Break down silos wherever you find them, in tools or teams… and keep iterating!