[This is a guest post from Ian Neveu, a Cloud Services Engineer at Entrust. We cannot thank him enough for sharing his knowledge and contributing to our blog!]
Everyone has experienced a period at their organization when all calamity breaks loose, the floodgates of Hades open up, and you wonder, “How much burn will I have to feel this week?” For me and the team, that question resonated constantly and kept us wondering, “What else will break this week?” I know that not every organization faces the same technical or proprietary challenges.
However, the kind of effort and work involved when incidents arise is the same no matter what form of business you are employed in. If you consider every issue to be a problem, you can then apply the problem management principles that your company employs. At my organization, we chose to use a variety of tools that, until recently, were isolated and had no integration or communication with each other. This in itself was a problem that spawned hundreds of additional issues, all related to the simple fact that our development team had a hands-off approach when it came to working with cloud services and operations.
A huge anchor holding back our innovation lay before us; the obstacle seemed insurmountable and impassable. With this challenge in hand, we started to work through our processes and saw that when the fires began, we had a gap in looping in research and development, as there was no standard way for our two groups to communicate easily. In reviewing our resources, we found that some of the tools we used, and thought were connected between teams, were in fact not talking to one another. After identifying the missing link, we worked together to determine what content could be shared across the teams and what would benefit the collaboration most.
After boiling down the options, we identified two major gaps: no shared space for the ongoing and critical issues that both teams were investigating, and no effective cross-team communication tool. These two issues in tandem were costing our teams so many man-hours that we were not only missing critical deployment timelines but also no longer able to mitigate the newly arriving fires from day-to-day functions.
We quickly made the decision to share a central whiteboard across the teams and stand up separate calls for R&D and operations discussions. In the course of this investigation, we also came across a handy little feature within the wonderful, glorious VictorOps called sub-conferencing. This new-to-us feature allows you to create secondary rooms based off the initial control call. These sub-conference rooms have allowed us to branch out our investigations during issues, dramatically widening the scope we can cover at once, which was not previously possible.
We have lovingly adopted the name “Welching” for it, after the engineer who found the sub-conferencing feature we now use consistently. Long story short: you always have to ensure that the right communication is happening and that the right parties are involved in your discussions. If your collaborative efforts are not fruitful, take it from me: you may just have to trim a few leaves to find the golden apple!