Crisis Communication Breakdown: How to Improve Communication During an Outage

Marlo Vernon June 14, 2018

Crisis communication is an important part of our DevOps culture, and it’s a critical part of taking ownership of operations. Crisis communication will be difficult, but your customers will ultimately respect honesty. Below, we’ve outlined everything you need to know about communicating during a crisis.

Crisis Communication Can Be Scary

Crisis communication is going to be difficult because human instinct works against it. When you’re having problems, the last thing you want is to draw attention to them. Here are some reasons you might hesitate before communicating a crisis to the public:

  • Talking publicly about a failure, maybe even before you know what caused it, can feel like admitting incompetence
  • You could publish information that may later turn out to be inaccurate and end up looking foolish or uninformed
  • You could accidentally disclose information that an attacker could use to make the situation worse
  • Your competitors might use it against you
  • The press could report on your outage and damage your reputation in the marketplace
  • Maybe your customers didn’t notice the outage. You could be drawing attention to a problem your users weren’t aware of

Not Communicating Is Worse

Don’t kid yourself. Your customers will notice, and information has a way of becoming public quickly. Once you lose control of the message, it can be really hard to get it back. An old adage in politics and chess says, “If you’re reacting, you’re losing.” Being proactive about your system’s reliability and about communicating with your customers builds trust. If you spend the day responding to other people’s Tweets or Facebook posts, it can look as though you’re trying to hide something and that information has to be extracted from you.

Honesty Pays Dividends

Your users will reward honesty, candor, and timeliness with loyalty. Every organization experiences failures at one time or another. Customers know that no one is perfect, and they will value a company admitting that. Interactions with customers should be a two-way dialogue where customers see their feedback listened to and acted upon.

Takes Buy-In at All Levels

Crisis communication takes buy-in from all parts of the business.

  • DevOps can’t go it alone – The DevOps team can’t unilaterally decide to start posting to Statuspage or tweeting about production problems. Crisis communication has to happen in the context of protecting company IP assets and especially protecting the privacy and security of users.
  • Every business function has something to contribute – If the messaging is done without the involvement of management and business-focused teams, the message could be poorly crafted and poorly aimed.
  • Messaging needs to be accurate, but also accessible – On the other hand, if the technical team isn’t involved, then your messaging may be inaccurate and your reputation can suffer.

Agree on When and How Often

There are a few key things that everyone should agree on when it comes to crisis communication strategy.

  • Generally, communicate as soon as possible – When there’s something affecting your customers, it’s best to communicate as soon as possible.
  • Some issues need to be contained before they become public – Depending on the type of issue, going public might cause more damage in the short term. The key is to identify different sorts of issues before they happen and have a strategy for how to communicate each type.
  • Set a regular cadence for updates – Once you post that first notification, update your customers on a regular cadence. Stick to that cadence as much as possible, even if there isn’t anything new to report. It shows that you’re actively on the case and won’t leave everyone wondering what’s happening.
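
As a rough illustration of the cadence point above, here is a minimal Python sketch. The function names, the 30-minute interval, and the fallback wording are all assumptions made for this example, not part of any VictorOps or Statuspage tooling. It works out when the next updates are due and always produces a message, even when there is nothing new to report.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def update_schedule(first_post: datetime, interval: timedelta, count: int) -> List[datetime]:
    """Return the due times of the next `count` updates after the first post."""
    return [first_post + interval * i for i in range(1, count + 1)]


def update_text(news: Optional[str], next_due: datetime) -> str:
    """Build an update message; fall back to a holding line when there is no news."""
    body = news if news else "No new information yet; we are still investigating."
    return f"{body} Next update by {next_due:%H:%M} UTC."


# Hypothetical incident: first notification at 09:00, updates every 30 minutes.
first = datetime(2018, 6, 14, 9, 0)
schedule = update_schedule(first, timedelta(minutes=30), 3)
print(update_text(None, schedule[0]))
```

Stating the time of the next update in every message is one way to make the cadence visible to customers; the interval itself should be whatever your team has agreed on in advance.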

Agree on What Gets Communicated

  • Help your customers make intelligent choices – Users need enough information to make their own intelligent, timely decisions. If their platform depends on yours, they need to know if it’s time to deploy backup or failover measures.
  • Provide context and realistic ETAs – It’s better to underpromise and overdeliver than it is to overpromise and underdeliver. Ensure that you’re providing accurate context and estimating timeframes realistically.
  • Use incident management templates to ease messaging concerns – There are numerous incidents that could be recurring. Statuspage offers an incident template feature where you can set up incidents ahead of time. That way, you don’t have to figure out how you’re going to communicate an incident on the fly.
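
To make the template idea concrete, here is a minimal sketch of pre-written incident messages in Python. It uses only the standard library's `string.Template` and does not call or imitate the Statuspage template feature. The incident types, the wording, and the `render_incident` helper are hypothetical and exist only for illustration.

```python
from string import Template

# Hypothetical templates keyed by incident type; the wording is illustrative only.
TEMPLATES = {
    "degraded_performance": Template(
        "We are investigating degraded performance affecting $component. "
        "Next update by $next_update."
    ),
    "full_outage": Template(
        "$component is currently unavailable. Our team is working on a fix. "
        "Next update by $next_update."
    ),
}


def render_incident(kind: str, **fields: str) -> str:
    """Fill in a pre-agreed template; raises KeyError if the kind or a field is missing."""
    return TEMPLATES[kind].substitute(**fields)


message = render_incident(
    "full_outage", component="The alerting API", next_update="10:30 UTC"
)
print(message)
```

Because `substitute` raises an error when a placeholder is left unfilled, a half-completed message cannot be sent by accident, which matters when the message is being written under pressure.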

Agree on Who Manages the Message

It is important to speak with a single, consistent voice. Be sure to make your message consistent on all platforms and have multiple confirmations that the information is accurate.

Agree on Where the Message Is Shared

There should be well-known, publicized channels for your users to get information. Establish these channels and stick to them. ChatOps tools like Slack, along with Twitter, email, and the telephone, are all important, especially when an issue is affecting one particular customer. Statuspage, though, is a great centralized point of publication for real-time platform status.

Crisis Communication Training and Certification

Crisis communication teams attend training and get certification. They learn how to identify types of crises and handle messaging each time. They’re the people that can set up the templates and the team’s Statuspage so messaging is ready to go depending on the type of incident. They also learn how to plan for crisis communication in advance and identify the key stakeholders to communicate with. Here is a useful resource for crisis communication training and certification.

Non-Disruptive Communication

It’s important to manage crisis communication without disrupting critical work. The best way to do that is to have someone act as a liaison between first responders (the Incident Commander, for example) and the crisis communication team. At VictorOps, we try to keep conversations about crisis communication separate from firefighting by using a different Slack channel. We also have plans and runbooks ready with details on how to get the message out.

Applied DevOps: Cross-Functional Post-Incident Reviews

It’s important to review the crisis communication process on its own, separately from post-incident reviews. Some good questions to ask are:

  • Was the communication accurate and timely?
  • Did we make a bad situation better for users?
  • Did our efforts to communicate help or hinder the frontline processes?
  • Is there anything in the process that we can automate, better document, or eliminate?

It’s also useful for the crisis communication team to be part of post-incident review conversations in order to get a better understanding of the platform as a whole, and raise questions from an end user’s perspective.

Key Takeaways:

  • Your failures will become public
  • Who controls when and how the information gets out? You, or someone else?
  • Having a plan makes crisis communication less of a burden on the people fighting the fire. It also means that your message can get out more quickly and accurately
  • Doing it right will build trust and loyalty with your users

VictorOps helps you better coordinate incident response before, during, and after an outage. Sign up for a 14-day free trial to see how you can leverage VictorOps to improve crisis communication during an outage.
