There are many ways that communication can break down. Here at VictorOps & StatusPage.io, we have some experience dealing with firefights, and together we came up with a list of the eight most common ways we’ve seen companies fail at communicating.
1. Assuming your customers know where to look for updates. Having a Statuspage and/or Twitter support handle means nothing unless your customers also know about those channels. It’s easy to point to these places from your website, knowledge base or supporting marketing materials. Better yet, proactively send messages during downtime to users who have opted in to receive them.
2. Overpromising solutions. When there’s an outage, it’s tempting to say anything to keep your customers happy. But this strategy can backfire. Your customers want an honest answer, and your best bet is to stick to the truth.
3. Not having a process in place. No one likes to think about major outages, but the reality these days is that it’s not if, but when. If you assume you’ll know what to do during a firefight without ever actually practicing it, you won’t have the muscle memory in place when the real thing happens. The very least you can do is create some parameters around your process and decide who is going to be involved in creating customer communication.
4. Leaving your internal team in the dark. The way you communicate with your customers should probably be a bit different from the way you talk to your internal teams. Messages to your internal teams can be more specific and technical.
5. Throwing upstream providers under the bus. Sometimes the reason you’re “down” is that an upstream provider you rely on to deliver your own service is down. Your first instinct will probably be to deflect blame and throw them under the bus. After all, it’s their fault that you’re down… right? Wrong. At the end of the day, you made the decision to build your product in such a way that a single point of failure could bring you to your knees. You need to own it.
6. Not posting updates often enough. When you have an extended outage, you need to consistently push out updates as the situation evolves. It’s not enough to just get something up on your status page and then leave interested parties in the dark for the next few hours. Your customers want to know that you’re still working on the problem, and regular updates are how you reassure them.
7. Waiting too long to acknowledge the issue. Quickly acknowledging that a problem exists is the single most important factor that goes into how the public will grade your outage response efforts. Imagine the following scenario: your website is having some serious issue…let’s say a large percentage of requests are taking a really long time or completely timing out. One of your customers goes to your status page to see what is up, but your status page says that everything is operating normally and that there aren’t any issues. That’s a really frustrating experience for that user. They’ll feel like they’re being lied to…like you’re trying to cover something up.
8. Forgetting to apologize. Some outages are so severe that they warrant a written post-mortem. A common mistake we see with post-mortems is that they forget to lead with an apology. You may think that it’s silly to apologize in a technical post-mortem, but it’s not. At the end of the day, your users are human, and humans are wired to be emotional. Your outage probably threw a serious wrench in their day and they’re (rightfully) upset about it. Apologize and mean it. It will make it much easier to start clawing back some of the trust you lost with your customers because of the outage.
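For teams that want to back points 6 and 7 with a simple reminder, here is a minimal sketch in Python. It is not part of any VictorOps or StatusPage.io product: the function name `next_action` and both thresholds are hypothetical, chosen only to illustrate checking whether an incident still needs to be acknowledged or is overdue for an update.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical thresholds; choose values that fit your own process.
ACK_DEADLINE = timedelta(minutes=5)      # acknowledge within 5 minutes
UPDATE_INTERVAL = timedelta(minutes=30)  # then post at least every 30 minutes


def next_action(started_at: datetime,
                last_update_at: Optional[datetime],
                now: datetime) -> str:
    """Return what the communicator should do next for an open incident."""
    if last_update_at is None:
        # Nothing has been posted yet: has the acknowledgement deadline passed?
        if now - started_at >= ACK_DEADLINE:
            return "acknowledge now"
        return "ok"
    # Something has been posted: has it been too long since the last update?
    if now - last_update_at >= UPDATE_INTERVAL:
        return "post update"
    return "ok"
```

Calling `next_action` once a minute from whatever scheduler you already run would be enough to nudge whoever owns customer communication during an incident.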
The nifty infographic above can act as your cheat sheet – feel free to use it as a reminder of what you should or shouldn’t be doing. If you feel there are any big ones we’ve left off the list, please share other communication fails in the comments below. Here’s to a less-stressful on-call process!