Mike Meredith - March 01, 2017
Recent events on the Internet have produced a lot of headlines, and if you’re an Ops Manager, a lot of headaches. Yesterday’s AWS outage caused widespread issues across several industries, and many affected organizations are waking up today realizing they didn’t have a good way to respond, other than waiting for Amazon to identify and correct the issue. Outages happen to everyone; the key is knowing how to respond, and indeed knowing whether you can respond at all.
At VictorOps we’ve avoided direct impact from these events, but it got me thinking about what makes a “core competency” in a modern business context. As more and more businesses adopt cloud-first or cloud-only strategies, there’s a parallel rush to outsource what used to be key components of an organization’s online presence: DNS, email, website hosting, billing, and on and on.
This makes sense for a lot of companies. Frankly, many of these technologies can be arcane and difficult to set up and manage. Outsourced providers can achieve scale and redundancy that might not be available to a young startup or even an established small or mid-sized business. And using services like this can greatly reduce the time it takes to bring a new product to market.
But what happens when one of those services has a failure? What happens when the company providing the service gets acquired and availability goes downhill? Or what if they go out of business? Or what if they’re subjected to a DDoS attack? Is your business resilient enough to weather these kinds of problems? Perhaps most importantly, will your customers understand and agree that it’s “not your fault”? If not, you need to be ready to respond.
Outsourcing a service isn’t an excuse for allowing a gap in core competency or key infrastructure. At VictorOps, for example, we use third parties for services like DNS hosting and email delivery. But we have self-managed versions of these services ready to go if we need them.
If our DNS or email partner has a widespread problem, we know that we can pick up the ball and run with it ourselves until they get their issue resolved. There’s a second big advantage for us, beyond being able to manage our own destiny when a third-party service has a problem: because we maintain those services in-house as well, we have deep, up-to-date knowledge about how they work.
We have the troubleshooting skills to be able to know for ourselves what’s going on when there’s an issue, and to know whether we need to take mitigating steps of our own. And if we ever get to a point where we decide one of our partners just isn’t working for us, we know that we can cut the cord without impacting our customers for the worse. We know, because we’ve done it.
So what can your company do to protect itself, while still taking advantage of the speed and agility of outsourced services? There are a few things to keep in mind when you’re architecting your product:
Avoid silver bullets
If your service is 100% dependent on a specific feature of a specific third-party service, you have a problem. You’ve guaranteed that you’ll only ever be as reliable as that third party. As major cloud infrastructure providers roll out more and more attractive specialty services tied to their platforms, this becomes more and more important, and at the same time, an easier trap to fall into.
If your product is only capable of running on a single cloud provider’s platform because of a proprietary database or storage service they offer, then what happens when the cloud provider decides that they don’t want to provide that database or storage service? What happens if the cloud provider decides they want to directly compete with you? What if their service becomes unacceptably slow? Can you do anything about it? Remember, just because you’ve outsourced a key part of your solution, that doesn’t mean you’re not responsible for how well it works in the eyes of your customers.
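One way to keep that door open is to have your application code talk to a small interface of your own rather than to a provider’s SDK directly. Here is a minimal sketch of the idea in Python. Every name in it (BlobStore, InMemoryStore, save_report, load_report) is hypothetical and made up for illustration. In practice you would write one adapter per storage service you want to be able to run on.

```python
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical interface: the only storage API the application sees."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend for the sketch; a real adapter would wrap a
    provider's storage service or a self-managed one."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_report(store: BlobStore, report_id: str, body: str) -> None:
    # Application code depends on BlobStore, not on any one vendor.
    store.put(f"reports/{report_id}", body.encode("utf-8"))


def load_report(store: BlobStore, report_id: str) -> str:
    return store.get(f"reports/{report_id}").decode("utf-8")


store = InMemoryStore()
save_report(store, "2017-03", "all systems nominal")
print(load_report(store, "2017-03"))
```

Swapping providers then means writing another class with the same two methods, not rewriting every caller. It doesn’t remove the migration work, but it does keep it in one place.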
Always have a Plan B
So many companies fall into the trap of “I’m using the best-of-breed solution for x, I don’t need a backup option.” As an industry leader in alert and incident management, we’re here to tell you: everyone has outages, and some of the biggest services can have some of the biggest outages. You need to have a contingency plan for what to do when your “best-of-breed” gets rabies.
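In code, a Plan B can be as simple as an ordered list of providers and a loop that moves to the next one when the current one fails. The sketch below is an illustration under assumptions, not a description of how we do it at VictorOps: hosted_provider and self_managed_relay are hypothetical stand-ins, and the hosted one is hard-wired to fail so the fallback path is exercised.

```python
from typing import Callable, Sequence


class AllProvidersFailed(Exception):
    """Raised when every configured provider has failed."""


def send_with_fallback(
    providers: Sequence[tuple[str, Callable[[str], None]]],
    message: str,
) -> str:
    """Try each provider in order; return the name of the one that worked."""
    errors: list[str] = []
    for name, send in providers:
        try:
            send(message)
            return name
        except Exception as exc:  # broad on purpose: any failure means move on
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Hypothetical stand-in for a hosted service that is having an outage.
def hosted_provider(message: str) -> None:
    raise ConnectionError("provider outage")


delivered: list[str] = []


# Hypothetical stand-in for a self-managed backup that is ready to go.
def self_managed_relay(message: str) -> None:
    delivered.append(message)


used = send_with_fallback(
    [("hosted", hosted_provider), ("self-managed", self_managed_relay)],
    "Incident update",
)
print(used)
```

The code is the easy part. The backup only counts if it’s kept current and exercised regularly, so that switching to it during an outage is routine, not an experiment.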
Don’t be driven by trends
“We do it this way because Unicorn X does it this way” is great, if you’re running the same kind of business for the same kind of customers. But before you adopt an all-AWS strategy because that’s how Netflix does it, you need to ask yourself why Netflix does it that way, and whether those factors really apply to what you’re trying to do. They may, but then again they may not. Different priorities and challenges require different solutions.
At VictorOps, we’re proud to be a progressive organization, and a leader in the DevOps movement. We’ve always been aggressive about using new technologies and new methods to find ways of improving our service. But underneath it all, we know that maintaining our core competencies is the critical foundation that allows us to move fast and try new things. We push the envelope, but we don’t ever forget the basics. We can’t afford to. Can you?