Dan Hopkins - June 24, 2014
Q: My company currently has 18 employees, and our “DevOps” team is made up of 3 of them. I like your product, but it seems like more than I need for our small team. Why would a startup use VictorOps?
A: I’ll answer by explaining how we use VictorOps at VictorOps. We are most definitely a startup ourselves, with 20 employees. Our primary on-call rotation consists of six DevOps guys (we share that responsibility), three customer support guys, and our three founders.
The two main teams that receive alerts are Support (three people who get Salesforce alerts; we page everybody, with no rotation) and DevOps (six people, with a primary rotation, escalation to the Director of IT, then escalation to the CTO, plus hand-off meetings).
We are starting to test how well it works to route alerts to different teams: a platform team, an infrastructure team (which consists of Mike, our Director of IT), and, eventually, a frontend team. The idea is that, as a startup, we want a good quality of life, but we don’t have a big team. So the DevOps team has a point person who either fixes the issue or routes it to the appropriate person.
At a startup, you’re expected to know everything about everything anyway, so @mentioning someone is a great choice for lightweight escalations, where you want help from whoever happens to be free. However, when a problem requires intervention from other team members, the built-in paging features of VictorOps alert the right person automatically until they acknowledge the request.
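The “page until someone acknowledges” behavior described above can be sketched in a few lines of Python. This is a hypothetical model for illustration only, not VictorOps’s actual API or implementation: `EscalationStep`, `escalate`, and the `try_page` callback are invented names, and `try_page` stands in for sending a page and waiting out a timeout. The example policy mirrors the DevOps chain described earlier (primary rotation, then the Director of IT, then the CTO).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class EscalationStep:
    """One step in a hypothetical escalation policy: who gets paged at this level."""
    name: str
    targets: List[str]


def escalate(
    steps: List[EscalationStep],
    try_page: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str], List[str]]:
    """Walk the policy in order, paging every target at a step at once,
    and stop at the first step where someone acknowledges.

    try_page(person) is a hypothetical stand-in for "send a page and wait";
    it returns True if that person acknowledged.
    Returns (step name, responder, everyone paged); the first two are None
    if nobody acknowledged at any step.
    """
    paged: List[str] = []
    for step in steps:
        paged.extend(step.targets)
        acked = [person for person in step.targets if try_page(person)]
        if acked:
            return step.name, acked[0], paged
    return None, None, paged


# Example: the primary on-call misses the page, so it escalates to the Director of IT.
policy = [
    EscalationStep("primary on-call", ["devops-on-call"]),
    EscalationStep("director of IT", ["mike"]),
    EscalationStep("CTO", ["cto"]),
]
available = {"mike", "cto"}
step, responder, paged = escalate(policy, lambda person: person in available)
print(step, responder, paged)
```

Here the primary on-call doesn’t acknowledge, so the page moves on to Mike at the next step and the CTO is never paged. A real system would wait a configured amount of time at each step before escalating; the callback hides that detail.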
Here are some of the other reasons VictorOps makes an effective tool for startups:
Visibility across the whole company: the more eyes you can get on a problem, the better your chances of solving it quickly. Because everyone does everything, a quick resolution often involves several people, regardless of team.
Easy context from the timeline: you can hop in and see exactly what is happening, who is working on the problem, and what steps they’ve taken to resolve it. With a small team, you want to get the right people on the same page as fast as possible. Sometimes that means a team member can “fail fast,” quickly recognizing that they can’t really help, which moves the escalation to the next step sooner and gets the problem solved faster.
Post-mortem reports close the loop on incidents: they let you tell the execs exactly what happened, recommend bug fixes, and start building a knowledge base that helps solve future problems by showing what was done in the past to resolve similar issues. Post-mortems also build a statistical history of what fails and eats up precious team resources. Sometimes the best answer for a small company is to outsource more, and these statistics can help support those suspicions.
On-call management that works for a handful of people: on-call can be tough with just a few people, and VictorOps lets you configure small teams by setting up rotations, taking on-call from someone, and scheduling overrides in the on-call calendar. Our main team of six has a rotation, but our smaller subteams just have a list of users to notify, and it was super easy to give both small and big teams a name (by functional area).
Customizable escalation policies (one of the coolest features, IMO): you can add an email address to the Ops escalation policy. This may not seem like much, but it’s nice to get an email and know what’s going on without getting a push notification, which only serves to disrupt your life.
Putting these systems in place before you get big means you’ll have plenty of experience (and lessons learned) under your belt when you do start to grow your infrastructure and your team. It’s always harder to scale a homemade, cobbled-together tool. If you’re a startup and you’d like to give us a try, shoot us an email. We’re always happy to help out!