This is a guest post from Jeff Simpson, a Senior Software Engineer and resident biking badass here at VictorOps.
Who is Jeff?
Over the course of his 20-year career, Jeff has shown an aptitude for Unix sysadmin work and software packaging. Because of this, he’s had no choice but to fall into a DevOps methodology. It’s always been ad hoc and never formal, but he seems to be the one who gets the deployment going on the dev end.
He used Chef at a previous job, but he’s pretty excited that the VictorOps team recently migrated from CFEngine to Puppet. Puppet allows for more flexibility since it is written in Ruby. Additionally, Puppet has a repository of policies, an engaged community that loves to contribute, and good documentation.
The main challenge for the VictorOps stack is the bridge between Puppet-managed configuration and the VO app itself. This is where most of the errors occur, and it is the most critical part of a deployment. Everything else is testable using Vagrant, but Jeff is hoping to move toward using Puppet itself to orchestrate releases.
While Jeff has never been officially on-call at other jobs, his career has consisted of working at start-ups, which means that everyone is always unofficially on-call. If there’s a large enough team, it’s not really a big deal.
The only real problem that Jeff sees with being on-call is that occasionally he forgets that he’s actually on-call. He recalls a time when he was up in the mountains, riding bikes at night, at least an hour from home, when he remembered that he was on-call. Fortunately, there weren’t any big firefights that night, but Jeff would love to see an out-of-range notification in VictorOps that alerts you when you’ve traveled somewhere that limits connectivity or might cause you to miss an incident escalation.
When I asked Jeff about devs who don’t want to carry the pager, his response was: “Are there any who really want to?”
How does it work here?
At VictorOps (as with DevOps itself), there is a team approach to everything that goes on. Mike, our Senior Director of IT, steers the direction and facilitates the system itself. He is in charge of configuration management, has built out the generic Puppet plumbing with base policies, and takes care of Ubuntu installation and all other issues that arise when setting up additional data centers. But it’s the developers who fill in the last details: knowing what versions and tools the platform server needs, tying some of the plumbing in with the app, configuring the database and helping with replication.
To add value to our monitoring set-up, we’ve also started using Rearview. Jeff wrote a Scala version which feeds off Graphite data and sends an alert when a certain number of events pass a predetermined threshold. It’s a matter of passive monitoring versus Nagios, which is an example of active monitoring. Rearview isn’t directly reaching into an app to get data (hence, the passive monitoring), but with scripts wired into Nagios, the program can make sure clusters are healthy, check specific orgs and provide a continuous feed of alerts that we’re monitoring. Basically, Rearview ensures that we have good end-to-end alert processing. This is key when doing a deployment because if you break alert processing, that’s not a good thing.
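To make the threshold idea concrete, here is a minimal sketch in Python. It is not Rearview’s code or Jeff’s Scala version, just an illustration of that kind of check. It assumes datapoints arrive as (value, timestamp) pairs, the shape Graphite’s JSON output uses, with None marking a gap. The names `breaches`, `should_alert` and `min_breaches` are made up for this example.

```python
from typing import List, Optional, Sequence, Tuple

# (value, unix timestamp); value is None when Graphite has no data for that slot.
Datapoint = Tuple[Optional[float], int]


def breaches(datapoints: Sequence[Datapoint], threshold: float) -> List[Datapoint]:
    """Return the datapoints whose value is above the threshold, skipping gaps."""
    return [(v, ts) for v, ts in datapoints if v is not None and v > threshold]


def should_alert(datapoints: Sequence[Datapoint], threshold: float, min_breaches: int) -> bool:
    """Alert once at least `min_breaches` datapoints are above the threshold.

    `min_breaches` is a hypothetical knob for this sketch, not a Rearview setting.
    """
    return len(breaches(datapoints, threshold)) >= min_breaches


# Made-up sample series: two of the five slots are above 90.
series = [(12.0, 1000), (None, 1060), (95.0, 1120), (97.5, 1180), (40.0, 1240)]

if should_alert(series, threshold=90.0, min_breaches=2):
    print("ALERT: threshold passed")
```

Because the check only reads metrics that have already been collected, it never has to reach into the app itself, which is the passive part.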
“At VictorOps, I think we’re doing a good job at DevOps. I just hope we don’t need to use our product much.”