Signal vs. Noise

We saw this image pop up on Meme Generator last week. While we were honored to be included in our first meme, the message of the meme was a little disheartening. Clearly, one of our customers needs help sorting through the noise and getting to what’s important. The proliferation of great monitoring tools in the past few years has made it almost trivially easy to put instrumentation on every moving part in your platform or application. VictorOps is partnering with new monitoring tools every day,…

A Mobile Alert System for How We Live

One of the questions we hear at trade shows and events is: “Our team isn’t huge…do we really need a mobile alert system?” Our answer starts with a question of our own: “Who actually does business only from 9am to 5pm, and what DevOps or IT team all works from the same location?” Development and IT teams don’t work like that anymore, and with mobile devices powerful enough to support almost any business function, many job descriptions have shifted toward a 24/7 support model. Especially if your company has a…

Fun with Event Sourcing

At VictorOps we’re always striving to hone our infrastructure so we can handle increased customer load on fewer machines. Shared storage such as MySQL typically becomes a bottleneck for high-volume transactions, so avoiding queries to MySQL is one way to improve performance. We use Scala along with Akka, an actor framework for the JVM, for our entire backend. One new feature in the Akka 2.3.x release is the Persistence module, which provides the ability to implement Event Sourcing. To quote Martin Fowler…
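The post builds on Akka Persistence in Scala. The Python sketch below does not use or imitate Akka's API. It is a minimal, language-neutral illustration of the Event Sourcing idea, with these assumptions:
- Every state change is stored as an event in an append-only journal. Here a plain list stands in for durable storage.
- Current state lives in memory, so reads skip the shared database.
- After a restart, state is recovered by replaying the journal in order.

The incident event types are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


# Hypothetical event types for illustration; not an actual VictorOps schema.
@dataclass(frozen=True)
class Triggered:
    incident_id: str


@dataclass(frozen=True)
class Acknowledged:
    incident_id: str
    user: str


@dataclass(frozen=True)
class Resolved:
    incident_id: str


Event = Union[Triggered, Acknowledged, Resolved]


@dataclass
class IncidentState:
    # Current status per incident, derived purely from applied events.
    status: dict[str, str] = field(default_factory=dict)

    def apply(self, event: Event) -> None:
        if isinstance(event, Triggered):
            self.status[event.incident_id] = "triggered"
        elif isinstance(event, Acknowledged):
            self.status[event.incident_id] = f"acknowledged by {event.user}"
        elif isinstance(event, Resolved):
            self.status[event.incident_id] = "resolved"


class EventSourcedIncidents:
    """Append-only journal plus in-memory state rebuilt by replaying it."""

    def __init__(self, journal: list[Event]):
        # The list stands in for durable journal storage (an assumption).
        self.journal = journal
        self.state = IncidentState()
        # Recovery: rebuild current state by replaying every stored event in order.
        for event in journal:
            self.state.apply(event)

    def persist(self, event: Event) -> None:
        # Record the event first, then update the in-memory state.
        self.journal.append(event)
        self.state.apply(event)

    def status_of(self, incident_id: str) -> str:
        # Reads are served from memory; no query against shared storage.
        return self.state.status.get(incident_id, "unknown")


journal: list[Event] = []
live = EventSourcedIncidents(journal)
live.persist(Triggered("inc-1"))
live.persist(Acknowledged("inc-1", "alice"))

# Simulate a restart: a fresh instance recovers the same state from the journal.
recovered = EventSourcedIncidents(journal)
print(recovered.status_of("inc-1"))  # prints: acknowledged by alice
```

The trade-off is that writes only append, while reads never touch the journal. The cost of a full replay on restart is what snapshotting features in event-sourcing frameworks typically address.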

Get the On-call Firefight Survival Guide

Did you know that being on-call can cause something called Phantom Vibration Syndrome? PVS anyone? Or that there is scientific evidence that when you’re sleeping while on-call, you have less REM sleep, less “slow wave sleep” and a higher heart rate? But all of the so-called research doesn’t mean anything when you’re in the middle of a firefight. You’ve just gotta roll with it – sleep disturbed, tired, cranky – and get all hands on deck until the crisis is over. Being on-call sucks, but armed…

Most Companies Not Satisfied with their Alerting Tools

An interesting survey published by Stackdriver and PagerDuty, based on data collected at the AWS re:Invent conference, suggests that almost 70% of the people polled are dissatisfied with their alerting tools. This report resonated with us for obvious reasons: we have been talking with alpha and beta customers for months about their dissatisfaction with present offerings. What we hear is that current solutions don’t really help solve the problem; they just tell folks they have a problem. In a previous post, I talked about the…

The Distinction Between Alerting and Collaboration

Continuing from the first part of my post, I want to dig in a bit and talk about the differences between alerting and collaboration. The VictorOps platform is built to provide functionality in all phases of the Incident Lifecycle. Alerting and Acknowledgment: As stated, about 10% of incident resolution is tied to incident identification and routing, enacting company and individual escalation policies, and finally contacting the correct individual via their preferred contact method. Easy-to-use, integrated mobile apps help reduce the effort…
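The alerting phase described above has three parts: routing an alert, walking an escalation policy, and reaching each person by their preferred contact method. A minimal sketch of that flow follows. It is not the VictorOps data model.

All of the following are hypothetical and exist only to make the steps concrete:
- the routing table `ROUTES` and the routing key `"database"`;
- the delays and the people;
- the contact-method strings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    contact_method: str  # individual preference, e.g. "push", "sms", "phone" (hypothetical)


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # minutes without acknowledgment before this step fires
    person: Person


# Hypothetical routing table: alert routing key -> team escalation policy.
ROUTES: dict[str, list[EscalationStep]] = {
    "database": [
        EscalationStep(0, Person("alice", "push")),
        EscalationStep(15, Person("bob", "sms")),
        EscalationStep(30, Person("carol", "phone")),
    ],
}


def notifications_due(routing_key: str, minutes_unacked: int) -> list[tuple[str, str]]:
    """Return (person, contact method) pairs for every step that has fired so far."""
    policy = ROUTES.get(routing_key, [])  # unknown keys route nowhere in this sketch
    return [
        (step.person.name, step.person.contact_method)
        for step in policy
        if minutes_unacked >= step.after_minutes
    ]


print(notifications_due("database", 20))  # prints: [('alice', 'push'), ('bob', 'sms')]
```

In this sketch the team-level policy decides *who* is contacted and *when*. Each person's own setting decides *how* they are reached.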

The Myth of Alerting Services in IT

On the surface, it’s understandable that VictorOps gets compared to PagerDuty. Both services allow on-call team scheduling and rotation, both will notify you of an incident in your IT infrastructure, and both allow for escalation. In short, both systems will “tell you that you have a problem”. The difference, however, is that VO is a collaborative platform, not just an alerting platform. Our vision is to be “in the fight” and actually help teams resolve problems faster. Simply put, our mission will be…

IT On-Call: Five Best Practices for Making the Pain Less Painful

Time on-call is a fact of life when working in a DevOps or TechOps environment, but for a lot of us it’s the worst part of the job. With a 24/7 platform, being on-call means getting up in the middle of the night, interrupting weekend time, and putting your personal life on hold. And it’s stressful! It’s easy to feel alone during a crisis, not wanting to bother coworkers but needing help, advice, or just another set of eyes. Here are a few easy things you can do…

Monitoring: Trending and Visualization is Not an Optional Extra

Everyone understands the importance of having system and application monitoring in place right away. In the SaaS world, every minute of downtime means lost revenue and angry users. Trending and visualization, on the other hand, are sometimes seen as an optional extra. Collecting statistics and graphing metrics such as CPU load or web hits per second over time are frequently regarded as “nice to have”, not “need to have”. In the rush to get a new platform out the door, many teams decide that trending and…
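Trending starts with keeping a history of samples rather than only the latest value. A minimal sketch of that idea follows, assuming a simple moving average is the trend line you want. That is one common smoothing choice, not the method of any particular monitoring tool.

The class name `MetricTrend`, the window size, and the CPU load readings are all hypothetical.

```python
from collections import deque


class MetricTrend:
    """Keeps the most recent samples of one metric and reports their moving average."""

    def __init__(self, window: int):
        # deque(maxlen=...) silently drops the oldest sample once the window is full.
        self.samples: deque[float] = deque(maxlen=window)

    def record(self, value: float) -> None:
        self.samples.append(value)

    def moving_average(self) -> float:
        if not self.samples:
            raise ValueError("no samples recorded yet")
        return sum(self.samples) / len(self.samples)


# Hypothetical CPU load readings, one per minute.
cpu = MetricTrend(window=3)
averages = []
for load in [0.5, 0.7, 0.9, 1.1, 1.3]:
    cpu.record(load)
    averages.append(round(cpu.moving_average(), 2))

print(averages)  # prints: [0.5, 0.6, 0.7, 0.9, 1.1]
```

Plotting `averages` over time gives the kind of graph the post describes. A steadily climbing line is visible there even while each individual reading still looks unremarkable.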