VictorOps is now Splunk On-Call!

Simplicity is dead – accept it. I know it feels wrong, but the only area you can simplify is the customer’s experience when they interact...

In software development and IT, a mindset of continuous iteration and improvement leads to faster delivery of reliable applications and infrastructure. But part of continuous...

A couple of years ago, when Amazon announced its famous Prime Day sale, the website traffic suddenly increased by 28%, amounting to 73.8 million visitors...

Effective site reliability engineering (SRE) relies on a deep understanding of a service’s underlying infrastructure and architecture. Improving the visibility into application and infrastructure health...

Preparation is the key to effective on-call management and faster incident remediation. From our State of On-Call Report, we found that incident response, on average,...

Common Gaps in SRE: At its core, SRE is an engineer’s approach to improving operational system reliability via a path that includes, unsurprisingly, even more...

Creating a DevOps environment of collaboration, code ownership, and accountability inherently helps teams build on SRE efforts. We spoke with Mike, an SRE Manager at...

I’m not completely sure everyone knows the real costs of downtime—and it’s a helluva number… In fact, DevOps.com conducted a study showing that Fortune 1000...

We decided to embark on a journey to make our systems more reliable by creating intentional chaos. Our team developed the SRE Council, made up...

VictorOps, like many startups, has gone through major growth in the last couple of years. New teammates, new customers, and a maturing organization have all demanded...
