Showing Chaos Engineering Posts

Your Open Source Tool Belt: Chaos Testing

A couple of years ago, when Amazon announced its famous Prime Day sale, the website traffic suddenly increased by 28%, amounting to 73.8 million visitors...

What to Make of SRE's Golden Signals

Effective site reliability engineering (SRE) relies on a deep understanding of a service’s underlying infrastructure and architecture. Improving the visibility into application and infrastructure health...

September Roundup: A Game Day Recap

Preparation is the key to effective on-call management and faster incident remediation. From our State of On-Call Report, we found that incident response, on average,...

Simulators and Validators for SRE and Chaos Engineering

Common Gaps in SRE: At its core, SRE is an engineer’s approach to improving operational system reliability via a path that includes, unsurprisingly, even more...

How Workiva Built a Culture of DevOps and SRE

Creating a DevOps environment of collaboration, code ownership, and accountability inherently helps teams build on SRE efforts. We spoke with Mike, an SRE Manager at...


I’m not completely sure everyone knows the real costs of downtime, and it’s a helluva number… In fact, a study found that Fortune 1000...


We decided to embark on a journey to make our systems more reliable by creating intentional chaos. Our team developed the SRE Council, made up...


VictorOps, like many startups, has gone through major growth in the last couple years. New teammates, new customers, and a maturing organization have all demanded...
