Each incident has a life of its own. Here’s how VictorOps works through each phase of the incident lifecycle to help you solve the problem faster.

Alerting: Tame the Beast

Use advanced routing to ensure the right people are alerted to issues within their problem domain.

  • Programmatically alert on-call team members to issues that need attention: when VictorOps consumes an alert that you’ve defined as critical, it begins the paging process you’ve configured.
  • User-specific Notification Policies: users may choose to be alerted via a combination of push, SMS, email and phone, at specified intervals.
  • Flexible Escalation Policies ensure issues are addressed within your organization’s required time frames.
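
To make the routing and escalation idea concrete, here is a minimal Python sketch. It does not use the VictorOps API or its configuration format; the `POLICIES` table, the routing keys, the target names and the `targets_to_page` helper are all hypothetical, chosen only to illustrate how a time-based escalation policy might decide who has been paged so far.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    after_minutes: int  # minutes without acknowledgement before this step fires
    targets: tuple      # users or teams paged at this step


# Hypothetical routing table: a routing key on the alert selects a policy.
POLICIES = {
    "database": (
        EscalationStep(0, ("db-oncall-primary",)),
        EscalationStep(15, ("db-oncall-secondary",)),
        EscalationStep(30, ("engineering-manager",)),
    ),
    "web": (
        EscalationStep(0, ("web-oncall",)),
        EscalationStep(20, ("web-team-lead",)),
    ),
}


def targets_to_page(routing_key, minutes_unacked):
    """Return everyone paged so far for an incident that is still unacknowledged."""
    paged = []
    for step in POLICIES.get(routing_key, ()):
        # A step fires once the incident has gone unacknowledged long enough.
        if minutes_unacked >= step.after_minutes:
            paged.extend(step.targets)
    return paged
```

Under these assumptions, a "database" incident that has gone 20 minutes without acknowledgement would have paged both the primary and the secondary on-call, but not yet the manager.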

Triage: Alright, you’ve been paged. Now what?

Forget logging into your disparate monitoring systems and dashboards to get up to speed!

The VictorOps Timeline

  • Whether you’re in front of your machine or on your mobile device, you can quickly get up to speed with your infrastructure and applications.
  • Rather than only surfacing limited information about the critical alerts that have fired, you can leverage the timeline to gain situational awareness of other alarms from your systems that may be contributing to the outage.
  • The timeline is the single source of truth in the VictorOps system, showing all alerts coming off of your system, who’s being paged, and the conversation around problem identification and resolution.

The VictorOps Incident Pane

  • Users get a distilled view of the critical alarms in their system, and have the ability to ack or re-route the issue to one or more teams – quickly pulling in team members to the fire-fight.
  • Get right to what you’re looking for with the ability to filter the Incident Pane by items that are paging you, items paging teams you’re on, or all paging events.
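
The three filter views described above can be sketched in a few lines of Python. This is an illustration only, not how VictorOps implements the Incident Pane: the `Incident` type, the view names (`"mine"`, `"my_teams"`, `"all"`) and the `filter_incidents` function are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    name: str
    paged_users: frozenset  # users currently being paged for this incident
    paged_teams: frozenset  # teams currently being paged for this incident


def filter_incidents(incidents, view, user, user_teams):
    """Filter incidents by a hypothetical view name: 'mine', 'my_teams' or 'all'."""
    if view == "all":
        return list(incidents)
    if view == "mine":
        return [i for i in incidents if user in i.paged_users]
    if view == "my_teams":
        teams = frozenset(user_teams)
        # Keep incidents that page at least one team the user belongs to.
        return [i for i in incidents if i.paged_teams & teams]
    raise ValueError(f"unknown view: {view}")
```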

Investigation: You got paged, but now you need help

  • Use Twitter conventions like @messaging to pull other team members into the fire-fight.
  • Because VictorOps was built mobile first and believes in information symmetry, your colleagues can quickly get up to speed from anywhere and offer suggestions for resolution.
  • Team connectivity via several mechanisms:
    • Chat into the timeline to keep team members up to speed on steps being taken toward resolution.
    • Chat into specific incidents to make your notes part of the log of incident resolution.
    • View the people pane to see who’s online and available to assist.
    • Quickly reference contact information for all team members.
  • VictorOps also supports bidirectional integrations with horizontal chat platforms like HipChat.
  • Easily assemble team members for a Control Call (voice conference) if more synchronous communication becomes necessary.

Identification: Now source the problem.

  • Save valuable time with the VictorOps Transmogrifier. Users can now annotate alerts with Runbooks, Triage Documents, Important Assets (e.g., Graphite graphs) and notes on problem resolution.
  • With the Transmogrifier, real-time remediation data and suggested solutions come attached to the initial alert, placing relevant facts in the hands of developers and engineers when it counts.
  • Facilitate documentation clean-up by making notes about accuracy/helpfulness of annotated assets while in the moment.
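
As a rough illustration of rule-based alert annotation, the Python sketch below matches alert fields against wildcard patterns and attaches runbook links or notes to matching alerts. It is not the Transmogrifier's actual rule syntax or API: the `RULES` structure, the field names, the example URL and the `annotate` function are all hypothetical.

```python
import fnmatch

# Hypothetical rules: match one alert field against a wildcard pattern,
# then attach the listed annotations. The URL is a placeholder.
RULES = [
    {
        "field": "entity_id",
        "pattern": "db-*",
        "annotations": {"runbook": "https://wiki.example.com/runbooks/database"},
    },
    {
        "field": "message",
        "pattern": "*disk*",
        "annotations": {"note": "Check log rotation before adding storage."},
    },
]


def annotate(alert, rules=RULES):
    """Return a copy of the alert with annotations from every matching rule."""
    annotations = {}
    for rule in rules:
        value = str(alert.get(rule["field"], ""))
        # Case-sensitive shell-style wildcard match.
        if fnmatch.fnmatchcase(value, rule["pattern"]):
            annotations.update(rule["annotations"])
    annotated = dict(alert)
    annotated["annotations"] = annotations
    return annotated
```

In this sketch, an alert from an entity named `db-01` whose message mentions `disk` would arrive with both the runbook link and the note attached, while an alert that matches no rule carries an empty set of annotations.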

Resolution & Documentation: The battle is over, but your work isn’t done.

  • Because VictorOps customers firefight in the application, everything from the raw alerts, paging events, activities associated with a Control Call and chats about problem resolution is stored in the timeline. Using the ‘Post-Mortem’ tool, users can pull a section of the timeline for use in retrospectives and reporting on SLAs for internal and external constituents.
  • VictorOps supports ‘continuous documentation’ via Incident Frequency and Post-Mortem reports, which facilitate discussion about whether all alerts are actionable and, if so, whether the Runbooks and Triage documentation were up to date.

Getting Started | All you need to get the party started.

5 Steps to Kickstart your Trial

Stay mobile.

With a swipe of our native Android and iPhone mobile apps, you’re up to speed instantly with what your infrastructure and systems are doing. And when your infrastructure stops doing what it’s supposed to do, you can see exactly when and in what context the whole scenario went down.

Get it on Google Play or download it from the iTunes App Store.