{dogfooding: a slang term used to reference a scenario in which a company uses its own product to demonstrate the quality and capabilities of the product}

It’s really cool when the platform you built helps fix the platform you built. This weekend was a great example of how integrating alerting, the timeline, and collaboration helps solve problems faster.

Dan Jones, our CTO, was on call for Operations this weekend and got a push notification that we were having problems sending SMS notifications out through Twilio. He knew the problem wasn’t actually Twilio, however, because we have specific checks for that and he was receiving SMS messages himself.

After some debugging, we determined that the cause was an iptables misconfiguration on one box in the cluster (which is why the problem happened pretty rarely).

Dan Hopkins noticed in Syslog messages that it was a connectivity problem between two boxes in our cluster. Dan @mentioned Mike in the timeline, which sent Mike a push notification, and Mike responded in the timeline a minute later.

Problem solved!