Todd Vernon - June 19, 2014
Last February, Joe wrote a great article about our use of WebSockets in the VictorOps Android app. WebSockets are a really cool way to keep an active connection to the VictorOps backend services while our app is in the foreground. This gives the app real-time performance without the polling that a REST interface would require. WebSockets also extend battery life by reducing the amount of information that needs to be transmitted.
We get so many hits on our blog from developers interested in WebSockets that I thought I would talk a little about some discoveries we made on the iOS side of client development.
In our iOS application, we use the SocketRocket library written by Mike Lewis, and from the beginning we have had great luck with it. It seems to handle anything we throw at it without strange behavior or memory issues, and it also seems to be very stable across OS releases.
As Joe points out, we selected the WebSocket protocol for its real-time performance. As with many network protocols, connection setup is often the most time-consuming part of a transaction, largely because of the TCP three-way handshake. This overhead often goes unnoticed on high-speed WiFi or LTE networks, but on slower networks like EDGE it can be a serious hindrance for highly interactive applications. Many developers don’t really think about it, but network performance is not binary; it varies with distance from the cell tower, obstructions, and so on.
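To see why setup time hurts most on slow links, here is a back-of-the-envelope model in Python. It is a sketch under simplifying assumptions (no packet loss, DNS lookup, bandwidth limits, or server processing time, and a full TLS 1.2 handshake when encrypted); the constants and function names are illustrative and not taken from our client.

```python
# Back-of-the-envelope model of WebSocket connection setup (a sketch).
# Assumptions: no packet loss, no DNS lookup, no bandwidth limits, no server
# processing time, and a full TLS 1.2 handshake when the connection is encrypted.

TCP_HANDSHAKE_RTTS = 1         # SYN / SYN-ACK; the final ACK can ride along with data
TLS12_FULL_HANDSHAKE_RTTS = 2  # only paid for wss:// connections
WS_UPGRADE_RTTS = 1            # HTTP Upgrade request -> 101 Switching Protocols


def setup_latency_ms(rtt_ms: float, encrypted: bool) -> float:
    """Time spent before the first WebSocket message can be sent."""
    rtts = TCP_HANDSHAKE_RTTS + WS_UPGRADE_RTTS
    if encrypted:
        rtts += TLS12_FULL_HANDSHAKE_RTTS
    return rtts * rtt_ms


def message_latency_ms(rtt_ms: float) -> float:
    """A request and its reply on an already-open socket cost about one RTT."""
    return rtt_ms


if __name__ == "__main__":
    # Hypothetical round-trip times for a fast and a slow link.
    for rtt in (20, 500):
        print(f"RTT {rtt} ms: setup {setup_latency_ms(rtt, encrypted=True)} ms, "
              f"message {message_latency_ms(rtt)} ms")
```

At a hypothetical 20 ms round trip, an encrypted setup costs about 80 ms; at 500 ms it costs about 2,000 ms before a single message moves. A persistent socket pays that price once per session instead of once per message.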
Consider a typical take-on-call transaction in our mobile application. On a poor network connection with a weak signal, the initial WebSocket connection takes 13 seconds to complete. From that point until the application enters the background, that connection time is amortized over all subsequent transactions. In our take-on-call example, the message that the client sends to the backend to “take on call” takes only 4 seconds. If we had to create a new socket connection in this environment, 60% of the time would be spent simply on TCP connection overhead before the message could even be sent. Some would point out that HTTP pipelining could deliver similar gains (though it still requires a request/response exchange), but it’s important to remember that intervening network gear can override your request and shut down the connection.
On EDGE and 3G networks, connection times improve to 10 seconds and 6 seconds, respectively. On WiFi, the connection time is around 500 ms, and a subsequent take-on-call message on that connected socket takes about 27 ms.
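The amortization argument can be made concrete with the two cases above for which we have both a connect time and a message time (weak signal and WiFi). This Python sketch compares keeping one socket open against reconnecting for every message; the function name and the reconnect-per-message baseline are illustrative assumptions, not how our client is built.

```python
def per_message_ms(connect_ms: float, message_ms: float, n_messages: int,
                   reuse_connection: bool) -> float:
    """Average time per message over a session of n_messages."""
    if n_messages < 1:
        raise ValueError("n_messages must be at least 1")
    if reuse_connection:
        # Pay the connection cost once; every message rides the open socket.
        total = connect_ms + message_ms * n_messages
    else:
        # Reconnect for every message, as a request-per-connection client would.
        total = (connect_ms + message_ms) * n_messages
    return total / n_messages


# (connect time, take-on-call message time) in ms, from the measurements above.
NETWORKS = {
    "weak signal": (13_000, 4_000),
    "wifi": (500, 27),
}

for name, (connect_ms, message_ms) in NETWORKS.items():
    for n in (1, 10):
        reused = per_message_ms(connect_ms, message_ms, n, reuse_connection=True)
        fresh = per_message_ms(connect_ms, message_ms, n, reuse_connection=False)
        print(f"{name:12} n={n:2}: reused {reused:8.1f} ms/msg, "
              f"new connection {fresh:8.1f} ms/msg")
```

With a single message the two strategies cost the same, but over ten messages on the weak-signal connection, reusing the socket averages 5,300 ms per message versus 17,000 ms when reconnecting each time; on WiFi the figures are 77 ms versus 527 ms.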
Early in development, we had switches in the applications to support either encrypted or unencrypted communications with the backend. Interestingly, we found during testing that the AT&T LTE network would only support the encrypted mode. Our suspicion is that AT&T has some kind of full packet inspection gear at the network edge that didn’t like (or recognize) the WebSocket protocol. Making the connection encrypted solved this problem, likely because the suspected network gear could no longer inspect the packets. AT&T was the only network on which we saw this behavior, but it underscores the need to test on every network configuration you need to support.
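Given what we saw on AT&T LTE, one defensive option is to drop the unencrypted switch entirely and refuse plaintext ws:// endpoints in configuration, since wss:// is WebSocket over TLS (RFC 6455). Here is a minimal Python sketch of such a guard; the function name is hypothetical and this is not code from our iOS client.

```python
from urllib.parse import urlsplit


def require_encrypted_websocket(url: str) -> str:
    """Return url unchanged if it uses wss:// (WebSocket over TLS); otherwise raise.

    Hypothetical configuration guard: plaintext ws:// endpoints are rejected
    rather than silently upgraded, because the server may not speak TLS on
    the same host and port.
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme != "wss":
        raise ValueError(f"refusing non-encrypted WebSocket endpoint: {url!r}")
    return url
```

Failing loudly at configuration time seems preferable to discovering on one carrier's network that the socket never opens.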
As great as WebSockets are, and as necessary as they are in interactive application environments, they are not free. Considerable effort went into the JSON-based protocol that sits on top of WebSockets in our implementation. There are situations where WebSockets are simply overkill. We are adding some new background-processing components to the VictorOps mobile applications in the near future, and for some of these “non-human-in-the-loop” situations, REST will be a great alternative.
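Part of that effort comes from the fact that a WebSocket is a message stream, not a request/response channel, so the protocol on top has to match replies to the requests that caused them and tell them apart from server-initiated pushes. This Python sketch shows one common way to do that with request ids; the envelope fields (`id`, `type`, `payload`) and message types such as `take_on_call` are hypothetical, not our actual protocol.

```python
import itertools
import json


class RequestTracker:
    """Correlates replies with requests over a single WebSocket message stream.

    Hypothetical envelope: {"id": <int>, "type": <str>, "payload": <object>}.
    """

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._pending: dict[int, str] = {}  # request id -> request type

    def encode_request(self, msg_type: str, payload: dict) -> str:
        """Assign a fresh id, remember it, and serialize the request."""
        request_id = next(self._ids)
        self._pending[request_id] = msg_type
        return json.dumps({"id": request_id, "type": msg_type, "payload": payload})

    def handle_incoming(self, text: str) -> tuple[str, dict]:
        """Return (kind, payload), where kind is 'reply:<type>' or 'push:<type>'."""
        message = json.loads(text)
        request_id = message.get("id")
        if request_id is not None and request_id in self._pending:
            original_type = self._pending.pop(request_id)
            return f"reply:{original_type}", message.get("payload", {})
        # Server-initiated messages (or stale replies) carry no pending id.
        return f"push:{message.get('type', 'unknown')}", message.get("payload", {})
```

A REST call gets this correlation for free from HTTP itself, which is one reason REST is attractive for background work where nobody is waiting on the screen.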
In many ways, VictorOps is a mobile-first company. We elected from the outset to have separate code bases (and developers) for Android and iOS. We want the engineers working on each platform to love the platform they work on and, therefore, to make the apps insanely great! We also believe that performance is everything. You should be able to pull your phone out of your pocket and see the status of your infrastructure in 4 seconds. Statistically, we are about there right now. To make mobile great, you have to embrace the mobile use case: in, out, and back to your life.