Here at VictorOps, we rely heavily on Akka, and during my time working with it I started seeing similarities between microservices and actors. Microservices allow you to take pieces of your app, put them on their own servers, and have them communicate via HTTP. You can scale the app where there are bottlenecks and apply resources without being wasteful.
Microservices are untyped (they communicate via JSON); however, a common approach is to write a client library on top of a JSON API, turning a weakly-typed interface into a more strongly-typed one. Expanding on this idea, it is pretty easy to see how I made the jump from thinking of actors as units of code to thinking of them as something more like microservices.
Don’t believe me? A few other similarities…
— Both can be located anywhere in your infrastructure and communicate over TCP.
— Actors may be running on the same box or on different boxes, but either way, when you make a call you expect a certain response.
— You can scale actors like you would scale microservices. Run actors over here on these computers and add or subtract boxes as you see fit.
— Actors, like microservices, are weakly-typed. You can write client libraries that make it easier to consume their APIs.
— Microservices and actors are subject to the same pitfalls. What happens if my service can’t be reached? What happens when things time out?
It’s like I’m Using a Microservice…
We already think of actors as services. Our team adopted a new paradigm: create an actor, write a library that allows for communication with that actor and then allow other people to use that library. By doing that, we have a more holistic view of how the actors should work, especially around the aspects of handling failure and ensuring you have a well-typed interface for communication.
Building Actor Interfaces
One way to put this into practice is to write a typed interface for an actor. By interfacing with an actor in a prescribed fashion, we take the work of typing the responses off the user, minimizing the possibility of error and reducing the time it takes for somebody to get up and running with the actor.
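A minimal sketch of such a typed interface, assuming Akka's classic actor API. The `UserService` protocol, the `UserServiceActor`, and the `UserServiceClient` wrapper are hypothetical names invented for illustration, not our production code. The point is that callers only ever see a typed method and a `Future`, while the untyped `Any` stays inside the library.

```scala
import akka.actor.{Actor, ActorRef}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical message protocol for a user-lookup actor.
object UserService {
  final case class GetUser(id: Long)
  final case class User(id: Long, name: String)
  final case class UserResult(user: Option[User])
}

// The actor itself is untyped: receive is a PartialFunction[Any, Unit].
class UserServiceActor extends Actor {
  import UserService._

  // Stand-in data; a real actor would hold or fetch real state.
  private val users = Map(1L -> User(1L, "alice"))

  def receive: Receive = {
    case GetUser(id) => sender() ! UserResult(users.get(id))
  }
}

// The client library: callers get typed methods and Futures, never `Any`.
class UserServiceClient(ref: ActorRef)(implicit ec: ExecutionContext) {
  import UserService._

  // Assumed timeout; the library picks one so that callers do not have to.
  private implicit val timeout: Timeout = Timeout(3.seconds)

  // ask returns Future[Any]; mapTo narrows it and fails the Future on a
  // type mismatch. A timeout also surfaces as a failed Future.
  def getUser(id: Long): Future[Option[User]] =
    (ref ? GetUser(id)).mapTo[UserResult].map(_.user)
}
```

A caller would write `client.getUser(1L)` and handle a `Future[Option[User]]`. Failure handling (timeouts, unexpected replies) lives in one place, much as it would in a client library for a JSON API.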
Akka has rich semantics for specifying message delivery. Just as in the world of networked services, actor systems and clients can be built with at-most-once or at-least-once message delivery. With ordinary sends, as in a typed interface like the one described above, all messages are sent with at-most-once delivery. To guarantee at-least-once delivery of messages to actors, things get a little more complicated. Mixing the AtLeastOnceDelivery trait into a PersistentActor gives the implementer the necessary tools to build such messaging guarantees.
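As a rough illustration of that mix-in, here is a minimal sketch assuming the classic Akka Persistence API from 2.4 onward (where `deliver` takes the destination path and a function from delivery id to message). The protocol and event names inside `ReliableSender` are hypothetical, and a journal plugin must be configured for the actor to run.

```scala
import akka.actor.{Actor, ActorPath}
import akka.persistence.{AtLeastOnceDelivery, PersistentActor}

object ReliableSender {
  // Hypothetical protocol.
  final case class Send(payload: String)                        // command from callers
  final case class Envelope(deliveryId: Long, payload: String)  // what the destination sees
  final case class Confirm(deliveryId: Long)                    // ack from the destination

  // Events written to the journal.
  sealed trait Event
  final case class MsgSent(payload: String) extends Event
  final case class MsgConfirmed(deliveryId: Long) extends Event
}

class ReliableSender(destination: ActorPath)
    extends PersistentActor with AtLeastOnceDelivery {
  import ReliableSender._

  override def persistenceId: String = "reliable-sender"

  override def receiveCommand: Receive = {
    case Send(payload)       => persist(MsgSent(payload))(updateState)
    case Confirm(deliveryId) => persist(MsgConfirmed(deliveryId))(updateState)
  }

  // Replaying the same events rebuilds the set of unconfirmed deliveries.
  override def receiveRecover: Receive = {
    case evt: Event => updateState(evt)
  }

  private def updateState(evt: Event): Unit = evt match {
    // deliver() tracks the message and re-sends it until it is confirmed.
    case MsgSent(payload) =>
      deliver(destination)(deliveryId => Envelope(deliveryId, payload))
    // confirmDelivery() stops redelivery for that id (its Boolean result is ignored here).
    case MsgConfirmed(deliveryId) =>
      confirmDelivery(deliveryId)
  }
}

// The receiving side must acknowledge, and should tolerate duplicates.
class ConfirmingDestination extends Actor {
  import ReliableSender._

  def receive: Receive = {
    case Envelope(deliveryId, _) => sender() ! Confirm(deliveryId)
  }
}
```

Because redelivery continues until a `Confirm` arrives, the destination may see the same `Envelope` more than once, so its processing should be idempotent.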
One challenge that we encountered while building such guarantees was that we have built our own ZooKeeper-based Akka cluster. As such, when a node goes down, there is no guarantee that its corresponding ActorPath will come back. This is a problem because the deliver() method of the AtLeastOnceDelivery trait saves the destination ActorPath with the message when it is persisted.
We ended up using some sleight-of-hand to get around this requirement, which we believe to be an elegant way to allow maximum uptime of the delivering actor while retaining its at-least-once semantics. We leveraged the ability of actors to “become” a new behavior on message receive, together with the resolveOne() method on Akka's ActorSelection, to asynchronously ensure that we have a valid ActorRef to send to. Once we know we have a valid ref, we can resend all unconfirmed messages to the new ref’s path, since the old path may never work again.
Another (naive) strategy to accomplish the same goal would be to block the actor in the preStart() method until the actor could resolve the path. This is not a great strategy, however: while it waits, the actor could instead be accepting and persisting messages to send later when the ref finally does resolve, and blocking needlessly ties up a thread in the process.
Akka, and actors in general, are a fantastic way to build a scalable, fault-tolerant architecture because they put failure (in all its forms) at the forefront of design decisions. Thinking of these units of code as their own services takes this metaphor to the next level and pushes implementors to think more holistically, not only about the “happy path” of execution, but also about how they scale and fail under less than ideal conditions.
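The non-blocking become/resolveOne() approach could be sketched roughly as follows. This is one possible reading under stated assumptions, not our actual implementation: `locate` is a hypothetical stand-in for whatever cluster lookup yields an `ActorSelection`, the protocol names are invented, unresolved deliveries are parked on the dead-letters path, and unconfirmed deliveries are re-pointed by rewriting the delivery snapshot. That last step uses `setDeliverySnapshot` outside of recovery, which is an off-label use of the API.

```scala
import akka.actor.{ActorPath, ActorRef, ActorSelection, Status, Terminated}
import akka.pattern.pipe
import akka.persistence.{AtLeastOnceDelivery, PersistentActor}
import scala.concurrent.duration._

object ResolvingSender {
  // Hypothetical protocol and events.
  final case class Send(payload: String)
  final case class Envelope(deliveryId: Long, payload: String)
  final case class Confirm(deliveryId: Long)
  sealed trait Event
  final case class MsgSent(payload: String) extends Event
  final case class MsgConfirmed(deliveryId: Long) extends Event
  case object Resolve
  final case class Resolved(ref: ActorRef)
}

// `locate` is an assumed lookup function, e.g. backed by a cluster registry.
class ResolvingSender(locate: () => ActorSelection)
    extends PersistentActor with AtLeastOnceDelivery {
  import ResolvingSender._
  import context.dispatcher

  override def persistenceId: String = "resolving-sender"

  // Until a live ref is known, deliveries are parked on deadLetters,
  // which never confirms, so they simply stay unconfirmed.
  private var destination: ActorPath = context.system.deadLetters.path

  // Kick off resolution without blocking; messages are accepted meanwhile.
  override def preStart(): Unit = self ! Resolve

  override def receiveRecover: Receive = { case evt: Event => updateState(evt) }

  override def receiveCommand: Receive = resolving

  // Accepting and persisting messages works the same in every behavior.
  private val delivery: Receive = {
    case Send(payload) => persist(MsgSent(payload))(updateState)
    case Confirm(id)   => persist(MsgConfirmed(id))(updateState)
  }

  private val resolution: Receive = {
    case Resolve =>
      locate().resolveOne(5.seconds).map(Resolved(_)).pipeTo(self)
    case Resolved(ref) =>
      repoint(ref.path)
      context.watch(ref)
      context.become(ready(ref))
    case Status.Failure(_) => // resolveOne failed; try again shortly
      context.system.scheduler.scheduleOnce(2.seconds, self, Resolve)
  }

  private def resolving: Receive = delivery.orElse(resolution)

  private def ready(ref: ActorRef): Receive = {
    val watching: Receive = {
      case Terminated(`ref`) => // destination went away; resolve a new one
        context.become(resolving)
        self ! Resolve
    }
    delivery.orElse(watching)
  }

  private def updateState(evt: Event): Unit = evt match {
    case MsgSent(payload) => deliver(destination)(id => Envelope(id, payload))
    case MsgConfirmed(id) => confirmDelivery(id)
  }

  // Re-point every unconfirmed delivery at the new path, keeping delivery ids
  // stable so that persisted confirmations remain valid after recovery.
  private def repoint(path: ActorPath): Unit = {
    destination = path
    val snap = getDeliverySnapshot
    setDeliverySnapshot(snap.copy(
      unconfirmedDeliveries = snap.unconfirmedDeliveries.map(_.copy(destination = path))))
  }
}
```

In this sketch the re-pointed messages go out on the next redelivery tick rather than immediately, and every parked message shows up in the dead-letter log until a ref resolves. A production version would likely want a quieter placeholder and a bounded retry policy.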