At VictorOps we’re always striving to hone our infrastructure so we can handle increased customer load on less machines. Shared storage, such as MySQL, typically creates a bottleneck for high volume transactions, so trying to avoid queries to MySQL is one way to improve performance.

We use Scala along with Akka, an Actor framework for the JVM, for our entire backend. One new feature in the Akka 2.3.x release is the Persistence module, which provides the ability to implement Event Sourcing.

To quote Martin Fowler…

Event Sourcing ensures that all changes to application state are stored as a sequence of events. Not just can we query these events, we can also use the event log to reconstruct past states, and as a foundation to automatically adjust the state to cope with retroactive changes.”

This post will describe how to use Akka Persistence with a simple example motivated by our real-world use cases.

Why Event Source?

Part of what we do at VictorOps is to ingest alerts and maintain the notion of an incident and it’s associated state so we can notify our customers when that state changes in particular ways. We need to persist that state so upon restart of our services we can reconstruct the latest state.

One way to handle tracking the state would be to use a relational database or document store to represent the incident and it’s associated state. Each alert which comes in would require a read to the database and a subsequent write to update the state and possibly some form of conflict resolution. The database quickly becomes a source of contention with a high volumes of alerts.

Event Sourcing is another way to tackle the problem. With Event Sourcing, each incoming alert can be appended to a very fast log (e.g. Cassandra) as the event is processed. During restarts the log is replayed from the last optional snapshot point. This allows the log processor to rebuild it’s state from scratch or the persisted snapshot. By keeping track of the the state transitions in memory, we get drastic speed ups.

Akka Persistence

We’ll quickly go over Akka Persistence. I’m going to assume you know a little Scala, but if not, you’ll hopefully be able to follow along anyway.

An Akka Actor is a class which has some special properties:

  • Conceptually it’s a lightweight thread which is mapped to a set of real threads by the framework.
  • Each message is handled in a logical single threaded context. This means the code does not have to worry about locks and so on.
  • Actors can be supervised and restarted upon crash.
  • Actors can have other Actors as children which are supervised.

Here we have a class called MyActor which is a subclass of Actor. All actors must implement the receive method which processes all messages. The MyActor class handles one message, “test” (line 8), and all other messages (handled by the case _ expression on line 9) are logged as unknown.

Within the receive method, we are guaranteed that while we’re processing a message no other messages will be processed. That’s all we’re going to say for now about Actors.

The next section will introduce a very simple example using  an Event Sourced Actor.

Counting Stuff

The following example will keep track of the number of hits, conversions and the current conversion rate for some simple website serving ads.

Event Sourced actors extend a trait (a trait is basically an interface which can have some methods defined), EventsourcedProcessor. The EventsourcedProcessor trait has two abstract methods: receiveRecover and receiveCommand.

You can think of receiveCommand as the “hot” path for newly arriving messages and the receiveRecover method the recovery handler upon restart which is handed previously persistent state. In event sourcing we control which and when events are persisted by calling the persist method in the receiveCommand method.

The following event sourced Actor defines a little site tracker which keeps track of the total hits, conversions and a running conversion rate. We log each message to get a view into the processing flow.

We’ve defined a class, SiteStatistics, which carries the running state for the actor. The class tracks three fields: hits, conversions and conversionRate.

Note the two aforementioned methods: receiveRecover and receiveCommand. Again, the receiveRecover method is called by the persistence module during a recovery (by default the preStart method of the actor starts recovery).

The receiveCommand method is for incoming, “hot”, messages that get persisted after we calculate the total hits and conversions and then calculate the conversion rate. There are two message types supported: Hit and  Conversion. Each handler updated the respective field in SiteStatistics, calculates the conversion rate and then persists the calculated result.

In the receiveRecover method we handle messages of type SiteStatistics and simply update the state of the actor each time. Once recovery has completed the actor state will be in the most recent state persisted from the receiveCommand method.

One can imagine if this was an expensive calculation or accessing a remote service having that persisted could be a big savings when reconstructing the actor state.

We have included a test harness using the Specs library for Scala. The test asserts the calculations are performed correctly and that restoring the actor state resumes where the last calculation left off.

Pros and Cons

Pros:

  • Avoids lots of CRUD operations on a database.
  • Possibly huge gains in performance.
  • Events can be replayed if there is a need to re-process old data.

Cons:

  • Depending on the problem, the memory requirements could be large if you have lots of persistent objects.
  • No database to query and get a state overview. This means you’ll need some diagnostics you may not normally  need in the “traditional” approach.
  • The coding style can feel “awkward” due to the internal state management normally pushed off into the database realm.

Conclusion

While Event Sourcing is not a perfect fit for every problem, it can be a very powerful approach to creating a scalable and fast infrastructure. Additionally, the ability to replay historical events to re-process can save the day if a coding error is discovered and the data need to be updated with the new logic.

If you’re using the Akka framework the Persistence module is available by adding an additional dependency (com.typesafe.akka:akka-persistence-experimental:2.3.3) and you’re up and running.

Additional Resources