Capturing Raw Requests in Play

Tara Calihman - October 22, 2014

Recently I had a need to capture raw request bodies in a Play Framework 2.3.x application. Initially I thought, “Easy peasy, there’s probably a field for that on the request…”. Well, since I’m writing this blog post, it goes without saying my initial assumption was wrong.

Request Bodies and Actions

Apparently in versions of Play prior to 2.2.1 there was indeed a way to access the raw body of a request as indicated by this post on the Play Google Group. In Play, a controller implements a series of “actions”, which take a Request[A] and return a Result.

Note: For all the gory details, see the official documentation for Actions.

Below is a simple controller which takes a request of type JsValue and just returns an Ok result with a message indicating the age posted in the JSON.
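A minimal sketch of such a controller, assuming Play 2.3.x; the `People` object and `age` method names are hypothetical and this is not the original listing:

```scala
import play.api.libs.json.JsValue
import play.api.mvc.{Action, Controller}

// Hypothetical controller name, used only for illustration.
object People extends Controller {

  // parse.json is the BodyParser: it turns the raw bytes into a JsValue
  // before the action block runs.
  def age = Action(parse.json) { request =>
    // request is a Request[JsValue]; the raw bytes are no longer visible here.
    val body: JsValue = request.body
    (body \ "age").asOpt[Int] match {
      case Some(age) => Ok(s"Age posted: $age")
      case None      => BadRequest("Expected a numeric age field")
    }
  }
}
```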

Notice that by the time the action body runs, the request body is already a JsValue. This is the general pattern most controllers follow: you declare an action with a BodyParser, which converts the raw array of bytes to the type the action accepts. This is a very powerful pattern and leverages the Scala type system so your action code can be concise and doesn't have to do any parsing in-line. At this point, we need to take a brief diversion and discuss Iteratees, which are the underpinning of BodyParser classes.

Iteratees and BodyParsers

A BodyParser in Play is actually a function which takes a RequestHeader and returns an Iteratee that consumes Array[Byte] chunks and produces an A (strictly, an Either[Result, A], so the parser can also reject the request), where A is a type parameter. A typically refers to one of the common types seen in controllers, such as JsValue or String. In practice it could be any type, but you’ll need to write an Iteratee for it. Fully discussing and explaining Iteratees is beyond the scope of this post, but they can be summarized with the following quote from the Play Iteratee documentation:

“An Iteratee is a consumer - it describes the way input will be consumed to produce some value. An Iteratee is a consumer that returns a value it computes after being fed enough input.”

I also really like James Roper’s explanation in his post, where he builds from the idea of an Iterator (known to most imperative programmers) and leads the reader to the features of an Iteratee.
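To make the consumer idea a bit more concrete, here is a small sketch using the Play 2.3.x iteratee library outside of any controller. The `IterateeSketch` object and its member names are made up for illustration:

```scala
import play.api.libs.iteratee.{Enumerator, Iteratee}
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object IterateeSketch {
  // A consumer that counts bytes, one chunk at a time.
  val countBytes: Iteratee[Array[Byte], Int] =
    Iteratee.fold[Array[Byte], Int](0) { (total, chunk) => total + chunk.length }

  // A consumer that concatenates every chunk into a single array.
  val collectBytes: Iteratee[Array[Byte], Array[Byte]] =
    Iteratee.consume[Array[Byte]]()

  // An Enumerator is the producer side; this one emits two fixed chunks.
  val chunks: Enumerator[Array[Byte]] =
    Enumerator("{\"age\":".getBytes("UTF-8"), "42}".getBytes("UTF-8"))

  // |>>> feeds the chunks, signals end of input, and yields the computed value.
  def size: Future[Int] = chunks |>>> countBytes
  def body: Future[Array[Byte]] = chunks |>>> collectBytes
}
```

A BodyParser has the same shape: Play plays the role of the Enumerator and pushes the request body through whatever Iteratee the parser returns.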

Hoarding the Bytes

So, there’s a challenge if the goal is to have access to the raw bytes of a request: by the time an Action has access to a request, the raw bytes are already abstracted behind the content type of the Action itself. After various false starts, I ended up writing a BodyParser that essentially wraps the original BodyParser, consumes the raw bytes, then runs the original BodyParser on the raw bytes, emitting a wrapped version of the final result type specified. Below is the code for the wrapping BodyParser and the accompanying Action used to proxy the request to the original Action block:

Once the wrappedBodyParser and extractRaw methods are defined, they can be used as shown in the rawJson controller method:
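A sketch of how these pieces could fit together, assuming Play 2.3.x. The names wrappedBodyParser, extractRaw, rawJson, WrappedPayload and RawRequest come from the post; the `RawCapture` object, the `parsed` field name, and the exact signatures are assumptions and may differ from the original listings:

```scala
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import play.api.libs.iteratee.{Enumerator, Iteratee}
import play.api.mvc._
import scala.concurrent.Future

// Carries the parsed body and the raw bytes from the parser to the action.
case class WrappedPayload[A](parsed: A, raw: Array[Byte])

// The request type the action block sees: the usual body plus a raw field.
class RawRequest[A](val raw: Array[Byte], request: Request[A])
  extends WrappedRequest[A](request)

// Hypothetical controller object, used only for illustration.
object RawCapture extends Controller {

  def wrappedBodyParser[A](parser: BodyParser[A]): BodyParser[WrappedPayload[A]] =
    BodyParser("wrapped") { header =>
      // 1. Buffer the whole body in memory.
      val buffer: Iteratee[Array[Byte], Array[Byte]] = Iteratee.consume[Array[Byte]]()
      buffer.mapM { bytes =>
        // 2. Replay the buffered bytes through the original parser.
        val parsed: Future[Either[Result, A]] = Enumerator(bytes) |>>> parser(header)
        // 3. On success, pair the parsed value with the bytes. A Left
        //    (an error Result from the original parser) passes through as-is.
        parsed.map(_.right.map(a => WrappedPayload(a, bytes)))
      }
    }

  def extractRaw[A](parser: BodyParser[A])(
      block: RawRequest[A] => Future[Result]): Action[WrappedPayload[A]] =
    Action.async(wrappedBodyParser(parser)) { request =>
      val payload = request.body
      // Rebuild a Request[A] with the parsed body, then attach the raw bytes.
      block(new RawRequest(payload.raw, request.map(_.parsed)))
    }

  // The block still sees a JsValue body, and additionally request.raw.
  def rawJson = extractRaw(parse.json) { request =>
    val age = (request.body \ "age").asOpt[Int]
    Future.successful(Ok(s"age=$age, raw bytes=${request.raw.length}"))
  }
}
```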

The Action now has a request with a raw field of type Array[Byte]. Note the use of RawRequest[A], which extends WrappedRequest[A]. This is a technique documented in the Play Composable Action documentation. It facilitates creating various extended request types to simplify Actions.

There are actually two layers of wrapping. The first is in the wrappedBodyParser, which uses the WrappedPayload[A] class to pass the raw bytes from the BodyParser to the Action. The second is the RawRequest[A], which we create from the WrappedPayload[A] before calling the original block for the action.

Take Aways

By creating a BodyParser and a WrappedRequest, we can extract the raw bytes from the request and then pass in a RawRequest[A]. The existing Action block is then called with the RawRequest[A].

One of the main drawbacks to this approach is the memory overhead: the raw bytes are read into memory and held between the body parse phase and action invocation. If one wanted to archive all requests, this overhead could be mitigated by doing the archiving/logging from the BodyParser itself while the bytes pass through the iteratee.
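As a rough, untested-in-production sketch of that archiving idea (assuming Play 2.3.x), an Enumeratee can tap each chunk on its way to the original parser so nothing is buffered by the wrapper. The `ArchivingParser` object, the `archiving` method, and the `archive` callback are all hypothetical names:

```scala
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import play.api.libs.iteratee.Enumeratee
import play.api.mvc.BodyParser

object ArchivingParser {

  // `archive` is a hypothetical sink (a log, a file, a queue). It is called
  // once per chunk, in arrival order, before the chunk reaches the parser.
  def archiving[A](parser: BodyParser[A])(archive: Array[Byte] => Unit): BodyParser[A] =
    BodyParser("archiving") { header =>
      val tap = Enumeratee.map[Array[Byte]] { chunk =>
        archive(chunk)
        chunk // pass the chunk through unchanged
      }
      // &>> puts the tap in front of the original parser's Iteratee.
      tap &>> parser(header)
    }
}
```

Note the trade-off: the Action no longer gets a raw field, since the bytes are handed to the sink and then dropped. The tap also only sees chunks the original parser actually consumes, and a slow `archive` callback will hold up parsing.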

I’m also unsure if there’s a better way to run the original Iteratee without pre-buffering the raw bytes array. Seems like it’s doable, but I just haven’t had the time to figure it out. This is still a work in progress and I’m curious for feedback or better approaches to this problem.