At Hootsuite, we analyze user data to answer questions about product usability and to gain insight into trends. Recently, we designed and implemented a RESTful API that provides access to our aggregated data sets and shields our data consumers from infrastructure changes in the backend. We built the API using the Play Framework with Scala, and we believe it was a great decision for several reasons.

Currently, all of our logs are stored in HDFS and S3. We use Apache Spark to aggregate raw logs into more meaningful and usable data sets. For instance, one of our Spark jobs calculates an aggregate count of Hootsuite’s unique daily users across all our different platforms and plan types.
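The Spark job itself isn't shown in this post. As a rough illustration of the kind of aggregation it performs, here is the same computation over an in-memory collection, using plain Scala collections rather than Spark. The `LogEvent` fields and the `DailyActives` name are hypothetical stand-ins for whatever the raw logs actually contain.

```scala
// Hypothetical shape of one raw log line; real log fields will differ
final case class LogEvent(date: String, userId: Long, platform: String, plan: String)

object DailyActives {
  // Counts distinct users per (date, platform, plan) on a local Seq.
  // A user who appears several times in a group is counted once.
  def uniqueDailyUsers(events: Seq[LogEvent]): Map[(String, String, String), Int] =
    events
      .groupBy(e => (e.date, e.platform, e.plan))
      .map { case (key, group) => key -> group.map(_.userId).distinct.size }
}
```

On a cluster, the same grouping would typically be expressed with Spark's distributed `distinct` and per-key counts, so that no single machine has to hold every user ID.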

The aggregated data sets are stored in a central RDBMS. They feed data visualizations on dashboards around the office and reports for multiple internal stakeholders. Since we are continuously experimenting with different data processing pipelines and data stores, we decided we needed a stable, well-defined way to access our data through an API. An API also makes it easier to communicate clearly which data sets are available, and it simplifies documentation.

We found several features that demonstrated the advantages of using Play to design and implement our API:

Productivity:

It’s very easy to get up and running once you have Play installed. Two commands are all you need to get an app started: “play new [AppName]” and “play run”. Done! You’re ready to be productive. Prototyping and experimenting with Play is painless and extremely fast. Another great feature is hot reloading: in development mode, Play recompiles changed source files and reloads the classes in the running JVM, without restarting the service or redeploying the application. You can edit your code and see the change immediately with a browser refresh or a cURL call.

URL-routing support:

Play provides a URL routing configuration that makes RESTful API endpoints simple to write and easy to change. Each entry mirrors the structure of an HTTP request: the HTTP method, the URL path and then the controller action it maps to. Below is an example entry from a sample routes file that we will use to demonstrate Play's functionality. The activeUsers endpoint returns the number of unique daily users for a given platform and date (today, if no date is supplied), broken down by plan type.


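# Sample endpoint: platform comes from the URL path, the optional date from the query string.
# The @ prefix asks Play for a controller instance, since SampleController takes a DataModel argument.
GET   /activeUsers/:platform   @controllers.SampleController.getActiveUsersByPlatform(platform: String, date: Option[String])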
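Play compiles the routes file into a router for you, so none of the matching below is code you would write yourself. Purely to illustrate how the entry above maps a request onto action arguments (the `:platform` path segment becomes a required `String`, and `date` is read from the query string as an `Option[String]`), here is a toy matcher in plain Scala. `ActiveUsersCall` and `ToyRouter` are hypothetical names, and this is not how Play's generated router is implemented.

```scala
// Hypothetical value holding the arguments the action would receive
final case class ActiveUsersCall(platform: String, date: Option[String])

object ToyRouter {
  // Matches "GET /activeUsers/<platform>" and reads an optional "date" query parameter.
  // URL decoding of the query string is deliberately left out of this sketch.
  def matchActiveUsers(method: String, path: String, query: Map[String, String]): Option[ActiveUsersCall] =
    (method, path.split("/").toList) match {
      // "/activeUsers/web".split("/") yields "", "activeUsers", "web"
      case ("GET", List("", "activeUsers", platform)) =>
        Some(ActiveUsersCall(platform, query.get("date")))
      case _ => None
    }
}
```

A request without a `date` parameter still matches; the action then receives `None` and can fall back to a default.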

Native MVC support

Play’s full support for MVC makes it straightforward and intuitive to decouple the data store access logic from the rest of the service. This means we can easily swap in a new model implementation if we decide to use a different data store in the future. The structure of our sample Play application looks like this:
[Figure: MVC structure of the sample Play application]
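The practical payoff of keeping data access behind a model is that the controller only depends on an interface. Here is a minimal plain-Scala sketch of that idea. `ActiveUsersStore`, `InMemoryStore` and `ActiveUsersService` are hypothetical names; in the real application, the SQL-backed implementation would wrap the Anorm DAO shown below.

```scala
// The only thing callers depend on: plan counts for one platform on one date
trait ActiveUsersStore {
  def countsFor(date: String, platform: String): Map[String, Option[Long]]
}

// One possible backend; a SQL- or Spark-backed store would implement the same trait
final class InMemoryStore(data: Map[(String, String), Map[String, Option[Long]]]) extends ActiveUsersStore {
  def countsFor(date: String, platform: String): Map[String, Option[Long]] =
    data.getOrElse((date, platform), Map.empty[String, Option[Long]])
}

// Stand-in for controller logic: it never learns which store it was given
final class ActiveUsersService(store: ActiveUsersStore) {
  // Sums the plans that have a count; plans with no data contribute nothing
  def totalFor(date: String, platform: String): Long =
    store.countsFor(date, platform).values.flatten.sum
}
```

Switching data stores then means passing a different `ActiveUsersStore` to the service, with no change to the service itself.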

Anorm

Anorm is a simple data access layer included with Play that uses plain SQL to interact with databases. It also provides an API for parsing and transforming the rows a query returns into Scala values. We use Anorm to query our aggregated data sets, and we found writing SQL queries to access SQL data to be as easy and intuitive as it sounds.
Let’s assume I have a table where the active user counts are stored by platform and plan type. There are three plan types in total (Free, Pro and Enterprise) and five platforms (the Hootsuite web dashboard, Hootlet, Mobile Web, and the iOS and Android applications). Here is an example of what the table might look like:

Date        Free   Pro    Enterprise  Platform
2014-09-16  80000  30000  10000       Web
2014-09-16  30000  15000  8000        Hootlet
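The DAO below returns these counts as a nested map keyed by platform, then plan type (the `PlatformIndexedData` type). To make that shape concrete, here is a small sketch that converts rows like the two above into it. The `ActiveUsersRow` case class, the `RowShape` object and the lower-case plan keys are assumptions for illustration only.

```scala
// One row of the example table; plan counts are optional because a column may be NULL
final case class ActiveUsersRow(date: String, free: Option[Long], pro: Option[Long],
                                enterprise: Option[Long], platform: String)

object RowShape {
  type PlatformIndexedData = Map[String, Map[String, Option[Long]]]

  // Keys each row by platform, then by plan type (hypothetical key names).
  // Assumes at most one row per platform, e.g. rows already filtered to a single date.
  def toPlatformIndexed(rows: Seq[ActiveUsersRow]): PlatformIndexedData =
    rows.map { r =>
      r.platform -> Map("free" -> r.free, "pro" -> r.pro, "enterprise" -> r.enterprise)
    }.toMap
}
```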

The function getPlatformIndexedDataSQL below retrieves the active user counts for a given platform and date. Because the date and platform are dynamic parameters, the query declares the placeholders {date} and {platform} and binds their values afterwards with .on(...).

To execute the query, DB.withConnection obtains a connection to the database configured in your application.conf. Applying the query (dataRow()) then returns the matching rows as a Stream, which we map into our result type.


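import anorm._
import play.api.db.DB
import play.api.Play.current // DB.withConnection needs an implicit Application

object ActiveUsersDao extends DataModel {
  // Returns Map(platform -> Map(planType -> count)) for one platform on one date
  def getPlatformIndexedDataSQL(date: java.sql.Date, platform: String): PlatformIndexedData = {
    // {date} and {platform} are named placeholders, bound below with .on(...)
    val dataRow: SimpleSql[Row] = SQL(
      """
        |select %s, %s, %s, platform from %s
        |where
        |  date = {date} and
        |  platform = {platform}
      """.stripMargin.format(freePlanType, proPlanType, entPlanType, dailyActivesTable))
      .on('date -> date, 'platform -> platform)

    // Use a connection to the default database configured in application.conf
    DB.withConnection { implicit connection =>
      // dataRow() executes the query and returns the rows as a Stream[Row]
      dataRow().map { row =>
        platform -> Map(
          freePlanType -> row[Option[Long]](freePlanType),
          proPlanType -> row[Option[Long]](proPlanType),
          entPlanType -> row[Option[Long]](entPlanType))
      }.toMap
    }
  }
}

trait DataModel {
  type PlatformIndexedData = Map[String, Map[String, Option[Long]]]
  type DateIndexedData = Map[String, PlatformIndexedData]

  // Hypothetical column and table names; adjust them to match your schema
  val freePlanType = "free"
  val proPlanType = "pro"
  val entPlanType = "enterprise"
  val dailyActivesTable = "daily_active_users"

  // Parses a "yyyy-MM-dd" string and delegates to the Anorm DAO
  def getPlatformIndexedData(date: String, platform: String): PlatformIndexedData =
    ActiveUsersDao.getPlatformIndexedDataSQL(java.sql.Date.valueOf(date), platform)
}

object DataModel extends DataModel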
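Anorm's {date} and {platform} placeholders are named parameters. Before the statement reaches JDBC, they become positional `?` markers with values bound in order. This is also what keeps a value such as `platform` out of the SQL text itself. The function below is a simplified, hypothetical illustration of that rewriting step (`NamedParams` and `toPositional` are made-up names). It is not Anorm's actual implementation.

```scala
object NamedParams {
  // Matches {name} markers, capturing the name
  private val Placeholder = """\{(\w+)\}""".r

  // Rewrites each {name} to "?" and returns the bound values in the order
  // a java.sql.PreparedStatement would expect them. Throws if a name is unbound.
  def toPositional(sql: String, params: Map[String, Any]): (String, List[Any]) = {
    val names = Placeholder.findAllMatchIn(sql).map(_.group(1)).toList
    val values = names.map { n =>
      params.getOrElse(n, throw new IllegalArgumentException(s"Unbound parameter: $n"))
    }
    (Placeholder.replaceAllIn(sql, "?"), values)
  }
}
```

The returned list is what a series of `PreparedStatement.setObject` calls would consume, one value per `?`, in order.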

Custom Writes Combinators

Our endpoints respond with JSON. The play.api.libs.json package contains data structures and methods for constructing JSON values and converting Scala values to and from JSON. It also lets you build custom converters (Writes) for your own data types. For our DateIndexedData type, the JSON serializer looks like this:


import play.api.libs.json._
import play.api.libs.functional.syntax._

trait DataSerializers extends DataModel {

  case class AllPlansResult(free: Option[Long], pro: Option[Long], ent: Option[Long])
  case class PlatformAllPlansResult(platform: String, plans: JsValue)
  case class DataResultByDate(date: String, count: List[JsValue])

  /**
   * JSON Writes built with the functional combinators
   */
  implicit val allPlansWrites: Writes[AllPlansResult] = (
    (JsPath \ "free").write[Option[Long]] and
    (JsPath \ "pro").write[Option[Long]] and
    (JsPath \ "enterprise").write[Option[Long]]
  )(unlift(AllPlansResult.unapply))

  implicit val platformAllPlansWrites: Writes[PlatformAllPlansResult] = (
    (JsPath \ "platform").write[String] and
    (JsPath \ "planData").write[JsValue]
  )(unlift(PlatformAllPlansResult.unapply))

  implicit val activeUsersByDateWrites: Writes[DataResultByDate] = (
    (JsPath \ "date").write[String] and
    (JsPath \ "activeUsers").write[List[JsValue]]
  )(unlift(DataResultByDate.unapply))

  // Produces one JSON object per date, each listing the plan counts per platform.
  // A plan key missing from the data is treated as "no data" (None).
  def serializeDataIndexedByDate(data: DateIndexedData): List[JsValue] =
    data.toList.map { case (date, platformData) =>
      val userCount: List[JsValue] = platformData.toList.map { case (platform, planData) =>
        val planJson = Json.toJson(AllPlansResult(
          planData.getOrElse(freePlanType, None),
          planData.getOrElse(proPlanType, None),
          planData.getOrElse(entPlanType, None)))
        Json.toJson(PlatformAllPlansResult(platform, planJson))
      }
      Json.toJson(DataResultByDate(date, userCount))
    }
}

The benefit of these Writes combinators is that serializing our data to JSON is as easy as this:


val jsonResponse: List[JsValue] = serializeDataIndexedByDate(
  Map("2014-09-16" -> Map("web" -> Map(freePlanType -> Some(80000L), proPlanType -> Some(30000L), entPlanType -> Some(10000L)))))
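To give a quick sense of the document these Writes produce, here is a dependency-free sketch that renders the same nesting by hand (date, then activeUsers, then platform and planData). It is only illustrative. It assumes a missing count is written as JSON null, uses a fixed plan key order, and does no string escaping. The real output comes from Play's JSON library. `JsonShape` is a hypothetical name.

```scala
object JsonShape {
  private def q(s: String): String = "\"" + s + "\""

  // This sketch renders a missing count (None) as null
  private def count(c: Option[Long]): String = c.map(_.toString).getOrElse("null")

  // Builds one element of the serialized list for a single date and platform.
  // No escaping is done, so inputs must not contain quotes or backslashes.
  def render(date: String, platform: String, free: Option[Long], pro: Option[Long], ent: Option[Long]): String = {
    val planData = s"{${q("free")}:${count(free)},${q("pro")}:${count(pro)},${q("enterprise")}:${count(ent)}}"
    val platformObj = s"{${q("platform")}:${q(platform)},${q("planData")}:$planData}"
    s"{${q("date")}:${q(date)},${q("activeUsers")}:[$platformObj]}"
  }
}
```

For the example above, this yields `{"date":"2014-09-16","activeUsers":[{"platform":"web","planData":{"free":80000,"pro":30000,"enterprise":10000}}]}`.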

Unit Tests

Our unit tests for both the controllers and the models extend play.api.test.PlaySpecification, which is built on specs2. We test the controllers by mocking the functions in our models and by using a FakeApplication to simulate a running app. Continuing our example, suppose we have the following controller for the activeUsers endpoint:


package controllers

import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat
import play.api.libs.concurrent.Execution.Implicits.defaultContext
import play.api.libs.json._
import play.api.mvc._
import scala.concurrent.Future
// Assumes DataModel and DataSerializers live in a package named models
import models.{DataModel, DataSerializers}

class SampleController(activeUsersData: DataModel) extends Controller with DataSerializers {

  def getActiveUsersByPlatform(platform: String, date: Option[String]) = Action.async { request =>
    // Default to today's date; in Joda-Time patterns, lower-case yyyy is the calendar year
    val dtf = DateTimeFormat.forPattern("yyyy-MM-dd")
    val reqDate: String = date.getOrElse(dtf.print(new DateTime()))

    // Run the database lookup and serialization inside a Future, off the request thread
    val futureResult: Future[List[JsValue]] = Future {
      val platformData: PlatformIndexedData = activeUsersData.getPlatformIndexedData(reqDate, platform)
      serializeDataIndexedByDate(Map(reqDate -> platformData))
    }

    futureResult.map(result => Ok(Json.prettyPrint(Json.obj("result" -> result))).as("application/json"))
  }
}
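Without the Play plumbing, the action does three things: it defaults the date to today, runs the lookup inside a Future, and transforms the result when it completes. Here is a sketch of that flow using only `scala.concurrent` and `java.time` (rather than Joda-Time, which the controller uses). `ActionFlow`, `lookup` and the injectable `today` argument are hypothetical. `today` is a parameter so the default can be tested deterministically.

```scala
import java.time.LocalDate
import scala.concurrent.{ExecutionContext, Future}

object ActionFlow {
  type PlatformIndexedData = Map[String, Map[String, Option[Long]]]

  // LocalDate.toString is ISO-8601 (yyyy-MM-dd), the format the model expects
  def resolveDate(date: Option[String], today: LocalDate): String =
    date.getOrElse(today.toString)

  // Runs the lookup inside a Future and then shapes the result. The caller gets a
  // Future back immediately; if lookup blocks, it blocks a thread of `ec`, not the caller.
  def handle(platform: String, date: Option[String], today: LocalDate)
            (lookup: (String, String) => PlatformIndexedData)
            (implicit ec: ExecutionContext): Future[Map[String, PlatformIndexedData]] = {
    val reqDate = resolveDate(date, today)
    Future(lookup(reqDate, platform)).map(data => Map(reqDate -> data))
  }
}
```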

We can now test SampleController by mocking the DataModel it depends on and stubbing the result of its getPlatformIndexedData function, which would otherwise query our database. This isolates the controller tests from the model and keeps database access out of our unit tests.


import org.junit.runner.RunWith
import org.specs2.mock.Mockito
import org.specs2.runner.JUnitRunner
import play.api.test._
// Assumes the models package used by the controller; DataModel._ brings in the plan type names
import models.DataModel
import models.DataModel._

@RunWith(classOf[JUnitRunner])
class SampleControllerSpec extends PlaySpecification with Mockito {

  val dataModelMock = mock[DataModel]
  val sampleController = new controllers.SampleController(dataModelMock)

  // Canned result standing in for a database query
  def mockGetWebPlatformDataResult: Map[String, Map[String, Option[Long]]] =
    Map("web" -> Map(freePlanType -> Some(10000L), proPlanType -> Some(20000L), entPlanType -> None))

  "SampleController.getActiveUsersByPlatform" should {
    "send 200 on a faked correct response" in {
      running(FakeApplication()) {
        val date: String = "2014-09-04"
        // Stub the model so the test never touches the database
        dataModelMock.getPlatformIndexedData(date, "web") returns mockGetWebPlatformDataResult

        val result = sampleController.getActiveUsersByPlatform("web", Some(date)).apply(FakeRequest())
        status(result) must be equalTo 200
        contentAsString(result) must contain("web")
        contentAsString(result) must contain("planData")
      }
    }
  }
}
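Mockito is convenient, but because SampleController receives its DataModel through its constructor, a hand-written stub works too and needs no mocking library. Here is a minimal plain-Scala sketch of that alternative. `PlanCounts`, `StubPlanCounts` and `ActiveUsersEndpoint` are hypothetical simplifications of the real classes. The stub also records the calls it receives, so a test can assert on them.

```scala
import scala.collection.mutable.ListBuffer

// Simplified stand-in for DataModel: the one method the controller needs
trait PlanCounts {
  def getPlatformIndexedData(date: String, platform: String): Map[String, Map[String, Option[Long]]]
}

// Test double: returns canned data and records every (date, platform) it was asked for
final class StubPlanCounts(canned: Map[String, Map[String, Option[Long]]]) extends PlanCounts {
  val calls = ListBuffer.empty[(String, String)]
  def getPlatformIndexedData(date: String, platform: String): Map[String, Map[String, Option[Long]]] = {
    calls += ((date, platform))
    canned
  }
}

// Simplified stand-in for the controller: depends only on the PlanCounts interface
final class ActiveUsersEndpoint(model: PlanCounts) {
  // Returns the platforms present in the data for the given date, sorted
  def platformsFor(platform: String, date: String): List[String] =
    model.getPlatformIndexedData(date, platform).keys.toList.sorted
}
```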

In conclusion, we found multiple benefits to building our data API with Play:

  1. Since the framework is written in Scala, we were able to implement our API in Scala as well. This allowed us to write concise, highly maintainable and easy-to-understand code.
  2. The ability to reload classes on the fly without restarting the service gave us a short code-to-test cycle, which saved developer time and frustration.
  3. Anorm lets us access data in our MySQL datastore easily and intuitively.
  4. Custom Writes combinators provide a 1:1 mapping between our data model and its JSON presentation.
  5. The ability to mock data models means we can easily write unit tests for our endpoints.

It is also important to note that scaling our API should only require adding more application servers, because Play applications are stateless. Play also supports asynchronous actions, so a request doesn't hold its request-handling thread while waiting for our database to return a result.

About the author: Saba is a software engineer on the DataLab team at Hootsuite, who can usually be found running a trail up some mountain. Follow Saba on Twitter: @SabaElHilo.