## Scala at Hootsuite

Some of our Scala engineers have a Java/PHP background, while others come from Haskell. After writing Scala for a few years, we feel that we have (more or less) found a happy medium: a [functional](#trivia) style of writing Scala that we can all agree on.

This blog post shows code examples we've come across while refactoring during our code review and pair programming sessions. The code comes from production, but I have simplified it to demonstrate the changes.


## List.sum (or TraversableOnce.sum)

This is the most trivial example, but it's still worth talking about.

Scala has many useful built-in features, particularly in its collection classes. At the very least, everyone should read through the names and types of all the `List` methods; to me they are like shiny gems in a treasure box.

`sum` is a good example. It's useful not only for what it does, but also because it *encourages* you to convert your sequence into a plain sequence of numbers first, which makes the code easier to read and debug.

Before

[cc lang="scala"]
hashmaps.foldLeft(0) { (acc, hashmap) => acc + hashmap.getOrElse("abc", 0) }
[/cc]

After

[cc lang="scala"]
hashmaps.map(_.getOrElse("abc", 0)).sum
[/cc]

First of all, shorter is better. The word "sum" also reads more naturally than the combination of `foldLeft` and `+`, and separating the two steps (extract the numbers, then add them up) makes each step clearer. Doing one thing at a time makes code easier to read.
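A self-contained version of both snippets, assuming `hashmaps` is a `List[Map[String, Int]]` with made-up contents (the real type isn't shown above). Keeping the extracted numbers in their own value also shows the debugging benefit: you can inspect the intermediate list before it is summed.

```scala
object SumExample {
  // Assumed shape of the data: a list of maps from keys to counts (made-up values).
  val hashmaps: List[Map[String, Int]] =
    List(Map("abc" -> 1), Map("xyz" -> 5), Map("abc" -> 2, "xyz" -> 7))

  // Before: extraction and addition are mixed together in one fold.
  val before: Int =
    hashmaps.foldLeft(0) { (acc, hashmap) => acc + hashmap.getOrElse("abc", 0) }

  // After: first extract the numbers (easy to print or inspect) ...
  val extracted: List[Int] = hashmaps.map(_.getOrElse("abc", 0))

  // ... then add them up.
  val after: Int = extracted.sum
}
```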

## Future.sequence

We use [Future](http://doc.akka.io/docs/akka/snapshot/scala/futures.html) for almost every Scala project, either as `scala.concurrent.Future` for newer projects or as `akka.dispatch.Future` for older projects.

We had a problem with an internal tool that responded very slowly. It calls a Web API and gathers the results.

[cc lang="scala"]
// getIsServerAaaFine(): Future[Boolean]
// getIsServerBbbFine(): Future[Boolean]
// getIsServerCccFine(): Future[Boolean]
// … each of them takes time to complete.
val result = for {
  a <- getIsServerAaaFine()
  b <- getIsServerBbbFine()
  c <- getIsServerCccFine()
} yield a && b && c
[/cc]

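It's slow because these future calls are triggered sequentially. The `for` comprehension desugars into nested `flatMap` calls, so `getIsServerBbbFine()` is not even called until the future returned by `getIsServerAaaFine()` has completed.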
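To see why, here is roughly what the compiler turns that `for` comprehension into (nested `flatMap`/`map` calls), run against hypothetical stubs. Each stub records that it was called and returns a future backed by a `Promise` that we complete by hand, so the order of calls can be observed. This is an illustration only; the stubs stand in for the real Web API calls.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequentialFor {
  // Thread-safe log of which (hypothetical) checks have been called so far.
  private val calls = new java.util.concurrent.ConcurrentLinkedQueue[String]()

  // Promises we complete manually to simulate slow Web API responses.
  val aP = Promise[Boolean]()
  val bP = Promise[Boolean]()
  val cP = Promise[Boolean]()

  // Hypothetical stubs with the same signatures as the real checks.
  def getIsServerAaaFine(): Future[Boolean] = { calls.add("A"); aP.future }
  def getIsServerBbbFine(): Future[Boolean] = { calls.add("B"); bP.future }
  def getIsServerCccFine(): Future[Boolean] = { calls.add("C"); cP.future }

  def calledSoFar: List[String] = calls.toArray.toList.map(_.toString)

  // Desugared form of the for comprehension: B is only called inside the
  // callback that runs after A's future completes, and C only after B's.
  def result: Future[Boolean] =
    getIsServerAaaFine().flatMap { a =>
      getIsServerBbbFine().flatMap { b =>
        getIsServerCccFine().map { c => a && b && c }
      }
    }
}
```

Until `aP` is completed, only the first check has been started; the other two are waiting in callbacks.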

Here's how we refactored it so that the calls run concurrently, by starting all three futures before combining them:

[cc lang="scala"]
// Calling the functions first starts all three futures right away.
val aF = getIsServerAaaFine()
val bF = getIsServerBbbFine()
val cF = getIsServerCccFine()

val result = for {
  a <- aF
  b <- bF
  c <- cF
} yield a && b && c
[/cc]

This works, and it gets even more concise with `Future.sequence`, which turns a `List[Future[Boolean]]` into a single `Future[List[Boolean]]`:

[cc lang="scala"]
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

Future.sequence(List(getIsServerAaaFine(), getIsServerBbbFine(), getIsServerCccFine()))
  .map { case List(a, b, c) => a && b && c }
[/cc]

In this specific case, you can simplify further with `reduce` or `forall`:

[cc lang="scala"]
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// reduce
Future.sequence(List(getIsServerAaaFine(), getIsServerBbbFine(), getIsServerCccFine()))
  .map { bools => bools.reduce(_ && _) }

// forall
Future.sequence(List(getIsServerAaaFine(), getIsServerBbbFine(), getIsServerCccFine()))
  .map(_.forall(identity))
[/cc]
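One difference between the two is worth knowing: `reduce` has no starting value, so it throws on an empty list, while `forall` simply returns `true`. That matters if the list of futures is ever built at runtime. A small sketch, assuming a hypothetical list of checks that happens to be empty:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

object EmptyChecks {
  // Hypothetical: the list of health checks is built at runtime and is empty here.
  val checks: List[Future[Boolean]] = Nil

  // forall on an empty list is vacuously true, so this future succeeds.
  val viaForall: Future[Boolean] = Future.sequence(checks).map(_.forall(identity))

  // reduce on an empty list throws UnsupportedOperationException;
  // because it runs inside map, the exception fails the resulting future.
  val viaReduce: Future[Boolean] = Future.sequence(checks).map(_.reduce(_ && _))
}
```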

Challenge: define your own `Future.sequence` in Scala within 5 minutes.
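If you want to check your answer, here is one possible solution as a sketch. `mySequence` is a hypothetical name, and unlike the library's `Future.sequence` it only accepts a `List`.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object MySequence {
  // Fold from the right, starting with an already-completed Future of Nil,
  // and prepend each result so the original order is preserved.
  def mySequence[A](fs: List[Future[A]])(implicit ec: ExecutionContext): Future[List[A]] =
    fs.foldRight(Future.successful(List.empty[A])) { (f, acc) =>
      for {
        a  <- f
        as <- acc
      } yield a :: as
    }

  // Usage: the three futures start running as soon as they are created.
  def demo(): Future[List[Int]] = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    mySequence(List(Future(1), Future(2), Future(3)))
  }
}
```

Chaining `flatMap` here does not bring back the slowness of the first example, because every future in `fs` has already been started before `mySequence` is called; the fold only waits for results that are being computed concurrently.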

## Immutable Var vs Mutable Val?

[cc lang="scala"]
import scala.collection.mutable

class A {
  val m = mutable.Map[Int, Int]()

  m.update(1, 10)
}
[/cc]

VS

[cc lang="scala"]
class A {
  var m = Map[Int, Int]()

  m += (1 -> 10)
}
[/cc]

We prefer the latter: the immutable `var`.

Two important Scala facts to note: (1) Scala encourages you to use immutable objects. The built-in collections that are available without any import, such as `List` and `Map`, are immutable; if you want mutable collections, you have to import them explicitly (e.g. from `scala.collection.mutable`). (2) Scala encourages you to use `val` instead of `var`. `val` is nice because you don't have to track which value the name currently refers to!

So when you have to introduce some state in your code, which do you choose: `var + immutable obj` or `val + mutable obj`? `var + mutable obj` is also possible, but it shouldn't be used, since we want to minimize mutability as much as possible.

On our team, we prefer `var + immutable obj` for both readability and safety. The exception is when performance profiling shows that the `val + mutable obj` version runs significantly faster or uses significantly less memory, which of course depends strongly on the case.

Why?

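Reason 1: there are many different methods that can modify a mutable object, but changing the reference held by a `var` *always* requires an assignment with `=` (an operator like `+=` on an immutable value is just shorthand for one), so such changes are easy to find.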

Reason 2: a mutable object can be modified anywhere it is reachable, even inside another function that received it as an argument. An immutable `var` can only be reassigned by code that has access to the `var` itself; passing it to a function passes only its current, immutable value.

It’s not easy to explain in plain English, but I hope the following code example explains it more clearly:

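[cc lang="scala"]
import scala.collection.mutable

object HelloWorld {
  // Despite its name, this function is NOT safe: it mutates the caller's
  // map through the reference it was given.
  def heyThisFunctionIsSafe1(m: mutable.Map[Int, Int]): Unit = {
    m.update(10, 20)
  }

  // This would not compile: the parameter `m` cannot be reassigned, and an
  // immutable Map has no `+=` method, so the caller's map stays untouched.
  // def heyThisFunctionIsSafe2(m: Map[Int, Int]): Unit = {
  //   m += (3 -> 4)
  // }

  def main(args: Array[String]): Unit = {
    val m1 = mutable.Map[Int, Int]()
    var m2 = Map[Int, Int]()

    m1.update(1, 2)
    m2 += (3 -> 4) // shorthand for m2 = m2 + (3 -> 4)

    println(("first", m1, m2))
    heyThisFunctionIsSafe1(m1) // m1 now also contains 10 -> 20
    // heyThisFunctionIsSafe2(m2)

    println(("second", m1, m2))
  }
}
[/cc]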
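With an immutable map, the working counterpart of `heyThisFunctionIsSafe2` has to *return* a new map, and only the code that owns the `var` can decide whether to reassign it. A minimal sketch; `withEntry` and `demo` are hypothetical names:

```scala
object ImmutableUpdate {
  // Hypothetical helper: it cannot touch the caller's map,
  // it can only return a new map with the extra entry.
  def withEntry(m: Map[Int, Int]): Map[Int, Int] = m + (3 -> 4)

  def demo(): (Map[Int, Int], Map[Int, Int]) = {
    var m2 = Map(1 -> 2)
    val returned = withEntry(m2)
    val beforeReassign = m2 // unchanged: the call could not modify it
    m2 = returned           // the owner of the var chooses to reassign
    (beforeReassign, m2)
  }
}
```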

Regarding performance, a mutable `val` can be faster when you have a large amount of data that you modify often, while an immutable `var` can be faster and use less memory when you have many instances. For example, a million empty mutable lists means a million references to a million independent objects, whereas a million empty immutable lists still need a million references, but they all point to one shared object (`Nil`).

However, performance always depends on context. You should ignore this potential drawback until you have concrete evidence that it is the bottleneck of your system. Slow code that works correctly is infinitely better than fast code that is incorrect, and readable code reduces the chance of bugs in the future.
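The shared-empty-list point can be checked directly. In this sketch (scaled down from a million to 1,000 elements, and using `mutable.ListBuffer` as the mutable list), every empty immutable `List` is the very same `Nil` object, while every empty `ListBuffer` is a separate allocation:

```scala
import scala.collection.mutable

object EmptySharing {
  // Scaled down from "a million" so it runs instantly.
  val n = 1000

  // List.empty always returns the single shared Nil object.
  val immutableEmpties: List[List[Int]] = List.fill(n)(List.empty[Int])

  // Each ListBuffer.empty call allocates a new, independent object.
  val mutableEmpties: List[mutable.ListBuffer[Int]] =
    List.fill(n)(mutable.ListBuffer.empty[Int])
}
```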

## Trivia

Here’s a list of other random thoughts (and generally, concepts that have helped us with this refactoring exercise):

Note: “We” below means my team, and “I” means [the author of this post](http://twitter.com/ujm).

* We use [GitLab](https://www.gitlab.com/) and its Merge Request feature for code review. If you aren't familiar with GitLab, think of GitHub's pull requests.
* We prohibit `Thread.sleep` in production code, because blocking a thread hurts performance, and use Akka's Scheduler instead. (`Thread.sleep` is acceptable in test code only.)
* Here's sample code using Akka's Scheduler instead of `Thread.sleep`:
* I personally try not to use the word "functional" at all when I talk about style or paradigm. Unfortunately it's a buzzword without a commonly agreed definition, so it can mean almost anything depending on context. I try to choose more precise words for each specific case, like "shorter" or "declarative".
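The point of using a scheduler instead of `Thread.sleep` is to get a delayed result without blocking a thread while waiting. The sketch below shows that idea with only the standard library: it swaps in the JDK's `ScheduledExecutorService` where our code uses Akka's Scheduler (with Akka, `akka.pattern.after` plays the role of the hypothetical `after` helper defined here). It is an illustration under those assumptions, not our production code.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.Try

object NonBlockingDelay {
  // One daemon thread that only fires timers; it stands in for Akka's
  // Scheduler in this sketch (daemon, so it never keeps the JVM alive).
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r)
        t.setDaemon(true)
        t
      }
    })

  // Hypothetical helper: returns immediately with a Future that completes
  // with `value` once `delay` has passed. No thread is blocked while waiting,
  // unlike calling Thread.sleep(delay.toMillis) before computing `value`.
  def after[T](delay: FiniteDuration)(value: => T): Future[T] = {
    val p = Promise[T]()
    scheduler.schedule(new Runnable {
      def run(): Unit = p.complete(Try(value))
    }, delay.toMillis, TimeUnit.MILLISECONDS)
    p.future
  }
}
```

The caller gets a `Future` back right away and can compose it with `map`/`flatMap` like any other future, instead of holding a thread hostage for the duration of the delay.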

## Appendix

Author

* uji