Thursday, October 1, 2009

Initializing Immutable Variables in Scala

One of the guidelines I picked up when I learned Scala is to use immutable variables as much as possible. Besides the trivial but satisfying detail that declaring an immutable variable (val) takes no more characters than declaring a mutable one (var), Scala also provides some interesting ways to initialize those immutable variables.

In Scala, immutable variables are declared with the val keyword rather than var. In Java, immutable variables are declared by adding the final qualifier to the variable declaration. But a Java final variable has slightly different semantics than a Scala val: in Java, you can declare a final variable without specifying a value for it, then fill in the value later. Java allows the variable to be assigned once, after which it cannot be assigned again. In Scala, a concrete val must have its value assigned as part of the definition.

Consider this sample Java class, Interval, which represents an interval on the real number line. We want to allow the constructor to be called with endpoints in either order, but we want to store them internally in sorted order.

    //Java code
    public class Interval {
        final double start;
        final double end;
        //invariant: end>=start

        public Interval(double x1, double x2) {
            if (x1>x2) {
                start = x2;
                end = x1;
            } else {
                start = x1;
                end = x2;
            }
        }
        //other methods that use start and end go here
    }

If you try this idiom in Scala, by replacing each final variable with a val but continuing to use the same initialization construct, you will get a compiler error "reassignment to val". When using a concrete val in Scala, you must supply the value in the statement where you declare the val.

For relatively simple cases, as in this example, we can take advantage of the fact that Scala allows us to build expressions with if in them, so we can express the same functionality as in the above Java code as follows:

    class Interval(x1:Double, x2:Double) {
        val start = if (x2>x1) x1 else x2
        val end = if (x2>x1) x2 else x1
        //other methods that use start and end go here
    }

Sometimes the logic to calculate the values for the immutable variables is much more complicated than this and more expensive to evaluate. Perhaps, as in our Java example, we don't want to re-evaluate the same condition for each variable. We might also be more comfortable building up our values using mutable variables. We could take the easy and straightforward way and just use var rather than val for our variables, but it is worth a bit of effort to retain the immutability of our variables. Here is an approach I sometimes take:

    class Interval(x1:Double, x2:Double) {
        val (start, end) = {
            def intervalNeedsReversing(a:Double,b:Double) = (a>b)
            if (intervalNeedsReversing(x1,x2)) (x2, x1) else (x1, x2)
        }
        //other methods that use start and end go here
    }

In the above approach, we have a block of code that calculates our values. Though not needed in this case, the intervalNeedsReversing function is an example of how you can define functions within a block in order to refactor that code or better organize it. The value of the block is a tuple, which we then assign using a tuple-assignment to our immutable variables start and end.
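
The same block form also accommodates the mutable-variable style mentioned above. What follows is only a minimal sketch, not part of the Interval example: the class name Stats and its fields are hypothetical, and it assumes a non-empty input.

```scala
// Hypothetical example: Stats and its field names are illustrative only.
class Stats(samples: Seq[Double]) {
  require(samples.nonEmpty, "samples must not be empty")

  // The vars exist only inside this block; only the final tuple escapes,
  // and its parts are bound to the immutable vals min, max and mean.
  val (min, max, mean) = {
    var lo = samples.head
    var hi = samples.head
    var sum = 0.0
    for (x <- samples) {
      if (x < lo) lo = x
      if (x > hi) hi = x
      sum += x
    }
    (lo, hi, sum / samples.size)
  }
}
```

Code outside the block can read min, max and mean but has no way to reach lo, hi or sum.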

A tuple-assignment is a pattern-matching operation that pulls apart the tuple data and stores each piece into the separate variables. It looks like the second line in this example:

    val t2 = (123, "abc")   //the type of t2 is Tuple2[Int, java.lang.String]
    val (n, s) = t2         //assigns n=123, s="abc"

You can use any expression in place of t2 that has the same type, including a function call, a variable, a literal tuple, or a code block.
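
As a rough illustration of those four kinds of right-hand side, here is a small sketch. The object name TupleSources and the helper divMod are made up for this example.

```scala
object TupleSources {
  // A function call that returns a tuple (divMod is a hypothetical helper).
  def divMod(a: Int, b: Int): (Int, Int) = (a / b, a % b)
  val (q, r) = divMod(17, 5)

  // An existing variable that holds a tuple.
  val pair = (123, "abc")
  val (n, s) = pair

  // A literal tuple.
  val (x, y) = (1.5, 2.5)

  // A code block whose last expression is a tuple.
  val (lo, hi) = {
    val values = List(4, 9, 2)
    (values.min, values.max)
  }
}
```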

You can include a type on each variable name; if the types of the assigned variables don't match the corresponding types of the value on the right hand side, you will get a compiler error.

    val (n:Int, s:String) = t2   //ok
    val (s:String, n:Int) = t2   //error

The tuple syntax of parentheses around a comma-separated list of values is actually a shorthand for the TupleN class. For each pair of lines below, the first line is a shorthand ("syntactic sugar") for the second.

    (a, b)
    Tuple2(a, b)

    (1, "x", "y")
    Tuple3(1, "x", "y")

    val (n, s) = t2
    val Tuple2(n, s) = t2

The last of the three examples above is a pattern-matching assignment statement.

You can use the List pattern in an assignment as well:

    val a :: b :: c = List(1,2,3,4)
    //This assigns a:Int=1, b:Int=2, c:List[Int]=List(3,4)

The List and Tuple classes can be used in a pattern-matching assignment like this because they each have an extractor defined by the unapply method in their companion object. You can use any extractor (that is, any declared object that includes an unapply method) in this way. For example, a case class can be used:

    case class Foo(num:Int, str:String)
    val f = Foo(42,"ok")
    val Foo(n,s) = f   //assigns n:Int=42, s:String="ok"

This works even if the case class happens to use mutable fields: the values at the time of the pattern match assignment are set into the new variables, which are immutable.

    case class Bar(var num:Int, var str:String)
    val b = Bar(42,"ok")
    b.num += 1
    b.str = "no"
    val Bar(n,s) = b   //assigns n:Int=43, s:String="no"
    b.num += 1         //does not change n

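
Case classes get their extractor for free, but as noted above any object with an unapply method can be used the same way. Here is a minimal sketch of a hand-written one. The names HoursMinutes and ExtractorDemo are hypothetical and chosen only for illustration.

```scala
// Hypothetical extractor: splits a number of minutes into (hours, minutes).
object HoursMinutes {
  // Returning Some in every case means this match always succeeds.
  def unapply(totalMinutes: Int): Some[(Int, Int)] =
    Some((totalMinutes / 60, totalMinutes % 60))
}

object ExtractorDemo {
  // The pattern calls HoursMinutes.unapply(135) and binds the two results.
  val HoursMinutes(hours, minutes) = 135   // hours = 2, minutes = 15
}
```

An extractor whose unapply can return None would make the val definition fail at run time with a MatchError whenever the match does not succeed, so this style fits best when the pattern is known to match.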
If you have a large number of values to set at once, you could declare a case class with mutable fields to represent them, and match on that to assign the values:

    class AnotherExample {
        case class MyArgs(var name:String, var pathPart:String, var someNumber:Int)
        val MyArgs(path, part, num) = {
            val m = MyArgs("/path/foo/bar", "partX", 123)
            //change values of fields in m as desired
            m
        }
    }

You thus get the benefit of having immutable variables for use in your constructed object, but you can use mutable private data within the block to make it easier to do your construction.

You can use this technique to initialize immutable variables within a method as well. Effectively, you are using mutable variables only for the limited scope in which they are desired. By enclosing them in a block you prevent code outside that block from modifying those mutable values.
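
Here is a small sketch of the same idea inside a method. The object Report and the method describe are hypothetical, and the three-character cutoff is arbitrary.

```scala
import scala.collection.mutable.ListBuffer

object Report {
  // Hypothetical method: groups words by length and formats a summary.
  def describe(words: Seq[String]): String = {
    // The mutable buffers are visible only inside this block;
    // the rest of the method sees two immutable Lists.
    val (shortWords, longWords) = {
      val shortBuf = ListBuffer.empty[String]
      val longBuf = ListBuffer.empty[String]
      for (w <- words) {
        if (w.length <= 3) shortBuf += w else longBuf += w
      }
      (shortBuf.toList, longBuf.toList)
    }
    s"short=${shortWords.mkString(",")} long=${longWords.mkString(",")}"
  }
}
```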

Since this technique is based on pattern matching, you can use it with any legal pattern. Pattern matching is typically used in the case clauses of match expressions.

Patterns can include nested constructs, which allows you to pull out values from deep within a structure when that structure is known. By using the @ operator within a pattern you can extract the value of an entire subpattern:

    case class Foo(n:Int, var s:String)
    case class Baz(f:Foo, b:Option[Baz])
    val data = 123 :: Baz(Foo(3,"c"),Some(Baz(Foo(4,"d"),None))) :: 456 :: Nil
    val _ :: Baz(Foo(_,a),Some(b @ Baz(c @ Foo(d,e),_))) :: f :: _ = data
    // The above val statement assigns these values:
    // a = "c"
    // b = Baz(Foo(4,"d"),None)
    // c = Foo(4,"d")
    // d = 4
    // e = "d"
    // f = 456

The underscore indicates a placeholder for a part of the pattern whose value we don't care about and don't want assigned to anything.

Note that the variable c refers to the same object as the Foo object that appears in variable b. We defined Foo with a var for s. If we change the value of the Foo object referenced by variable c, then we will see that change when we ask for the value of variable b:

    scala> b
    res0: Baz = Baz(Foo(4,d),None)

    scala> c
    res1: Foo = Foo(4,d)

    scala> c.s = "x"

    scala> c
    res2: Foo = Foo(4,x)

    scala> b
    res3: Baz = Baz(Foo(4,x),None)

Although b and c are themselves immutable variables, if they point to the same mutable object then changes made to that object through one variable will be visible through the other variable.

As you learn Scala and see examples of case clauses, remember that any syntax that is valid as the pattern in a case clause is also valid as the pattern in a val assignment.
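
To make that equivalence concrete, here is a minimal sketch that uses one pattern in both positions. The names SamePattern and Wrapper are hypothetical. Foo mirrors the earlier two-field case class.

```scala
object SamePattern {
  case class Foo(num: Int, str: String)
  case class Wrapper(label: String, foo: Foo)   // hypothetical container

  val w = Wrapper("first", Foo(42, "ok"))

  // The pattern used in a case clause of a match expression...
  val described = w match {
    case Wrapper(_, f @ Foo(n, _)) => s"$f has num $n"
  }

  // ...and the identical pattern used in a val definition.
  val Wrapper(_, f @ Foo(n, _)) = w
}
```

In the case clause, f and n are visible only on the right-hand side of the arrow. In the val form they become immutable members of the enclosing object.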

3 comments:

Dani said...

Very interesting post. I see that I can set more immutable variables in my code in Scala. Thanks a lot!

Dean Wampler said...

I second your argument that you shouldn't let complicated initialization logic deter you from using vals. ;) One way you could simplify some of the examples, e.g., the Interval logic, is to extract the logic into a private method that returns the correct value to the declaration.

Sandeep Bhandari said...

For those who are new to final variables, check how final variables behave in Java and then use this post for comparison between the two.