Friday, September 26, 2008

Scala Parser Combinators

One of the reasons I chose to do my StringArt applet as my first Scala applet was because Scala includes a nifty capability called parser combinators.

A parser is a function that converts a stream of input tokens into a data structure that can be more easily digested by the application. A combinator is a way of combing two functions to produce a composed function. A parser combinator is a way of combining two parsers to produce another parser.

The Parser classes in Scala include methods to combine Parsers in various ways and to add functions to be executed when a parse succeeds, to build the resulting datastructure.

Parser combinators simplify the creation of some parsers, including the simple expression parsing I wanted. When using Scala, the specification of the syntax to be parsed, which might normally be done in a separate file which is read by a separate parser (as when using a parser generator such as yacc), can be done directly in a Scala source file.

Scala's syntax has two features that make it possible to write a syntax specification directly in Scala that still looks a lot like a BNF description.
  1. (Almost) any operator string can be used as a method name.
  2. A method that takes a single argument can be written without parentheses.
The parser combinator class is designed to be able to generate an abstract syntax tree (AST) representing the parse, which can then be easily manipulated by the application. In the following example, we create a parser that builds an AST which we then evaluate to produce a value.

Contents

Basic Parser

We start with a simple Expr base class for our AST elements, with case classes for constant values and an add operator. The base Expr class is a sealed class, so all of its case classes, including whatever we add below, must go into the same source file as the base class.
sealed abstract class Expr { def eval():Double } case class EConst(value:Double) extends Expr { def eval():Double = value } case class EAdd(left:Expr, right:Expr) extends Expr { def eval():Double = left.eval + right.eval }
Now we create a simple parser that understands expressions consisting either of a single integer or two integers with a plus between them.
import scala.util.parsing.combinator.syntactical._ object ExprParser extends StandardTokenParsers { lexical.delimiters ++= List("+") def value = numericLit ^^ { s => EConst(s.toDouble) } def sum = value ~ "+" ~ value ^^ { case left ~ "+" ~ right => EAdd(left, right) } def expr = ( sum | value ) //top level expression def parse(s:String) = { val tokens = new lexical.Scanner(s) phrase(expr)(tokens) } def apply(s:String):Expr = { parse(s) match { case Success(tree, _) => tree case e: NoSuccess => throw new IllegalArgumentException("Bad syntax: "+s) } } //Simplify testing def test(exprstr: String) = { parse(exprstr) match { case Success(tree, _) => println("Tree: "+tree) val v = tree.eval() println("Eval: "+v) case e: NoSuccess => Console.err.println(e) } } //A main method for testing def main(args: Array[String]) = test(args(0)) }
The parse, apply, test and main methods are all helper methods so that we can run our parser and see what it is doing. So far our actual parser consists only of the value, sum and expr definitions.

In the above code, the tilda character ~ is a Parser method that composes two parsers into another parser that sequentially calls the two composed parsers, and the vertical bar character | creates a parser which calls the second parser only if the first parser fails. The double carat ^^ passes the parsed value to a function that builds the AST element. When we want to match against a parse sequence and extract more than one value, we can use a case statement after the ^^. That case pattern should match the pattern of the parsers (such as can be seen in the sum parser and its case pattern).

Strings in the middle of these sequences of parsers, such as "+" in the sum parser, are implicitly converted into keyword parsers that accept only that literal string.

You can cut and paste the above two code fragments into a file, load it into the Scala interpreter with the :load command, and test it by entering commands such as the following:
ExprParser.test("123") ExprParser.test("12 + 34")

Repeated Terms

Our parser only handles additions with two terms. In order to handle additions with three or more terms, we redefine ExprParser.sum using the repsep method:
def sum = repsep(value,"+") ^^ { a:List[Expr] => ESum(a) }
We also add the ESum case class:
case class ESum(a:List[Expr]) extends Expr { def eval():Double = a.foldLeft(0.0)(_ + _.eval) }
In order to add a subtract operator, we once again modify our ExprParser.sum method, this time using the * operator. We also add our subtract operator - to the list of delimiters:
lexical.delimiters ++= List("+", "-") ... def sum = value * ( "+" ^^^ { (a:Expr, b:Expr) => EAdd(a,b) } | "-" ^^^ { (a:Expr, b:Expr) => ESub(a,b) } )
Note that we are using a triple carat ^^^ here rather than the double carat ^^ that we used earlier. The triple carat operator does not use any data from the parser in the return value. In our use, we are returning a function which will be used by the * operator to collapse the values in the list.

We also add the ESub case class:
case class ESub(left:Expr, right:Expr) extends Expr { def eval():Double = left.eval - right.eval }
Now we can enter expressions like these:
ExprParser.test("2-1") ExprParser.test("1+2+3-4")

Precedence

Next we add multiply and divide operators. As when we added the subtract operator, we add the * and / delimiters to ExprParser.lexical.delimiters and we add case classes EMul and EDiv:
case class EMul(left:Expr, right:Expr) extends Expr { def eval():Double = left.eval * right.eval } case class EDiv(left:Expr, right:Expr) extends Expr { def eval():Double = left.eval / right.eval } class ExprParser { .... lexical.delimiters ++= List("+", "-", "*", "/") .... }
However, since multiplication and division have higher precedence than add and subtract, we can't just add those operators to the sum parser like we added subtract. Instead, we create a product parser to handle the two new operators. We insert this logically between the sum and value parsers:
class ExprParser { .... def product = value * ( "*" ^^^ { (a:Expr, b:Expr) => EMul(a,b) } | "/" ^^^ { (a:Expr, b:Expr) => EDiv(a,b) } ) def sum = product * ( ... .... }
We can now enter expressions containing multiplication as well as addition and have them properly evaluated:
ExprParser.test("3+4*5")

Parentheses

We would like to be able to use parentheses, so we add those two characters to ExprParser.lexical.delimiters. We add a parens parser, we add a term parser that accepts either a number or a value in parentheses, and we modify the product parser to call for a term rather than a value:
class ExprParser { ... lexical.delimiters ++= List("+", "-", "*", "/", "(", ")") ... def parens:Parser[Expr] = "(" ~> expr <~ ")" def term = ( value | parens ) def product = term * ... def expr = ( sum | term ) ... }
We don't need to provide a value expression for the term parser because the default value for the | operator is the value of whichever expression was parsed, which is what we want. We don't need to provide a value expression for the parens parser because the ~> operator throws away the value on the left and the <~ operator throws away the value on the right, so we are left with the value of the expr in the middle.

Now we can enter expressions with parentheses when we want to change the order of operations:
ExprParser.test("(3+4)*5")

Unary Operators

We want to be able to specify negative numbers. We can't just make a leading negative sign an optional part of our numericLit token because then we can't parse a string such as 2-1. Instead, we implement a unary minus operation with a precendence higher than any of our binary operators.

The minus character is already in our set of delimiters, so we don't need to change that list. We add a EUMinus case class, add a unaryMinus parser definition, and modify the term parser to include the unaryMinus parser:
case class EUMinus(e:Expr) extends Expr { def eval():Double = -e.eval } class ExprParser { ... def unaryMinus:Parser[EUMinus] = "-" ~> term ^^ { EUMinus(_) } def term = ( value | parens | unaryMinus) ... }
We need to specify the return type for unaryMinus because it is recursive (through the call to term), so Scala does not automatically infer its type.

Now we can parse expressions that include unary minus:
ExprParser.test("3*-(2+2)")

Lexical

The standard Scala lexical analyzer implementation of numericLit only parses integers, but we want floating point notation, so we extend the standard lexical class to do that. Note the first line of the class, where it overrides the definition of the token method to be either our own floatingToken or a token from our superclass. The rest of ExprLexical is the definition of the floatingToken token, complete with optional signed exponent.
import scala.util.parsing.combinator.lexical.StdLexical class ExprLexical extends StdLexical { override def token: Parser[Token] = floatingToken | super.token def floatingToken: Parser[Token] = rep1(digit) ~ optFraction ~ optExponent ^^ { case intPart ~ frac ~ exp => NumericLit( (intPart mkString "") :: frac :: exp :: Nil mkString "")} def chr(c:Char) = elem("", ch => ch==c ) def sign = chr('+') | chr('-') def optSign = opt(sign) ^^ { case None => "" case Some(sign) => sign } def fraction = '.' ~ rep(digit) ^^ { case dot ~ ff => dot :: (ff mkString "") :: Nil mkString "" } def optFraction = opt(fraction) ^^ { case None => "" case Some(fraction) => fraction } def exponent = (chr('e') | chr('E')) ~ optSign ~ rep1(digit) ^^ { case e ~ optSign ~ exp => e :: optSign :: (exp mkString "") :: Nil mkString "" } def optExponent = opt(exponent) ^^ { case None => "" case Some(exponent) => exponent } }
We then modify our ExprParser class to use our own lexical parser so that we can parse floats or ints:
object ExprParser extends StandardTokenParsers { override val lexical = new ExprLexical ... }
Now we can enter expressions with either integer or floating point numbers:
ExprParser.test("123.456") ExprParser.test("1.2+3") ExprParser.test("5e-1 + 2")

Precedence Revisited

If we wanted to add some more binary operators at a different level of precedence (such as the integer shift operators << and >>), we could handle that the same way as we did when we added multiply and divide, by adding a new rule and modifying the existing rules to insert it at the right location. It would be nicer if we could factor out the precedence rules so that we don't need multiple binary operator rules. Scala's parser classes do not directly support precedence rules, but we can add them relatively easily.

We replace the product and sum parsers with a binary parser and we call that instead of sum from our expr parser. We factor out the binary operations into a binaryOp method where for each precedence level we list each binary operator and its corresponding AST node. The binary parser takes an argument which is the current precedence level.
class ExprParser { ... def binaryOp(level:Int):Parser[((Expr,Expr)=>Expr)] = { level match { case 1 => "+" ^^^ { (a:Expr, b:Expr) => EAdd(a,b) } | "-" ^^^ { (a:Expr, b:Expr) => ESub(a,b) } case 2 => "*" ^^^ { (a:Expr, b:Expr) => EMul(a,b) } | "/" ^^^ { (a:Expr, b:Expr) => EDiv(a,b) } case _ => throw new RuntimeException("bad precedence level "+level) } } val minPrec = 1 val maxPrec = 2 def binary(level:Int):Parser[Expr] = if (level>maxPrec) term else binary(level+1) * binaryOp(level) def expr = ( binary(minPrec) | term ) ... }
Now if we want to add binary operators at a new precedence level, we only need to modify the binaryOp method, update the value of maxPrec, and add our case classes.

Wrap Up

Below is the complete example in its final form. You should be able to copy and paste this one code fragment and have a working four-function floating point expression parser.
sealed abstract class Expr { def eval():Double } case class EConst(value:Double) extends Expr { def eval():Double = value } case class EAdd(left:Expr, right:Expr) extends Expr { def eval():Double = left.eval + right.eval } case class ESub(left:Expr, right:Expr) extends Expr { def eval():Double = left.eval - right.eval } case class EMul(left:Expr, right:Expr) extends Expr { def eval():Double = left.eval * right.eval } case class EDiv(left:Expr, right:Expr) extends Expr { def eval():Double = left.eval / right.eval } case class EUMinus(e:Expr) extends Expr { def eval():Double = -e.eval } import scala.util.parsing.combinator.lexical.StdLexical class ExprLexical extends StdLexical { override def token: Parser[Token] = floatingToken | super.token def floatingToken: Parser[Token] = rep1(digit) ~ optFraction ~ optExponent ^^ { case intPart ~ frac ~ exp => NumericLit( (intPart mkString "") :: frac :: exp :: Nil mkString "")} def chr(c:Char) = elem("", ch => ch==c ) def sign = chr('+') | chr('-') def optSign = opt(sign) ^^ { case None => "" case Some(sign) => sign } def fraction = '.' ~ rep(digit) ^^ { case dot ~ ff => dot :: (ff mkString "") :: Nil mkString "" } def optFraction = opt(fraction) ^^ { case None => "" case Some(fraction) => fraction } def exponent = (chr('e') | chr('E')) ~ optSign ~ rep1(digit) ^^ { case e ~ optSign ~ exp => e :: optSign :: (exp mkString "") :: Nil mkString "" } def optExponent = opt(exponent) ^^ { case None => "" case Some(exponent) => exponent } } import scala.util.parsing.combinator.syntactical._ object ExprParser extends StandardTokenParsers { override val lexical = new ExprLexical lexical.delimiters ++= List("+","-","*","/","(",")") def value = numericLit ^^ { s => EConst(s.toDouble) } def parens:Parser[Expr] = "(" ~> expr <~ ")" def unaryMinus:Parser[EUMinus] = "-" ~> term ^^ { EUMinus(_) } def term = ( value | parens | unaryMinus ) def binaryOp(level:Int):Parser[((Expr,Expr)=>Expr)] = { level match { case 1 => "+" ^^^ { (a:Expr, b:Expr) => EAdd(a,b) } | "-" ^^^ { (a:Expr, b:Expr) => ESub(a,b) } case 2 => "*" ^^^ { (a:Expr, b:Expr) => EMul(a,b) } | "/" ^^^ { (a:Expr, b:Expr) => EDiv(a,b) } case _ => throw new RuntimeException("bad precedence level "+level) } } val minPrec = 1 val maxPrec = 2 def binary(level:Int):Parser[Expr] = if (level>maxPrec) term else binary(level+1) * binaryOp(level) def expr = ( binary(minPrec) | term ) def parse(s:String) = { val tokens = new lexical.Scanner(s) phrase(expr)(tokens) } def apply(s:String):Expr = { parse(s) match { case Success(tree, _) => tree case e: NoSuccess => throw new IllegalArgumentException("Bad syntax: "+s) } } def test(exprstr: String) = { parse(exprstr) match { case Success(tree, _) => println("Tree: "+tree) val v = tree.eval() println("Eval: "+v) case e: NoSuccess => Console.err.println(e) } } //A main method for testing def main(args: Array[String]) = test(args(0)) }
In my StringArt applet I also implemented variables and function calls, but those did not demonstrate any additional features of Scala's parser combinators, so I have left that as an exercise for the reader.

For additional reading check out these other parser combinator pages:
  • Debasish Ghosh blog entry in which he developers a parser for a stock (shares) trading language that looks very much like written English.
  • Daniel Spiewak blog entry in which he builds a parser for a simple language. Look in the comments for a link to an on-line version of Wadler's 1985 paper.
  • Matt Malone blog entry in which be builds a parser for the Subversion client-server message protocol syntax, using lots of RegexParsers.
  • Part 4 of Stefan Zeiger blog on parser combinators, where he creates a parser for interger expressions using * and +. The three previous parts deal more generally with parser combinators, with the promise of additional parts going into more detail about Scala parser combinators.
  • Philip Wadler's 1985 paper on parser combinators.
  • Adriaan Moors's Parsing in Scala paper. One of Adriaan's examples is a parser that handles the four basic arithmetic functions. In that example, the evaluation of the expression is done by the parser, whereas in my example I create an AST for later evaluation. I did this because in my real usage, I evaluate the expression multiple times, each time with different values for the variables which can be referenced by the expression.
  • The "Programming in Scala" book has a chapter on Parser Combinators.
  • The scaladoc page for the Parsers.Parser class in Scala lists the parser operators, and the scaladoc for the Parsers trait lists the parser method names.
  • This Parser combinator in Java was done by using parser combinators from Haskell and converting them to Java.
  • A 2001 paper about the Haskell Parsec parser.

Saturday, September 20, 2008

Creating Control Constructs in Scala

When using inner classes in Java, you can either create a named inner class or an anonymous inner class. Named inner classes are useful when you want to reference them in more than one place, and anonymous inner classes are useful for in-line single use.

Similarly, in Scala you can use either named or anonymous functions. Named functions are useful when you want to reference them in more than one place, and anonymous functions (function literals) are useful for in-line single use. In particular, anonymous functions can be used to create new control constructs that look almost identical to Scala's native control constructs.

Contents

Some Scala Syntax

For a more complete list of Scala syntax, see my Scala Syntax Primer. Below are the Scala syntax rules that contribute to the ability to make your own control constructs. You can try out these code snippets in the scala interpreter.
  • All statements are expressions. In any place where a single expression is expected, a block of expressions contained in braces can be used. The value of the block is the value of the last expression in the block. The braces also act as parentheses. The expression
    5 * { val a = 2; a + 1 }
    is valid and returns the value 15.
  • A method that takes a single argument can be written after the object instance with no dot and no parentheses. For example, a call to the String.charAt method, which is normally written like this:
    "abc".charAt(1)
    can instead be written like this:
    "abc" charAt 1
    Or, since we can use braces as parentheses, like this:
    "abc" charAt { 1 }
  • A method that takes multiple arguments can be curried, i.e. the arguments can be divided into multiple argument lists. For example, a method that takes two Strings, normally written like this:
    def cat(s1:String, s2:String) = s1+s2
    can instead be written like this:
    def cat(s1:String)(s2:String) = s1+s2
    The syntax for invoking the method matches the definition. Because we can use braces instead of parentheses, we can invoke that second definition like this:
    cat("a"){ "b" }
    or of course in place of b we could use any other expression that evaluates to a string.

Synchronized

An example of a standard Scala method that looks like a control construct is the synchronized method. In Java, the synchronized keyword is a reserved word in the language with a special syntax, but in Scala synchronized is a method on the AnyRef class that takes as its single argument a code block to be executed. Whereas in Java you would write
synchronized(myLock) { /* code to be synchronized */ }
in Scala you would write
myLock synchronized { /* code to be synchronized */ }
or, if you are synchronizing on this, you can just write
synchronized { /* code to be synchronized */ }
The synchronized method has a single argument, so by the Scala syntax rules you can write it without the dot and parentheses. In the last example, where the object is implicitly this, you do have to include either parentheses or braces around the expression or code block to be synchronized.

Your Own Control Constructs

Here is a typical pattern used in Java:
//Java code Connection conn = DriverManager.getConnection(myDbLocation); try { //SQL commands using conn sqlCommand1(conn); sqlCommand2(conn); } finally { if (conn!=null) { try { conn.close(); } catch (IOException ex) { println("Error closing database connection"); } } }
This pattern is used when you need to ensure that a resource is closed after use. The SQL commands might be just a couple of lines of in-line code that does an SQL query or update. The surrounding boilerplate code is another 10 or so lines of code, which is often significantly more than the work code. If your boilerplate also handles one or more exceptions in a standard way, it can be much larger. It would be nice to factor out this boilerplate into a method that can be reused. As a functional language, Scala makes this easy to do.

Here is a Scala function that can be used to replace the above Java pattern:
def withConnection(location:String)(body: (Connection) => Unit) { val conn = DriverManager.getConnection(location) try { body(conn) } finally { if (conn!=null) { try { conn.close() } catch { case ex:IOException => println("Error closing database connection") } } } }
The signature of body, (Connection) => Unit, says that it is a function that takes a single argument of type Connection and returns no value (Unit). We would typically call withConnection like this:
withConnection(myDbLocation) { conn => //SQL commands using conn sqlCommand1(conn) sqlCommand2(conn) }
In the above usage example, we are passing withConnection an anonymous function as the value for withConnection's body parameter. The conn => tells the compiler the name of the one argument to the function (conn), with the stuff after the => (on the following lines up until the matching close brace) being the body of the function.

Alternatively, if we have a named function that takes a Connection as an argument (such as "doSqlCommands" in this example), we can pass it instead like this:
withConnection(myDbLocation)(doSqlCommands)
Our withConnection method is designed specifically for a database connection. We could write a similar method to open a File and ensure that it gets closed, or for any resource for which we want to guarantee cleanup.

We can easily compose two such functions to create a convenience function that opens two resources and ensures both are closed. For example, assume we have the above withConnection function, and we implement a similar withFileInput function with this signature:
def withFileInput(location:String)(body: (FileInputStream) => Unit):Unit
Now we can write a withFileInputAndConnection method that opens one of each:
def withFileInputAndConnection(fileLocation:String, dbLocation:String)( body: (FileInputStream, Connection) => Unit) { withFileInput(fileLocation) { f => withConnection(dbLocation) { conn => body(f, conn) } } }
This code would be called like this:
withFileInputAndConnection(myFileLocation,myDbLocation) { file, conn => doFileCommand(file) doSqlCommand(conn) doFileAndSqlCommand(file,conn) }
Or we can pass in a named function:
withFileInputAndConnection(myFileLocation,myDbLocation)(doFileAndSqlCommand)

Refactoring

In the above examples I have suggested writing a function, but I have not mentioned where it should live. You might gather the withFileInput, withConnection and withFileInputAndConnection functions together into one utility class, but there is another approach that allows these various withSomething functions to share their structure.

Rather than defining a withConnection function, you can define a WithConnection object with an apply method that does the same thing as our previous function. You can then write the object name as if it were a function name, making it's usage look the same as any other function.

Here is WithConnection coded as an object:
object WithConnection { def apply(location:String)(body: (Connection) => Unit) { val conn = DriverManager.getConnection(location) try { body(conn) } finally { if (conn!=null) { try { conn.close() } catch { case ex:IOException => println("Error closing database connection") } } } } }
A call to this function looks identical to a call to the previous withConnection function, except with an initial upper case W character:
WithConnection(myDbLocation) { conn => //SQL commands using conn sqlCommand1(conn) sqlCommand2(conn) }
We know that the WithFileInput object would look very similar to WithConnection, so we can refactor the common code into a class that we can use as a superclass. We will call this base class WithResource. Since we now don't know the type of the resource we are dealing with, we use an abstract type R in our base class and let the subclass fill in the actual type. Likewise for the specifier of the resource, which in both of our examples has been a String but could be something else for some other resource.
abstract class WithResource { type S //the type of the resource specifier type R //the type of the resource val resourceTypeName:String def openResource(spec:S):R def closeResource(resource:R) //everything above here is abstract, subclass must define. def apply(spec:S)(body: (R) => Unit) { val resource = openResource(spec) try { body(resource) } finally { if (resource!=null) { try { closeResource(resource) } catch { case ex:Exception => println("Error closing "+resourceTypeName) } } } } }
Given the above base class, we can now more succinctly define our WithConnection and WithFileInput objects:
object WithConnection extends WithResource { type S = String type R = Connection val resourceTypeName = "database connection" def openResource(location:String) = DriverManager.getConnection(location) def closeResource(conn:Connection) = conn.close() } object WithFileInput extends WithResource { type S = String type R = FileInputStream val resourceTypeName = "FileInputStream" def openResource(location:String) = new FileInputStream(location) def closeResource(input:FileInputStream) = input.close() }
This call to WithFileInput opens /etc/fstab, prints out all the lines in it, and closes the file:
WithFileInput("/etc/fstab")( scala.io.Source.fromInputStream(_).getLines.foreach(print))
We can make our extending objects even smaller by adding type and constructor parameters to WithResource:
abstract class WithResource[S,R]( openResource: (S)=>R, closeResource: (R)=>Unit, resourceTypeName:String) { def apply(spec:S)(body: (R) => Unit) { val resource = openResource(spec) try { body(resource) } finally { if (resource!=null) { try { closeResource(resource) } catch { case ex:Exception => println("Error closing "+resourceTypeName) } } } } } import java.sql.{Connection,DriverManager} object WithConnection extends WithResource[String,Connection]( DriverManager.getConnection, _.close, "database connection") import java.io.{FileInputStream} object WithFileInput extends WithResource[String,FileInputStream]( new FileInputStream(_), _.close, "FileInputStream")
We can do a better job when creating a function to open two resources by creating a generic version and extending that generic version to get our specific version. Here we define a generic class for managing two resources, then create an object to manage two files, with an example use of that object to print out the first three lines of two files:
class WithTwoResources[S1,R1,S2,R2]( w1:WithResource[S1,R1], w2:WithResource[S2,R2]) { def apply(spec1:S1, spec2:S2)(body: (R1,R2) => Unit) { w1(spec1)(res1=> w2(spec2)(res2=> body(res1,res2))) } } object WithTwoFileInputs extends WithTwoResources( WithFileInput,WithFileInput) import scala.io.Source def test2Files = WithTwoFileInputs("/etc/passwd","/etc/group") { (in1, in2) => Source.fromInputStream(in1).getLines.take(3).foreach(print) Source.fromInputStream(in2).getLines.take(3).foreach(print) }

Object Functional

Scala is an Object Functional language: it is both an object-oriented language and a functional language, and it does a good job of integrating those two styles. Every function (when passed as a value) is an instance of a class, and as with any class, a function class can be extended to make subtypes that refine its behavior. The "apply" syntax rule (stating that an object name followed by arguments in parentheses is translated into a call to the apply method) means you can treat any class as a function class by defining the apply method, as we have done above.

Let's see how we can take advantage of Scala's integrated object-functional capabilities to improve our control class. We'll start by adding a mechanism for handling exceptions.
abstract class WithResource[S,R]( openResource: (S)=>R, closeResource: (R)=>Unit, resourceTypeName:String) { def apply(spec:S)(body: (R) => Unit) { val resource = openResource(spec) try { body(resource) } catch { case ex:Exception => handleException(ex) } finally { if (resource!=null) { try { closeResource(resource) } catch { case ex:Exception => println("Error closing "+resourceTypeName) } } } } protected def handleException(ex:Exception) { throw(ex) } }
In the above code, we have added the method handleException with a default implementation that does not change the behavior of the apply method, so subclasses which do not want to bother with implementing any special exception handling can stay as they were. A subclass which does want to implement exception handling can implement its own handleException method. Doing so in effect refines a part of the WithResource function.

Next we will add a try/catch block around the openResource call.
... val resource = try { openResource(spec) } catch { case ex:Exception => handleOpenException(ex,spec) case ex:Throwable => throw(ex) } ... protected def handleOpenException(ex:Exception, spec:S):R = { throw(ex) }
Here we are taking advantage of the fact that all statements in Scala return a value, including a try/catch statement. We make the handleOpenException method return a value of the same type as the openResource method so that a subclass can implement a method that handles the problem and returns an opened resource with which to continue processing.

Lastly we will simplify the subclasses of WithResource by making the assumption that most resources can be closed with a call to a close method.
abstract class WithResource[S,R]( openResource: (S)=>R, //we have removed the closeResource argument resourceTypeName:String) { def apply(spec:S)(body: (R) => Unit) { val resource = try { openResource(spec) } catch { case ex:Exception => handleOpenException(ex,spec) case ex:Throwable => throw(ex) } try { body(resource) } catch { case ex:Exception => handleException(ex) } finally { if (resource!=null) { try { closeResource(resource) } catch { case ex:Exception => println("Error closing "+resourceTypeName) } } } } protected def handleOpenException(ex:Exception, spec:S):R = { throw(ex) } protected def closeResource(resource:R) { type HasCloseMethod = { def close() } resource match { case r:java.io.Closeable => r.close() case r:HasCloseMethod => r.close() case _ => throw new RuntimeException("no close method available") } } protected def handleException(ex:Exception) { throw(ex) } }
In our default implementation of closeResource we check to see if the resource has a close method in two ways: firstly if it implements the standard Java Closeable interface, and secondly by Scala's ability to use structural typing ("duck typing") by seeing if it has a close method. Our WithConnection and WithFileInput objects can use our default closeResource method since each of File and Connection has a close method.
import java.sql.{Connection,DriverManager} object WithConnection extends WithResource[String,Connection]( DriverManager.getConnection, "database connection") import java.io.{FileInputStream} object WithFileInput extends WithResource[String,FileInputStream]( new FileInputStream(_), "FileInputStream")
If we want to implement a WithResource object for a resource that has some other way to close it, our extending object can implement its own closeResource method, as in our original implementation of WithFileInput above.

Conclusion

Here is what we have:
  • We have factored out all of the boilerplate in our original Java code example into our WithResource class and the objects that extend it, which we call invoke as functions.
  • We can call our WithResource function for any defined resource type with only one or two lines of code that look very similar to native control constructs.
  • We can define a WithResource function for a new resource type with only two or three lines of code.
  • We can define a function for any two resources with only one or two lines of code.
  • We can enhance the functionality of all of our WithResource functions by modifying our one WithResource base class, as exemplified by our addition of exception handling in the apply method.
We were able to do all this in Scala because of its syntax and capabilities:
  • The "apply" rule allows us to treat any object as a function.
  • The ability to pass an argument by name allows us to pass in a functional literal that looks like any other code block.
  • The ability to define a function with a curried address list allows us to split the arguments into two lists, so that we can put the last argument in a list by itself and take advantage of the following step.
  • The optional "no period and parentheses when one argument" method rule allows us to call our method in a way that looks like a built-in control construct.
  • The fact that all statements in Scala return a value allows us to use a try/catch clause and assign the result to an immutable val.
  • The ability to compare objects against a structural type ("duck typing") allows our WithResource default implementation to close any resource with a close method.

Friday, September 12, 2008

Scala Syntax Primer

Scala runs on the JVM and can directly call and be called from Java, but source compatibility was not a goal. Scala has a lot of capabilities not in Java, and to help those new features work more nicely, there are a number of differences between Java and Scala syntax that can make reading Scala code a bit of a challenge for Java programmers when first encountering Scala. This primer attempts to explain those differences. It is aimed at Java programmers, so some details about syntax which are the same as Java are omitted.

This primer is not intended to be a complete tutorial on Scala, rather it is more of a syntax reference. For a much better introduction to the language, you should buy the book Programming in Scala by Martin Odersky, Lex Spoon and Bill Venners.

Most of these syntax differences can be explained by two of Scala's major goals:
  1. Minimize verbosity.
  2. Support a functional style.
The logic behind some of these syntax rules may at first seem arbitrary, but the rules support each other surprisingly well. Hopefully by the time you finish this primer, you will have no trouble understanding code fragments like this one:
(0/:list)(_+_)
Scala is an integrated object/functional language. In the discussion below, the terms "method" and "function" are used interchangeably.

You can run the Scala interpreter and type in these examples to get a better feel for how these rules work.

Contents

Basics

  • Scala classes do not need to go into files that match the class name, as they do in Java. You can put any Scala class in any Scala file. The only time it makes a difference is when you have a class and an object of the same name and want them to be companions, in which case they must be in the same file.
  • Semicolons are optional. If you put one statement on each line, you don't need semicolons. If you want to put multiple statements on a line, you can do so by separating them with semicolons. There are specific rules about when statements can span lines, so sometimes you have to be a bit careful when doing so.
  • Every value is an object on which methods can be called. For example, 123.toString() is valid.
  • Scala includes implicit transformations that allow objects to be used in unexpected ways. If you see some source code where a method call is operating on an instance of a class which does not define that method, then probably the instance is being implicitly converted to a class on which that method is defined.
  • Scala's "Uniform Access Principle" means that variables and functions without parameters are accessed in the same way. In particular, a variable definition using the val or var keyword can be converted to a function definition simply by replacing that keyword with the def keyword. The syntax at the calling sites does not change.
  • Scala includes direct support for XML. This code fragment assigns an instance of type scala.xml.Elem to val x:
    val x = <head><title>The Title</title></head>
    You can mix Scala code in with the XML by putting it in braces. This code fragment produces the same resulting value as the above code:
    val title = "The Title" val x = <head><title>{ title }</title></head>
  • Just about everything can be nested. Packages can be nested inside packages, classes can be nested inside classes, defs can be nested inside defs.
  • As in Java, annotations are indicated by the @ character.

Keywords

  • There is no static keyword. Methods and variables that you would declare static in Java go into an object rather than a class in Scala. Objects are singletons.
  • Scala has no break or continue statements. Fortunately, Scala's support of a functional programming style reduces the need for these.
  • Access modifiers such as protected and private can include a scope in square brackets, such as private[this] or protected[MyPackage1.MyPackage2]. The default access is public.
  • The val keyword declares an immutable value (a val), similar to the final keyword in Java. The var keyword declares a mutable variable.
  • Multiple items can be imported with one import statement:
    import java.text.{DateFormat,SimpleDateFormat}
  • Imported symbols can be renamed to other names, which provides a means to work around the problem of importing two symbols of the same name. For example, if you want to import both java.util.Date and java.sql.Date and be able to use them both without having to type the whole qualified name each time, you could do this:
    import java.util.{Date=>UDate} import java.sql.{Date=>SDate}
  • If an import is renamed to _, that symbol will not be imported. This allows importing everything except a specified symbol:
    import java.util.{Date=>_,_}
  • The abstract keyword is only used for abstract classes and traits. To declare an abstract function (def), val, variable (var), or type, you omit the = character and body of the item.
  • When overriding a method in a superclass, the override modifier must be specified. Overriding a method without using the override modifier or using the modifier when not overriding a superclass method will result in compilation error.
  • Some other keywords: lazy, implicit

Symbols and Literals

  • Scala allows multi-line strings quoted with triple-quotes:
    val longString = """Line 1 Line 2 Line 3""";
  • Symbol names can include almost any character. In particular, they can include all of the characters normally used as operators, such as *, +, ~ and :. The backslash character (\) is a valid symbol character, and in fact is used as a method name in the scala.xml.Elem class. Note that "abc\"def" is a seven character String with a double quote in the middle, but if abc is an instance of scala.xml.Elem, then abc\"def" is a call passing a three character String to the backslash method, a method that accepts a String argument and returns an instance of scala.xml.NodeSeq.
  • The underscore character (_) is used as a wildcard character rather than asterisk (*), such as in an import statement or in a case statement to represent a "don't care" value. This is because asterisk is a valid symbol character in Scala.
  • As in Java, by convention class names and object names start with an upper case letter, variable names start with a lower case letter.
  • In one case, whether a symbol starts with an upper or lower case actually matters to the compiler: in a case statement, that difference is used to disambiguate between a constant value, such as PI if Math.PI has been imported (starts with upper case) and a placeholder name being introduced (whose scope is then limited to the body of the case statement).

Expressions

  • Any place an expression is expected, a block of expressions surrounded by braces can be used instead. The braces act as parentheses. For example, the expression 5 * { val a = 1; a + 2 } is valid and yields the value 15.
  • A mentioned above, symbol names can include almost any character, such as * and +. This is useful for defining methods that will be used as operators (see the rules under Function Calls for functions with one argument).
  • The precedence of operators is determined by their first character, and is hardwired to match the usual precedence. Thus if you create the operator methods *^ and +^, the *^ will have higher precedence. The one exception to this rule is that if the operator ends with an equal sign (=) and is not one of the standard relational operators, then it will have the same precedence as simple assignment.
  • When used as binary operators, any symbol which ends with a colon (:) is right-associative; all other symbols are left associative. The controlling object for right-associative operators goes on the right side of the operator, with its one argument on the left side. However, the left hand argument is still evaluated before the right hand argument.
  • The characters +, -, ! and ~ can be used as prefix operators in any class by defining the method unary_+, unary_- etc.
  • Every statement is an expression whose value is the last expression within that statement that was evaluated. For example, there is no ?: ternary operator in Scala. Instead, you use a standard if/then/else statement:
    val x = if (n>0) "positive" else "negative"
  • When used on the right hand side of a value or variable declaration, the underscore character means assign the default value. This is the same as not specifying a value in Java; in Scala, not specifying an initial value declares an abstract variable.
    val n = 123 //specific initial value var x:Int //abstract variable var y:Int = _ //default initial value

Case and Patterns

  • Like Java, Scala has a case statement to allow selecting one possible code path from many based on a value. A simple switch statement in Java might look like this:
    //Java code switch (n) { case 0: action0(); break; case 1: action1(); break; default: actionDflt(); break; }
    The equivalent code in Scala would be:
    n match { case 0 => action0() case 1 => action1() case _ => actionDflt() }
  • Instead of the switch keyword as in Java, Scala uses a match expression. The match keyword comes after the value being matched, unlike the relative positions of the switch keyword and the value in Java.
  • match works on all types, not just ints. For example, you can match on a String variable and have case statements each with a constant String value.
  • No break statement is required, and execution does not fall through to the next case.
  • match statements return values. The value of a match statement is the value of whichever branch was executed.
  • The underscore is used to indicate the default case.
  • In addition to constants, a case expression can include patterns, which allow for more complex matching. Case matching is handled by extractors, which can be implemented by writing an unapply method in an object.
  • A case pattern can include a variable declaration with a type, in which case the variable is defined with that type and set to the value of the matched data within the body of that case.
    n match { //assume n is of type Number case i:Int => //i is an Int here, like (int)n would be in Java case d:Double => //d is a Double here, like (double)n in Java case _ => //no values were defined in this case }
  • When matching a more complex expression, you can assign a variable name to an internal part of the pattern by writing the variable name and the @ character before the pattern:
    case Foo(a,b @ Bar(_)) //b gets set to the part that matches Bar(_)
  • Case expressions can be followed by a pattern guard before the =>. The pattern guard is the keyword if followed by a boolean expression:
    x match { //assume n is of type Number case Foo(a,b) if a==b => //here only when Foo with a==b case Foo(a,b) => //here for all other Foo case _ => //here for all non-Foo }
  • Case expressions work for XML. In this example, the variable b gets set to whatever is inside the body element if there is only one element there:
    case <body>{ b }</body> => //b is the contents
    To match multiple elements, use _* to match any sequence:
    case <body>{ b @ _* }</body> => //b is the contents
  • The catch block of a try/catch statement uses the same syntax as the body of a match statement:
    try { // code that might throw an exception goes here } catch { case e1:IllegalArgumentException => //e1 is IllegalArgumentException here case e2:ArrayOutOfBoundsException => //e2 is valid here case e3:RuntimeException => //any other RuntimeException comes here case _ => //all other exceptions come here }
  • A pattern can be used on the left hand side of an assignment. For example, the following code results in assigning the values 3 and 5 to the new variables x1 and y1:
    case class Point(x:Int,y:Int) //defines a simple value class val Point(x1,y1) = Point(3,5)
    This example is contrived, but the same kind of assignment works when calling a method that returns a value which is an object.
  • A pattern can be used on the left side of the <- operator in a generator in a for expression.

For Expressions

For Expressions are also called For Comprehensions.
  • A for expression consists of the for keyword, a sequence of specific kinds of elements separated by semicolons or newlines and surrounded by parentheses, and the yield keyword followed by an expression:
    for ( n <- 0 to 6 ; e = n%2; if e==0 ) yield n*n
  • The elements inside the parentheses can be any of the following:
    • A generator, such as n <- 0 to 6, which produces multiple values and assigns them to a new val (here n). The new val name appears to the left of the <- operator; to the right of that operator is a value which implements the foreach method to generate a series of values.
    • A definition, such as e = n%2, which introduces a new value by performing the specified calculation.
    • A filter, such as if e==0, which filters out the values which do not satisfy that expression.
  • The val name in a generator can instead be a pattern, similarly to how a pattern can be used in an assignment statement or a case expression. For example:
    val list = List((1,2),(3,4),(5,6)) for ( (a,b) <- list) yield a+b
    yields List(3, 7, 11).
  • The elements can be placed inside of braces rather than parentheses and separated by newlines rather than semicolons:
    for { n <- 0 to 6 e = n%2 if e==0 } yield n*n
  • When multiple generators are specified, each generator is repeated for each value produced by the preceding generator. For example, the expression
    for ( x <- 0 to 4 ; y <- 0 until 3) yield (x,y)
    produces a value starting with (0,0), (0,1), (0,2) and ending with (4,2).
  • The type of the value produced by a for expression is the same as the type of the first generator.
  • As an alternative to using yield followed by an expression, you can omit the yield keyword and use a block of code in place of a single expression.
  • A for statement can always be translated into a series of foreach, filter, map and flatMap method calls. In that sense, the for statement is syntactic sugar.

Arrays

  • Array indexes are specified with parentheses rather than square brackets.
  • Array access is implemented the same way as function access, using the apply method.
  • The code arr(index) is converted to arr.apply(index).
  • The code arr(index) = newval is converted to arr.update(index,newval).
  • Arrays are declared using the Array keyword and with the element type in square brackets, rather than using empty square brackets after a type as is done in Java. For example, an array with space for three Strings would be declared like this:
    val x = new Array[String](3)
  • A two dimensional 3 by 3 array of Strings would be declared like this:
    val x = new Array[Array[String]](3, 3).

Tuples

  • Scala has built in support for Tuples, from one element to 22 elements. A Tuple is a small ordered collection of objects, where each object can have a different type.
  • The types for Tuples of various sizes are Tuple1 through Tuple22. These types have N type parameters, where N is the Tuple size. For example, a two element Tuple with an Int and a String has type Tuple2[Int,String].
  • The Pair object allows that word to be used instead of Tuple2 for building and matching two element Tuples.
  • The Triple object allows that word to be used instead of Tuple3 for building and matching three element Tuples.
  • You can create a tuple by enclosing the object in parentheses and separating them by commas: (1, 2, "foo") is a Tuple3[Int,Int,String].
  • You can create a Tuple2 (a Pair) by using the -> operator, which works on any value: "a" -> 25 is the same as ("a", 25). The following expression is true:
    ("a",25)=="a"->25
    This is done by an implicit conversion from Any to Predef.ArrowAssoc, which contains the -> method.
  • The elements of a Tuple can be accessed as member fields _1, _2, _3 etc.
  • If an expression returns a Tuple, that can be assigned to a set of variables or vals. The following code assigns 5 to the new val tens and 8 to the new val ones.
    def div10(n:Int):Tuple2[Int,Int] = (n/10, n%10) val (tens, ones) = div10(58)
    This is a case of using a pattern on the left hand side of an assignment, as mentioned in the section on Cases and Patterns.

Classes

  • The primary constructor for a class is coded in-line in the class definition, i.e. the constructor statements are not contained within a definition inside the class. The constructor parameters are declared immediately after the class name, and superclass arguments are placed after the name of the class being extended.
  • Class parameters can be preceded by val to make them immutable instance values (vals), or by var to make them instance variables.
  • Class parameters can be preceded by an access modifier such as private or protected. By default, class parameters using val or var are public.
  • The primary constructor can be made private by adding the access modifier private before the parameter list.
  • trait is like interface in Java, but can include implementation code. Classes in Scala don't implement traits, they extend them same as classes. If a class extends multiple traits, or extends a class plus traits, the keyword with is used rather than commas as in Java.
  • Case classes are defined by adding the case keyword before the class keyword. This automatically does the equivalent of the following:
    • Prepends val to all parameters, making them immutable instance values.
    • Creates equals and hashCode methods so that instances of that class can safely be used in collections.
    • Creates a companion object of the same name with an apply method with the same args as declared for the class, to allow creation of instances without using the new keyword, and with an unapply method to allow the class name to be used as an extractor in case statements.
  • Anonymous classes can be defined without reference to an extending class, in which case they extend Object:
    val x = new { def cat(a:String, b:String) = a+b }
  • The type-parameterized isInstanceOf method is used to determine if an object is an instance of a specific class:
    if (x.isInstanceOf[Double]) ...
  • Similar to isInstanceOf, a value can be cast to a specific type by using the type-parameterized asInstanceOf method:
    if (x.isInstanceOf[Double]) { val d = x.asInstanceOf[Double] //operate on d } else { //not a double }
    However, the above construct is not typically used; instead, that functionality is implemented with a case statement, which simultaneously tests for a type and sets a new value of that type:
    x match { case d:Double => //operate on d case _ => //not a double }
  • The isInstanceOf method can be used to test if an object matches a trait as well as a class. It can also be used to test an instance against a structural definition, which can be used to test if an instance implements a specific method:
    type HasAddActionListenerMethod = { def addActionListener(a:ActionListener) } uiElement match { case c:HasAddActionListenerMethod => c.addActionListener(new ActionListener() { override def actionPerformed(ev:ActionEvent) { //insert your actionPerformed code here } }) }
  • Class literals are written classOf[MyClass] as opposed to MyClass.class as in Java.

Types

  • All values in Scala are objects, so (except for compatibility with Java) there is no int/Integer or double/Double distinction. All integers are of type Int and all doubles are of type Double. (In previous versions of Scala, either upper case Int or lower case int was accepted, but convention now is to use only the upper case version, and this may be enforced by the compiler in the future.)
  • Type specifications are written as name:type rather than type name as in Java. This is to allow the type to be omitted in many cases, since Scala does type inference. For example, write n:Int rather than int n.
  • Types for generics are specified in square brackets [T] rather than in angle brackets <T> as in Java. Thus a generic type might be specified as F[A,B,C].
  • Scala supports covariant and contravariant type specifications at the definition site. These are declared with a leading + for covariant types and a leading - for contravariant types. Thus a function declaration F[+A,-B] means F is qualified by a covariant type A and a contravariant type B.
  • Types can be specified with upper and lower bounds. The expression T<:U means type U is an upper bound for T, whereas T>:U means type U is a lower bound for type T.
  • Types can be specified with view bounds, which are similar to upper bounds: The expression T<%U means type U is a view bound for T, which allows for implicit conversion to T and can thus support more actual types.
  • A higher kinded type with two type qualifiers, such as Pair[String,Int], can be written in infix notation by placing the higher kinded type name between its two type qualifiers, such as String Pair Int. This makes more sense if the higher kinded type name happens to use operator characters such as +. Thus when you see a type such as Quantity[M + M2], as used in the Quantity class in this file, that is the same as Quantity[+[M,M2]], so look for a type called + that takes two type qualifiers.
  • Existential types are supported with an expression like this:
    T forSome { type T }
    where the contents of the braces is some type declaration. This is mainly used when interfacing to Java code that either has raw types or uses Java's ? wildcard type.
  • The type T in an existential type specification can be replaced by a more complex expression:
    List[T] forSome { type T <: Component }
    In the above example, we are saying T is some type which is a subtype of Component.
  • The shorthand
    List[_]
    is the same thing as
    List[T] forSome { type T }
  • The shorthand
    List[_ <: Component]
    is the same thing as
    List[T] forSome { type T <: Component }
  • Type variables can be defined by using the type keyword. Similar to a typedef in C, the type variable simplifies code when a complicated type is used many times:
    type ALS = Array[List[String]] val a:ALS val b:ALS
    Type variables can also be abstract, in which case they must eventually be defined by a subclass.
  • A trait may include code that accesses another trait, in which case the class that includes the first trait must also include the second trait. In order to make this work, the first trait must include a "self type" referencing the second trait. The self type declaration is the first line of the body, usually declaring a type for this, but optionally using a different name in place of this:
    trait foo { //method for foo trait } trait bar { this : foo => //methods for bar trait, which can access foo methods }
  • Inner class types can be referenced using the outer and inner class names separated by a dot (.) as in Java, or using a pound sign (#). The dot syntax specifies a path-dependent type; the pound syntax specifies the generic inner class. For example, if you had this code:
    class Outer { class Inner {} }
    then you would use Outer#Inner to refer generically to that inner class. If you had an instance x of class Outer, you would refer to the specific class Inner in that instance by using x.Inner, which is a distinct type from the Inner class within any other instance of Outer, and a subtype of the generic Outer#Inner class.

Function Definitions

  • The return type of a function is written after the function's parameter list and preceded by a semicolon, similar to the type specification for a variable. For example, a function which would be declared in Java as
    //Java code public String toString(StringBuffer buf)
    would be declared in Scala as
    def toString(buf:StringBuffer):String
  • Functions which do not return a value are declared as having the type Unit rather than void as in Java. If a function never returns (such as if it always throws a Throwable) the return type is Nothing.
  • A function with no parameters can be declared without parentheses, in which case it must be called with no parentheses. This provides support for the Uniform Access Principle, such that the caller does not know if the symbol is a variable or a function with no parameters.
  • The function body is preceded by "=" if it returns a value (i.e. the return type is something other than Unit), but the return type and the "=" can be omitted when the type is Unit (i.e. it looks like a procedure as opposed to a function).
  • Braces around the body are not required (if the body is a single expression); more precisely, the body of a function is just an expression, and any expression with multiple parts must be enclosed in braces (an expression with one part may optionally be enclosed in braces).
  • Vararg parameters are declared by appending an asterisk to the argument, like this:
    def printf(format:String, args:Any*):String
    The parameter gets turned into an array within the method, so in the above example the args parameter would have the type Array[Any] within the body of the printf function.

Function Calls

  • When a class has an apply method, foo(bar) (where foo is an instance of that class) translates to foo.apply(bar).
    • Likewise for an object. If you see Foo(bar) that is most likely a call to the apply method of object Foo.
    • As with any method, the apply method can be overloaded, with different versions having different signatures.
    • Functions are instances of a class (Function1, Function2, etc), so the same rule applies to any function object.
    • A method named unapply in an object definition is also treated specially: it is invoked as an extractor when the object name is used in a case statement pattern.
  • Functions with zero or one argument can be called without the dot and parentheses.
    • But any expression can have parentheses around it, so you can omit the dot and still use parentheses.
    • And since you can use braces anywhere you can use parentheses, you can omit the dot and put in braces, which can contain multiple statements.
  • Functions with no arguments can be called without the parentheses. For example, the length() function on String can be invoked as "abc".length rather than "abc".length(). If the function is a Scala function defined without parentheses, then the function must be called without parentheses.
  • By convention, functions with no arguments that have side effects, such as println, are called with parentheses; those without side effects are called without parentheses.

Function Sugar

"Syntactic sugar" is added syntax to make certain constructs easier or more natural to specify. The step in which the compiler replaces these constructs by their more verbose equivalents is called "desugaring".
  • Functions with one parameter (including anonymous functions) are instances of type Function1[A,B], functions with two parameters are of type Function2[A,B,C], etc. The last type in the list of parameter types is the return value type, so there is always one more than the number N of parameters. A function with no parameters is an instance of Function0[A]. The name Function with no number is equivalent to Function1.
  • (A,B)=>C is shorthand ("syntactic sugar") for Function2[A,B,C].
  • A Function1[A,B] can be written as (A)=>B, or as just A=>B.
  • A Function0[A] (i.e. a function with no parameters) can be written as ()=>A. This function can be called with or without parentheses (as mentioned in the Function Definition section).
  • A function with no parameter list can also be specified with no parentheses as =>A. This function must be called without parentheses. If you are declaring a variable x of this type, the declaration looks like x: =>A. This signature is often used for call-by-name parameters.
  • When passing an anonymous function (also called a function literal), you can use a shorthand in which you directly write the body of the function, using underscores where each of the function parameters is to go (as long as Scala has enough information to infer the type). For example, if you are folding a list to sum all the elements, you can write it the long way:
    val list = List(1,2,3,4,5) list.foldLeft(0) { (a:Int, b:Int) => a+b }
    or, by taking advantage of Scala's type inference (and using the same value for list):
    list.foldLeft(0) { (a, b) => a+b }
    or, using underscores as in-line parameter placeholders:
    list.foldLeft(0) { _ + _ }
    or using the equivalent method /: (which also does a foldLeft):
    (0/:list)(_+_)
    In this last form, we are taking advantage of the following shorthands:
    • The /: operator is equivalent to the foldLeft method (the List class defines both methods).
    • The foldLeft method (and the equivalent /: operator method) uses a curried parameter list, with the first parameter list having only one method. This allows us to take advantage of the next step.
    • Since the foldLeft method takes only one parameter (in the first parameter list), we can invoke it without the dot and parentheses.
    • Since the operator name ends with a colon, it is right-associative, so the list object goes on the right and the 0 argument goes on the left.
    • The second parameter list contains only one item (the function to apply to the fold), and the function we are passing in has only one expression, so we can use parentheses rather than braces.
    • Scala has enough information to infer the types of the two parameters in the function literal, so we do not need to specify the types of the parameters.
    • We are only using each parameter in the literal once, so we can use the underscore shorthand and not have to declare the names of the parameters in the function literal.
    • We can remove all the space without creating ambiguity.
  • If a function literal, as used in the above example, is a single method call that takes only one argument, then the method name alone may be specified. Under this rule, this:
    args.foreach( (x:Any) => println(x) )
    becomes this (the other intermediate forms given above are also valid):
    args.foreach(println)
  • Instead of using an underscore as a placeholder for an argument, if a function name is followed by a space and an underscore, the underscore is a placeholder for an entire argument list. This is a partially applied function.
  • Another example, this one from Tony Morris's Introduction to High-level Programming with Scala:
    def compose[A, B, C](f: B => C, g: A => B): A => C = (a: A) => f(g(a))
    • This defines a compose function with three type parameters A, B and C. The regular function parameters are in parentheses.
    • The parameter f is a function that takes one argument of type B and returns a value of type C. The parameter g is also a function.
    • The return type of the compose function, which appears after the colon that follows the parentheses, is a function that takes one argument of type A and returns a value of type C.
    • The body of the function appears after the = character, and is a function which has a parameter a of type A and returns the value f(g(a)).
    • Note that executing the compose function does NOT execute functions f and g, but rather returns a function object which, when invoked, will execute f(g) on its argument.
    • This would thus be used as follows:
      def plus1(n:Int):Int = n+1 def intToParenString(n:Int) = "("+n.toString+")" val plus1string = compose(intToParenString,plus1) val x = plus1string(10) //this executes plus1 and intToParenString, //sets x to the string (11)
Updated 2008-10-19: added classOf, added infix type notation.

Monday, September 1, 2008

True, Kind, Necessary

When I was growing up, my parents would occasionally remind me of what I now know as the Thumper Rule (because it was one of Thumper's lines, repeated back to his mother when she chastised him for not following it, in Disney's Bambi): "If you can't say anything nice, don't say anything at all." They would also remind me that you can almost always come up with something nice to say about someone if you think about it for a while. Yet I still sometimes thought that truth alone was a sufficient criterion for making a statement.

Later in life as I became, I hope, wiser and began to realize that there was more to communication than raw truth, I came across the rule of True, Kind, Necessary, which is a more sophisticated rule similar in spirit to the Thumper Rule. If you google for true, kind, necessary you will get a lot of hits. Although this guideline has been around for a long time (as far back as Socrates and his Triple Filter test), none of the postings I happened to read described the rule in quite the way that I originally heard it or interpret it for my personal use:
When making a statement, particularly about the person to whom you are speaking, consider whether that statement is true, kind, or necessary:
  • If the statement is true, kind and necessary, you should make it. In fact, you should go out of your way to make it.
  • If true and kind but unnecessary, or true and necessary but unkind, you may make the statement unsolicited, and should make the statement in response to a direct question if you do not have an answer that satisfies all three.
  • If true but neither kind nor necessary, you should not make the statement unsolicited, but may make the statement in answer to a direct question. However, there are other options that may be better, including demurring, responding without answering the question, or simply remaining silent.
  • If none, the statement should never be made.
The guideline as I originally heard it was actually a bit different than what I have listed above:
If a statement satisfies at least two out of three of true, kind and necessary, then it may be made.
As I mentioned in a previous post, I am a bit of a stickler for truth, so I'm not keen on the idea of making a statement that is kind, necessary and not true. But there are people who do not share my view, and for those people there is some logic in this position. Using that same logic, it would be acceptable to give an answer that satisfies any one of the three in response to a direct question. Perhaps the canonical example would be answering "no" to the question "Does this make me look fat?" when that's not your actual opinion. If you believe that "little white lies" are acceptable, this might be the guideline that helps distinguish between those "white lies" and lies of other colors.

Or perhaps you should say nothing, if your words would not improve the silence.