Sunday, January 18, 2009

You Should Learn Scala

You should learn Scala. Okay, I can be a bit more precise than that: If you consider yourself a Java programmer, then you should learn Scala.

Perhaps you ask, Why? If I were being selfish I might answer, Because I want more people to use Scala so that it becomes easier for me to use it where I work. Or if I were being arrogant about the language I might say, So that you will be ready for its inevitable rise as the successor to Java. But instead I will give an answer that I hope will have a stronger and more immediate appeal: Because it will make you a better Java programmer.

Learning Scala will not necessarily make everybody who knows Java a better programmer. If you are a Java expert and you also happen to know Haskell, ML and Erlang inside out, then perhaps Scala does not hold much new for you. But if you do know those languages, you probably consider yourself something a little different from "a Java programmer", which, if you notice, is the phrase I used above.

Why

What's wrong with just knowing Java? Some people claim learning Java stunts your intellectual growth as a developer. Here are just a few negative comments about Java:

If you want to see some more things that people don't like about Java, google for "Java considered harmful".

It is common advice that you should learn multiple programming languages. Some people say you should learn a new language every year. Each language will have at least some little corners that will present you with new concepts. You should make a point to learn other languages that have deep roots in Computer Science to ensure that at least some of those new concepts are substantial.

Why learn Scala rather than Haskell or something else? Scala integrates functional programming with object-oriented programming. When coming from the object-oriented Java world, Scala allows you to gradually learn functional techniques while still being able to use familiar object-oriented techniques. For a Java programmer, learning Scala may be easier than learning other functional languages that are not object-oriented.

Scala also has the advantage of running on the JVM, and it lets you easily make direct calls to Java code. There are other languages that run on the JVM and can call Java, but none that do so as easily as Scala while integrating the functional and object-oriented approaches as well as Scala does. This means you can immediately start using Scala code along with the rest of your Java code, and you can leverage your knowledge of all those Java libraries.

An entry on Dr. Dobb's about learning Scala if you use Java says "[Scala is] the Java route to [Functional Programming]". It also points out that there is an Eclipse plugin for Scala, so you can continue to use Eclipse. Some other Java tools also work with Scala. I happen to like the JSwat debugger and have used it on my Scala programs.

Some things you will learn from Scala:
  • The importance of immutable values.
  • The simpler composition of functions/methods that have no side effects.
  • The Actor model for concurrent processing.
  • How to think using higher order functions.
  • A better understanding of variance (covariance and contravariance).
We will examine each of these in more detail to see how Scala encourages these programming habits.

But first, an overview of Scala.

What

Scala is a combined object-oriented and functional language. It was created by Martin Odersky, a computer scientist at EPFL in Lausanne, Switzerland. Odersky co-designed and implemented (with Philip Wadler) the Pizza and GJ (Generic Java) extensions to Java, after which he was hired by Sun to work on the Java compiler, to which he added generics.

Scala was originally intended to run on both the Java Virtual Machine (JVM) and the .NET Common Language Runtime (CLR). Unfortunately, the .NET implementation seems to have fallen by the wayside, so Scala has de facto become a JVM-only language.

As of this writing, Scala has been around for over five years, has been relatively stable for over a year, and is now at version 2.7.3.

With that little bit of background about Scala out of the way, let's get back to those Things You Will Learn.

Immutable Values

Using immutable values makes it easier to write code without side effects, reduces the likelihood of concurrency bugs, and can make code easier to read and understand. Scala separates the concept of a val from a var. A val in Scala is like a final variable in Java: once a value has been set, it cannot be changed. In Java, you have to add the final keyword to a variable declaration to make it immutable. In Scala, every declaration has to use either var or val. This forces you to think about that choice, and since it is just as easy to type val as var, there is little reason not to use val when you don't think the value should change. Sure, you can just add final in Java, but the language does not encourage you to think about that detail, and the default is for everything to be mutable.
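As a small sketch of that difference (the object and field names here are made up for illustration), a val cannot be reassigned while a var can:

```scala
object ImmutableValues {
  // A val is bound once, much like a final variable in Java.
  val greeting: String = "hello"
  // greeting = "goodbye"   // would not compile: reassignment to val

  // A var can be reassigned, like an ordinary non-final Java variable.
  var counter: Int = 0

  // Each call changes counter, so two identical calls return different results.
  def bump(): Int = {
    counter += 1
    counter
  }

  // With vals and immutable collections, "changing" a value means
  // building a new value and leaving the old one alone.
  val base: List[Int] = List(1, 2, 3)
  val extended: List[Int] = 0 :: base   // base is untouched
}
```

Note that val only fixes the binding. If a val refers to a mutable object, the object itself can still change.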

Referential Transparency (No Side Effects)

To take a paragraph from one of my previous posts:
Referential transparency is a phrase from the functional programming world which means, basically, "no side effects". Side effects include reading any state which is not passed in as an argument or setting any state which is not part of what is passed back as an argument. If a function is referentially transparent, then a call to that function with a specific set of values as arguments will always return exactly the same value.
Functions with side effects are harder to test, harder to reason about, and in general harder to get right. As you compose functions with side effects, the side effects tend to accumulate, making the composed function even more difficult to get right.

In imperative languages such as Java the natural way of writing many functions is to use variables (mutable data) and loops. In functional languages there are other ways things are more typically done, including the use of recursion and higher order functions, that don't require the use of mutable variables.
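To make the contrast concrete, here is a sketch of one small function, a sum of squares, written three ways. The function names are invented for this example; all three return the same result, but only the first needs a mutable variable.

```scala
import scala.annotation.tailrec

object SumOfSquares {
  // Imperative style: a mutable accumulator and a loop, as one might write it in Java.
  def withLoop(xs: List[Int]): Int = {
    var total = 0
    for (x <- xs) total += x * x
    total
  }

  // Recursive style: the running total is passed along as an argument,
  // so nothing is ever reassigned.
  @tailrec
  def withRecursion(xs: List[Int], acc: Int = 0): Int = xs match {
    case Nil          => acc
    case head :: tail => withRecursion(tail, acc + head * head)
  }

  // Higher order function style: foldLeft does the traversal, and we
  // only supply the function that combines the total with each element.
  def withFold(xs: List[Int]): Int =
    xs.foldLeft(0)((acc, x) => acc + x * x)
}
```

The @tailrec annotation asks the compiler to verify that the recursive call is in tail position, so it can be compiled to a loop and will not grow the stack.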

In object-oriented languages objects may contain state (mutable instance data) in addition to immutable data. When a method reads or writes mutable instance data, it has side effects, with all of the additional considerations that requires.

It is almost inherent in the nature of object-oriented languages to encourage the use of instance state data. But because Scala has one foot in the functional language community, if you learn Scala you will also be drawn into that community and will learn some techniques for writing code without side effects and the advantages of doing so.
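One such technique, sketched here with hypothetical class names, is to replace a method that updates instance state with one that returns a new object:

```scala
object InstanceState {
  // Stateful version: deposit changes the object, so the result of a
  // call depends on every call that came before it.
  class MutableAccount {
    private var balance = 0
    def deposit(amount: Int): Int = {
      balance += amount
      balance
    }
  }

  // Side-effect-free version: deposit leaves this object alone and
  // returns a new Account, so the same call always gives the same answer.
  final case class Account(balance: Int) {
    def deposit(amount: Int): Account = Account(balance + amount)
  }
}
```

With the second version, calling deposit(10) on the same Account twice yields two equal results and the original is unchanged, which is what makes such methods easy to test and compose.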

Actor Concurrency

Writing correct concurrent code is hard. Java made it easier by building monitors, thread control, and synchronization into the language. (As a commentary on how hard it is to get concurrency right, even the original Java Language Specification did not quite get it right, and the memory model had to be redefined by JSR-133 for Java 1.5.)

With Java's threads and the synchronized keyword it is easier to write code that doesn't corrupt data due to simultaneous access by multiple threads, as long as you are careful to synchronize all access to shared data.

The monitors that are used for synchronization are external to the methods that lock on them. This means that any method that uses synchronized is not referentially transparent: it has the side effect of locking the monitor, which is visible outside the function while the function is running. That in turn implies that functions that use synchronized are harder to compose.

In fact, this is precisely the case: the more functions you compose that use synchronized, the more likely you are to run into a deadlock problem, which is an undesired interaction between those side-effects of the functions. The Java thread/monitor model works well enough for a small number of threads dealing with a very small number of shared objects, but it is very difficult to manage a large program with many simultaneous threads accessing multiple shared objects.

Scala supports the Java approach to concurrency using threads and synchronized, but it also provides another model for concurrency that scales up much better: the Actor model. Actors are a message-passing concurrency mechanism borrowed from Erlang, a language designed for high concurrency.

An Actor is an object that is responsible for maintaining data (or access to any other resource) that needs to be shared by multiple threads. The Actor is the only object allowed to access that data (or resource). Other threads communicate with the Actor by sending messages to it; the Actor can respond by sending messages back (if the other thread is an Actor). Typically the messages are immutable. Scala does not enforce this, but using mutable messages makes it more difficult to scale. Each Actor has a message inbox where incoming messages are queued. Scala's Actor library handles all of the message transfers, so the programmer does not have to deal with synchronizing any code.
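The following is a minimal sketch of the idea only. It does not use Scala's Actor library; instead it hand-rolls a single "actor" from a thread and a java.util.concurrent.LinkedBlockingQueue, and all of the names (CounterActor, Add, Get, Stop) are hypothetical. It shows one owner of the data, an inbox of queued messages, and no synchronized blocks in the user code.

```scala
import java.util.concurrent.LinkedBlockingQueue

object MailboxSketch {
  // Messages are immutable case classes; Get carries a queue to reply on.
  sealed trait Message
  final case class Add(n: Int) extends Message
  final case class Get(reply: LinkedBlockingQueue[Int]) extends Message
  case object Stop extends Message

  // A stand-in for an actor: a single worker thread owns the total,
  // and everyone else can only reach it by sending messages.
  final class CounterActor {
    private val inbox = new LinkedBlockingQueue[Message]()

    private val worker = new Thread(new Runnable {
      def run(): Unit = {
        var total = 0        // only ever touched by this thread
        var running = true
        while (running) {
          inbox.take() match {
            case Add(n)     => total += n
            case Get(reply) => reply.put(total)
            case Stop       => running = false
          }
        }
      }
    })
    worker.setDaemon(true)
    worker.start()

    // Send a message: it is queued in the inbox and handled later.
    def !(msg: Message): Unit = inbox.put(msg)
  }

  // Four threads each send 100 Add(1) messages, then we ask for the total.
  def demo(): Int = {
    val counter = new CounterActor
    val senders = (1 to 4).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = for (_ <- 1 to 100) counter ! Add(1)
      })
    }
    senders.foreach(_.start())
    senders.foreach(_.join())
    val reply = new LinkedBlockingQueue[Int]()
    counter ! Get(reply)
    val result = reply.take()
    counter ! Stop
    result
  }
}
```

Because the inbox is a FIFO queue and the Get message is sent only after all sender threads have finished, the reply reflects every Add. A real actor library adds much more, such as scheduling many actors over a small pool of threads.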

There are many levels of possible problems that can arise with concurrent programs:
  • Data corruption due to concurrent access.
  • Deadlock.
  • Resource bottleneck or starvation.
Java's support of threads and synchronized makes it easier to write concurrent code that does not suffer from data corruption due to multiple concurrent access, but we still have to worry about deadlock. Scala's Actor library helps get past the next level: the Actor model can support huge numbers of active actors, all with access to a very large number of shared resources, without deadlock, allowing the programmer to focus on ensuring that the higher level issues such as resource bottlenecks will not be a problem. According to Haller's paper (page 14), he was able to run 1,200,000 simultaneously active Actors, whereas the equivalent test using threads on the same hardware ran out of memory and was unable to create 5,500 threads.

I have heard that there is an Actor library for Java called Kilim, but I have not tried it.

Higher Order Functions

This is really what functional programming is all about. In a functional language, functions are first-class objects that can be assigned to variables and passed to other functions, the same as any other data type. This allows for a style of factoring that sometimes allows code to be written much more concisely, which (assuming you understand the whole concept of passing functions around as objects) often also makes the code easier to understand.

You can do something like passing a function around in Java by defining an interface with a named method, passing an object that implements that interface, and invoking it by using the method name. While this sort of works, it requires an annoying amount of boilerplate and it doesn't necessarily make the resulting code easier to read.

Scala provides a set of classes called Function0, Function1, Function2, etc., and a bunch of special compiler syntax so that you can write relatively concise functional code, which the compiler then translates into the appropriate classes, instances and method calls to make it all work in the Java VM. The code is not quite as concise as in some other functional languages, because of limitations due to how the type system works (object-oriented type hierarchies and global type inference don't mix very well), but it's much more concise than the equivalent Java code.
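A short sketch of what that looks like (the names are invented for illustration): a function literal, the same function written out by hand as a Function1, and two higher order functions that take or return functions.

```scala
object HigherOrder {
  // A function literal. Its type, Int => Int, is shorthand for Function1[Int, Int].
  val doubleIt: Int => Int = x => x * 2

  // The same function spelled out by hand, roughly the shape of the
  // boilerplate you would write against an interface in Java.
  val doubleExplicit: Function1[Int, Int] = new Function1[Int, Int] {
    def apply(x: Int): Int = x * 2
  }

  // A higher order function that takes a function as an argument.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // A higher order function that returns a function.
  def adder(n: Int): Int => Int = x => x + n
}
```

Functions defined this way can be passed to library methods directly, for example List(1, 2, 3).map(HigherOrder.doubleIt).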

Variance

Variance has to do with higher-order types, such as List<String> in Java or List[String] in Scala.

Before generics were introduced in Java, there was no type information for higher-order types, except for arrays, so there was no way to do anything about covariance or contravariance. With the addition of generics to Java, covariance and contravariance checks became possible. Unfortunately, because of Java's legacy of having started off without the higher-order type information, the generics definition has a few problems that can make the whole concept a bit harder to understand.

In Scala, variance was designed in from early on so the whole thing is cleaner. It does admittedly have some problems: although cleaner than Java, it's not as clean as the pure functional languages like Haskell; it has its own share of odd corner cases (although they are much further into the corners than in Java); and, because Scala has to run on the JVM, it has the same limitations as Java relating to the lack of runtime information about higher order types (type erasure).

Java arrays are broken in terms of variance. After learning about variance you will understand why you can't safely cast String[] to Object[] in Java.
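Here is a sketch of both sides of that, with made-up names. Scala's List is declared covariant and Function1 is contravariant in its argument, while Scala's Array is invariant. The last method uses a cast to force the view of an array that Java allows without one, and shows the JVM rejecting the store at runtime.

```scala
object VarianceSketch {
  // List is declared as List[+A] (covariant), so a List[String]
  // can be used wherever a List[Any] is expected.
  val strings: List[String] = List("a", "b")
  val anything: List[Any] = strings

  // Function1 is declared as Function1[-T, +R] (contravariant in its
  // argument), so a function accepting Any can stand in for one accepting String.
  val describe: Any => String = x => "value: " + x
  val describeString: String => String = describe

  // Scala's Array is invariant, so this would not compile:
  //   val strs: Array[String] = Array("a", "b")
  //   val objs: Array[AnyRef] = strs        // type mismatch

  // Java treats String[] as an Object[]. Forcing that view with a cast
  // compiles, but storing a non-String fails when the JVM checks the store.
  def storeIntoStringArray(): Boolean = {
    val strs: Array[String] = Array("a", "b")
    val objs: Array[AnyRef] = strs.asInstanceOf[Array[AnyRef]]
    try {
      objs(0) = Integer.valueOf(42)
      false
    } catch {
      case _: ArrayStoreException => true
    }
  }
}
```

Covariance is safe for List because an immutable list offers no way to store a new element into it. A mutable array does, which is why treating it as covariant is unsound.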

How

How can these lessons be applied to Java? Here's a brief list of some things you can do.
  • immutable values: use "final" more.
  • no side effects: write more pure functions (no mutable variables), write methods that do not use global or instance state, and use more recursion.
  • Actor model: check out Kilim.
  • higher order functions: you can use interfaces, although it is not nearly as convenient. Maybe you will be able to adopt a more functional style in Java if one of the closure proposals gets implemented, in which case, after learning Scala, you'll be ahead of the game in Java because you will already know how to use higher order functions effectively.
If you learn Scala and use it for a while, you will probably become more comfortable with these approaches.

Perhaps, after reading all of the above, you have decided that you should learn Scala. Great! How can you go about that? My basic advice:
  • Read about Scala: articles, blogs, books, newsgroups.
  • Write some code. As soon as you can, and as much as you can. Applets, programs, libraries, anything. There is no substitute for writing code.
  • Run the Scala interpreter and type things in.
You can download the Scala compiler and other stuff from the official Scala web site.

Here are some pointers to some things you can read to get you started:

Don't spend too much time reading before you begin coding. Pick a project and get started writing Scala!

Updated 2009-01-18: Fixed var/val typo as pointed out by Doug.

10 comments:

Doug Holton said...

"A var in Scala is like a final variable in Java: once a value has been set, it can not be changed"

You mean val not var

"it is just as easy to type val as var"

which is exactly why one of the two should be dropped from scala, or changed to something more distinct.

Probably the simplest solution would be to just add 'final' again, or else combine it with a non-null ! symbol:

MyClass! m = ...

or else make immutable/non-nullable the default and have a symbol for the reverse:

def mymethod( param1 : MyClass?)

dr said...

Jim. Thanks for all your hard work promoting Scala. I've found your articles, especially the syntax and operator primers, invaluable.

James Iry said...

@Doug,

You are free to drop var from your programming and borrow a page from ML with a Ref[T] class for mutable variables.

scala> final case class Ref[T](var value : T) {
| def ! = value
| def set(x : T) = value = x
| }
defined class Ref

scala> val x = Ref(3)
x: Ref[Int] = Ref(3)

scala> x!
res3: Int = 3

scala> x set 4

scala> x!
res5: Int = 4

Scala tries to strike a narrow balance: on one hand it strives to make it easy to write functional code. On the other it strives to be a comfortable language for imperative programmers, especially those from a Java background. The val/var distinction meets those goals quite admirably.

Jim McBeath said...

Doug: An ironic typo. If I had been writing real Scala code using a syntax-coloring editor, I expect I would not have made that mistake. I have updated the article to fix that error.

If we had to get rid of one of val or var, I would rather keep val the simple one and make var more verbose. But I don't think that's necessary. The difference between + and - or * and / is also a single character, but I doubt many people would argue that one of them should be dropped or changed for that reason.

dr: You're welcome, thanks for your encouragement.

Eelco Hillenius said...

Thanks for the articles Jim

Satya said...

I was looking for a Scala quick start tutorial. Do you know one?
Thanks

Jim McBeath said...

Satya: Take a look at the links in the list of bullets at the end of my post, in particular the second bullet.

matias said...

thanks for the many posts, Jim. I was curious about that Dewar Schonberg article linked from your post that results in a 404.

i tracked it down to here: http://www.crosstalkonline.org/storage/issue-archives/2008/200801/200801-Dewar.pdf

amethod said...

If going to functional programming while maintaining Java functionality and compatibility was really the ultimate goal, wouldn't Clojure be a better option?

Jim McBeath said...

amethod: I wanted to focus on comparing Java to Scala rather than comparing Scala to other JVM languages. As I state in my post, "Scala allows you to gradually learn functional techniques while still being able to use familiar object-oriented techniques." I think moving from Java to Scala is enough of a jump for most Java programmers; convincing those programmers to jump straight to Clojure would be a more difficult task, which I will leave to someone else.