Thursday, December 3, 2009

Improve Your Releases

There is more to a good software release than a good program.

If you are releasing software and want it to be successful, you have to do more than just write a good program. You need to consider all of the things the user will want to do with your software before and after actually using it.

You can look at the steps below as an interpretation of the phases of the software lifecycle from the perspective of the user. Depending on how well you do your job, the user will have a better or worse experience with each of these phases. If, for one of these phases, you do nothing, the user is likely to have an unpleasant experience when he gets to that phase.

Develop

This is the phase that most open-source developers focus on. If we were looking at the software life cycle in a little more detail, we would split this phase into three separate phases: design, code, and test. In commercial development these three phases are often handled by separate groups, but from the user's perspective they can be lumped together as being the factors that contribute to the overall quality and usability of the software.

This is the time in which to consider all of the points below so that you can create your system in a way that makes it easy to do the right thing for all of the other phases of the software life cycle.

Release

Once the software comes out of test, it must be packaged up as a Release. A user should be able to tell easily which released artifact he has acquired and which version of that artifact he has. The simplest way to handle this is to define a released artifact as being a single file. If you think you need to release a collection of files, they should be packaged up as a single file, such as in zip, tgz (tar-gzip), dmg or iso format. You can then give each file a name and version number to allow the user to identify it.

You may have a product that is composed of a number of other released artifacts. You can bundle these into one larger artifact that is a collection of the other artifacts plus an installer that can invoke the installers of the other artifacts, or that knows how to install those other artifacts directly. Operating system installers work in this way.

Once your released artifact is in a single file and appropriately labeled, it is easy to take the next step: generate and publish a checksum for that file. If, for any reason, the user is unsure about what artifact and version he has, he can then run a checksum on his file and compare it against your official list of checksums. Using a cryptographic checksum provides protection not only against accidental corruption, but against intentional modification (hacking) of the artifact as well. MD5 and SHA-1 are still adequate for detecting accidental corruption, but practical collision attacks exist against both, so use SHA-256 or stronger when you need protection against deliberate tampering. Operating system distributions such as Fedora do this, including the additional security step that the published list of checksum values is digitally signed.
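As a rough illustration of the checksum step, the following Scala sketch computes a SHA-256 digest with the JDK's MessageDigest class and compares it with a published value. The object and method names (Checksum, sha256Hex, sha256OfFile, matches) are made up for this example, and it reads the whole file into memory, which is only reasonable for small artifacts.

```scala
import java.nio.file.{Files, Paths}
import java.security.MessageDigest

object Checksum {
  // Hex-encoded SHA-256 digest of a byte array.
  def sha256Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("SHA-256")
      .digest(bytes)
      .map(b => "%02x".format(b & 0xff))
      .mkString

  // Assumption: the artifact is small enough to load in one go.
  // A real tool would stream the file through the digest instead.
  def sha256OfFile(path: String): String =
    sha256Hex(Files.readAllBytes(Paths.get(path)))

  // Compare a local file against the value from the published checksum list.
  def matches(path: String, published: String): Boolean =
    sha256OfFile(path).equalsIgnoreCase(published.trim)
}
```

A user-facing script could call Checksum.matches with the downloaded file name and the line copied from your checksum page, and refuse to install if it returns false.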

Distribute

Your user needs to get your released artifacts. Long ago this used to be done by distributing physical media such as DVDs, CDs, floppies, or tapes. Today most distribution is done over the internet, which makes this step far simpler than it used to be.

Open source projects have easy solutions available through such services as SourceForge and GitHub. Many commercial providers also distribute their software from download pages on their web sites, often with additional security such as restricting web access to customers with accounts, and using license files to enable specific functionality in the installed software.

Given how widespread and well-understood this model is, it makes sense to use it for internal software as well: set up a web site where your users can find all of your released artifacts. If the list of artifacts is small, you can just set up a few directories with files in them and serve up those files with your web server. When the number of artifacts gets large enough so that browsing listings gets cumbersome, you can add a search form. If you need to restrict which of your internal users have access to your downloads, you can do that in the same way as commercial vendors do, with password access or license-file control of the installed application.

Install

The user should be able to install the complete application from the downloaded artifact with a single command, or at most two commands (an unpack command followed by execution of a setup script). Installing a Windows application is generally done by downloading an exe file and then executing it; a Mac install is generally done by downloading a dmg file, double clicking on it, and dragging the app into another folder; a Java install is often done by downloading a jar file and then executing it (such as by running java -jar on it). These are all examples of simple installation mechanisms. Once running, an installer can direct the user to select values for options and installation paths.

In particular, you should not require the user to unpack the software and then manually execute a number of other steps such as moving files around or editing config files. These are steps that should be handled by an installer.

When the files are installed on the user's system, there should be an easy way to determine what version is installed and in use. This information should be easily available in the application, such as in an About menu in a GUI application. If a user might install multiple artifacts from your collection, there should be a simple way to get a list of all artifacts installed and in use along with their version numbers so that you can unambiguously tell what versions of your software are being used at that site.

Support

Finally, the user is using your software. If your software is well designed, well written and well tested, the user should have no problems using it and all will be well. In reality, it is unlikely that no user will ever have any problems with your software. When a user does have problems, what will he do (other than grumble or swear at your software, that is)? Assuming the user is motivated to solve the problem rather than just giving up, he will seek out resources that can provide him with the information he needs to solve his problem. You can make his life easier in this step by providing some or all of the following:
  • A User's Guide or set of guides (tutorial, reference).
  • In-application help (context-specific, page-specific, links to the manual, search, how-to).
  • On-line forums where users can share their problems and solutions.
  • Direct support, via telephone, email, or chat.
If a user experiences a crash or runs into a bug, it might be nice if he can easily submit a crash report or a bug report so that you can more effectively fix the problem. If so, you will want that report to automatically include the list of installed artifacts and versions, as discussed above in the Install section.

Upgrade

If your software is successful you will probably release new versions of it. A user who is already using your software should be able to start using the new version with minimal hassle. As with the initial install, installing an upgrade should take at most one or two commands, for example by running an upgrader that guides the user through whatever questions need to be answered for the upgrade.

You can add an option to your application to check for upgrades and ask the user if he wants to download and install them, saving the user the hassle of separately doing those steps. If you choose to implement this, you should allow the user to disable it. There should also still be a way that the user can download an upgrade (as a single file, just as with an initial install), copy it to another machine, and install it there, in case he is running on a machine that is not connected to the network or is behind a firewall that prevents your automated download from working.

There are two ways in which an upgrade is different from an install, leading to two additional goals for the upgrader:
  1. If the user has any configuration or customization, that should be carried over to the new version.
  2. If the user starts running the new version and soon discovers that it is unusable for him, he should quickly be able to roll back to the previous version.
An approach to handle the first goal is to keep the configuration and customization in a separate directory, such as in the user's home directory, or (for Unix systems) in /etc or (for Windows systems) in the Registry. There can still be problems when upgrading if the format of the config and customization files changes, or if the items being configured and customized have changed between versions. Your upgrader should take care of this.

One relatively easy way to satisfy the second goal is to install each version of the application in a separate directory that contains the version number in its name, and then provide a current directory that is a link to the version to be used. Rolling back to a previous version might then be as simple as deleting the current link and recreating it to point to the previous version. Ideally, however, this rollback is also done by a program you provide, in case a rollback also requires any other changes such as to the configuration and customization files.
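The versioned-directory scheme can be sketched in a few lines of Scala using the JDK's java.nio.file API. Everything named here (VersionSwitch, activate, activeVersion, and directory names such as myapp-1.0) is hypothetical. The sketch assumes a platform where the process may create symbolic links, and the delete-then-create sequence is not atomic.

```scala
import java.nio.file.{Files, Path}

object VersionSwitch {
  // Point <root>/current at <root>/<versionDir>, replacing any existing link.
  def activate(root: Path, versionDir: String): Path = {
    val target = root.resolve(versionDir)
    require(Files.isDirectory(target), s"no such version directory: $target")
    val link = root.resolve("current")
    // Deleting a symbolic link removes the link itself, not the directory it points to.
    Files.deleteIfExists(link)
    // Use a relative target so the install root can be moved as a whole.
    Files.createSymbolicLink(link, target.getFileName)
  }

  // Name of the version directory that "current" points to right now.
  def activeVersion(root: Path): String =
    Files.readSymbolicLink(root.resolve("current")).getFileName.toString
}
```

An upgrade would call activate with the new directory name, and a rollback would call it again with the previous one. Any migration of configuration files would still have to be handled separately.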

The upgrade and rollback should of course update the list of installed artifacts and current versions.

Patch

Occasionally you might want to deliver a minor update or bug fix to your software. You might send out one modified file and ask the user to install it in a specific location to fix a bug.

While this sounds like an easy mechanism for quick fixes, in the long run you will be better off ensuring that your upgrade process is streamlined enough that you can package up that one file in an upgrade and use your upgrade process.

The problem with sending out patch files and doing ad-hoc installs like this is that it makes it very difficult to keep track of what is installed at a customer site. If you send out four or five patches and then the customer starts reporting unique bugs, will you know what software is running at that site so you can track down those bugs? You could work on setting up a system to keep track of those patches, but you might as well invest that effort into making your upgrade process easier to use.

Perhaps you think that each customer will have a different set of patches, and you don't want to send the same patches to all of your customers, so you don't want to make them all standard upgrades. If it is really the case that you want to deliver different things to different customers, then you are not really delivering one artifact, you are delivering separate artifacts to each customer. In this case, you should just call them different artifacts, give them their own version numbers, and send out upgrades for those separate artifacts. In that way you can continue to use your standard upgrade process, and you can always know exactly what your customer has by collecting the list of artifacts and their version numbers for all of the artifacts installed at a customer site.

If you really think you need to send out patches, consider the following goals:
  • It should be easy for the user to install the patch with a single command.
  • It should be difficult for the user to make a mistake when installing the patch, such as could happen if he has to manually install files into specific directories or manually edit any files.
  • It should be easy for the user to roll back the patch if it doesn't work.
  • It should be possible for both you and the user to know exactly what version of software is installed at the site, including what patches have been applied, even if there is a patch of a patch.
If it seems to you that implementing a patch mechanism that does all of this is easier than adding some improvements to your upgrade process and perhaps dividing up a couple of your artifacts to more accurately reflect how you are actually installing them, then go for it.

Migrate

At some point one of your users might decide that he wants to stop using your software and move to some other package. If you are a commercial software provider you might think this is not something that should be in your list of goals - why should you help out a competitor? - but if you are interested in doing what is best for your user, you should at least recognize this phase of the software lifecycle and make a conscious decision about it. The better you treat a leaving customer, the more likely it is that he will some day be a returning customer.

To support your users in this step, you should provide export tools that allow the user to export all of his data from your application in a standard format. Depending on the application, this might mean exporting a CSV file, an XML file, an Open Document file, or something else.

If you also implement an import capability that reads the same standard file format as your export produces, this could help you in the future if you ever change your internal storage representation from one version to the next: just export from the old version into a file using the standard format, upgrade to the new version, and import that file.

Uninstall

Whether or not a user chooses to move to a different product, he may eventually decide he is done using your software and he would like to remove it from his system. As with the install, it should be possible for the user to uninstall your software with a single command. If an application was installed simply by unpacking it, that single command might be to remove that unpacked directory. With a more complicated installation, uninstallation is likely also to be more complicated, making an uninstaller program more important.

If you have set up your application such that the user-customized portions are separate from the standard install, your uninstaller can give the user the option of keeping those portions. Similarly, if the application maintains user data in its own directories, you should get confirmation from the user before deleting those files and give the user the option of keeping them.

You might also want to consider how you want your installer to behave if the user runs the uninstaller, keeps his customizations and data, then runs the installer. A user might want to do this to downgrade to a previous version if you do not otherwise provide a simple solution for that. Or perhaps you treated a departing customer well enough that he is now returning to your product, in which case he might be pleased to find that his old preferences and customizations are still available.

Wednesday, November 4, 2009

Overriding vals as Optional Parameters

For simple cases you can use Scala vals, selectively overridden, as a way of implementing optional parameters. Overriding can also be used for other interesting tricks.

Optional Class Parameters

In Java, a typical idiom for initializing an object that has a large number of optional parameters, of which only a few usually get set, is to construct the object and then call setter functions to customize each of the optional parameters. While this technique can be convenient, it leaves open the possibility that a setter might get called later in the object's lifecycle, at a time when changing that value could cause problems.

One solution to this problem is to use the builder pattern. This solution is available in Scala as well, and can be taken a step farther than in Java by using the type-safe builder pattern.

The type-safe builder can be overly complicated for many situations. Sometimes it would be nice to have something simpler than even the simplest of builders.

Scala 2.8 will have named parameters with default values, which will make it pretty easy to create classes that have optional parameters, although you might not want to do this if you have 30 optional parameters. Meanwhile, there is another approach you can use: overriding vals.

The approach is pretty simple: you define a base class with a constructor that includes all of the required parameters, and you then add a val for each of the optional parameters. When you want to create an instance of that class that sets some of the optional parameters, you create an anonymous subclass by adding a set of braces after the new statement that creates the instance, and inside the braces you override each val that you want to set.

In this example we define a Car class that represents a few pieces of information about a car. model and color are required parameters and appear in our constructor. Our optional parameters are hasRadio and hasSunRoof, so we make those vals rather than constructor parameters, and we assign them their default values. We include a toString method so we can easily see the results.

class Car(model:String, color:String) {
    val hasRadio = false
    val hasSunRoof = false

    override def toString() = {
        "Car{"+
            "model="+model+ 
            ",color="+color+
            (if (hasRadio) ",hasRadio" else "")+
            (if (hasSunRoof) ",hasSunRoof" else "")+
        "}"
    }
}
The normal use would be to call the constructor with no additional arguments:
val c1 = new Car("Ford", "red")
println(c1)

//Car{model=Ford,color=red}
To specify one of our optional arguments, we add a code block to the new call, which creates an anonymous subclass in which our val overrides the default:
val c2 = new Car("Chevy", "blue") { 
    override val hasRadio = true 
}   
println(c2)

//Car{model=Chevy,color=blue,hasRadio}
We can pass in values from the caller's context rather than constants:
val myHasSunRoof = true
val c3 = new Car("Honda", "white") {
    override val hasSunRoof = myHasSunRoof
}
println(c3)

//Car{model=Honda,color=white,hasSunRoof}
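For comparison, here is a sketch of how the same Car might look with the named and default parameters mentioned earlier, assuming Scala 2.8 or later. The enclosing NamedDefaults object exists only to keep the example self-contained. The trade-off is that every optional value now appears in the constructor signature, which is what becomes unwieldy with dozens of options.

```scala
object NamedDefaults {
  // Optional parameters carry default values and can be set by name.
  class Car(model: String, color: String,
            hasRadio: Boolean = false, hasSunRoof: Boolean = false) {
    override def toString: String =
      "Car{" +
        "model=" + model +
        ",color=" + color +
        (if (hasRadio) ",hasRadio" else "") +
        (if (hasSunRoof) ",hasSunRoof" else "") +
      "}"
  }

  val c1 = new Car("Ford", "red")                       // all defaults
  val c2 = new Car("Chevy", "blue", hasRadio = true)    // one option by name
  val c3 = new Car("Honda", "white", hasSunRoof = true) // skips hasRadio
}
```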

Optional Trait Parameters

You can use this same approach to pass in values for instance variables in traits, which don't have constructor parameters. For example, say we define a trait for an optional Touring package for our car:
trait Touring {
    val hasNavSystem = false
    val hasExtraSuspension = false
    val hasTowHitch = false
    val hasRunningBoards = false

    override def toString() = {
        super.toString()+
            "+Touring{"+
            (if (hasNavSystem) "navSystem," else "") +
            (if (hasExtraSuspension) "extraSuspension," else "") +
            (if (hasTowHitch) "towHitch," else "") +
            (if (hasRunningBoards) "runningBoards," else "") +
        "}"
    }
}
Now we can create an instance of a Car with Touring and pass in values for some of those "optional constructor parameters" defined in the Touring trait:
val c4 = new Car("Honda","white") with Touring {
    override val hasSunRoof = true      //from Car
    override val hasNavSystem = true    //from Touring
    override val hasRunningBoards = true  //from Touring
}

println(c4)

//Car{model=Honda,color=white,hasSunRoof}+Touring{navSystem,runningBoards,}
NOTE: Due to a bug in older versions of Scala, at least through 2.7.6, overriding a val on a trait as in the above example does not work. This does work properly in Scala 2.8.0 (at least it does in the 20091006 nightly build).

Early Definition

You may have a situation in which some of the vals that you are initializing in a trait or class depend on other vals. In this case, overriding a val as we did above may not give you the result you want: the initializer of the superclass runs to completion before the initializer of the subclass, which means all of the vals in the superclass get set before any of the overriding vals are evaluated.

For example, say we modify our Touring trait by adding a maxTowWeight value (the val maxTowWeight definition and the extra maxTowWeight line in toString below):
trait Touring {
    val hasNavSystem = false
    val hasExtraSuspension = false
    val hasTowHitch = false
    val hasRunningBoards = false
    val maxTowWeight = if (!hasTowHitch) 0 else
        { if (hasExtraSuspension) 1500 else 1000 }

    override def toString() = {
        super.toString()+
            "+Touring{"+
            (if (hasNavSystem) "navSystem," else "") +
            (if (hasExtraSuspension) "extraSuspension," else "") +
            (if (hasTowHitch) "towHitch," else "") +
            (if (hasRunningBoards) "runningBoards," else "") +
            "maxTowWeight="+maxTowWeight +
        "}"
    }
}
When we instantiate a Car with Touring the constructor code for Touring executes before the constructor code for the new class. In particular, val maxTowWeight gets evaluated before the overriding values are evaluated, so it always ends up with a value of zero:
val c5 = new Car("Honda","white") with Touring { override val hasTowHitch = true }

println(c5)

//Car{model=Honda,color=white}+Touring{towHitch,maxTowWeight=0}
Scala provides a mechanism to address this issue: Early Definition (Scala Language Specification, section 5.1.6). The vals that you specify in the Early Definition block are evaluated in the context of the calling class, then that set of values is placed into the context of the new class being instantiated such that all of those values are available at the beginning of the process of instantiation, even before the initializer for Object is executed. In this way, any expression which uses one of those vals will have access to the value provided in the Early Definition.

It could be used with our Car example like this:
val c6 = new { override val hasTowHitch = true } with Car("Honda","white") with Touring

println(c6)

//Car{model=Honda,color=white}+Touring{towHitch,maxTowWeight=1000}
A class definition for the above example could look like this:
class TouringCarWithHitch(name:String, color:String) extends {
            override val hasTowHitch = true
        } with Car(name,color) with Touring {
    //normal class overrides and additional elements here
}

val c7 = new TouringCarWithHitch("Honda","white")
//c7 is the same as c6 (but we have not implemented ==)

Required Trait Parameters

If you want to define a trait that has required parameters rather than optional parameters, you can omit the value from the declarations and instead specify only the type, which causes the val to be abstract. For example, if we want to make the hasTowHitch and hasNavSystem parameters to our modified Touring trait be required, that would look like this:
trait Touring {
    val hasNavSystem:Boolean   //abstract (no value)
    val hasExtraSuspension = false
    val hasTowHitch:Boolean    //abstract (no value)
    val hasRunningBoards = false
    val maxTowWeight = if (!hasTowHitch) 0 else
        { if (hasExtraSuspension) 1500 else 1000 }

    override def toString() = {
        super.toString()+
            "+Touring{"+
            (if (hasNavSystem) "navSystem," else "") +
            (if (hasExtraSuspension) "extraSuspension," else "") +
            (if (hasTowHitch) "towHitch," else "") +
            (if (hasRunningBoards) "runningBoards," else "") +
            "maxTowWeight="+maxTowWeight +
        "}"
    }
}
Now when we create a concrete instance that mixes in this trait, we must define values for those two vals, or we will get a compiler error. Since the base declarations are now abstract, we omit the override keyword on those vals:
val c8 = new Car("Honda","white") with Touring {
    override val hasSunRoof = true      //from Car
    val hasNavSystem = true             //from Touring; required
    override val hasRunningBoards = true  //from Touring; optional
    val hasTowHitch = false             //from Touring; required
}

println(c8)

//Car{model=Honda,color=white,hasSunRoof}+Touring{navSystem,runningBoards,maxTowWeight=0}

Abstract Class Parameters

Sometimes it is convenient to use an abstract val rather than a constructor parameter for abstract classes. For example, say you have a Service and you want to define a set of case classes for service messages. The base class should have a reference to the Service object so that it can easily be processed by generic service methods, but each case class should also have the same reference as a case value for easy matching. For consistency, since these are the same value, the name should be the same. You could do this by defining the base class with one parameter declared as a val to make it accessible, then define the case classes to override that value, like this:
abstract class Service
abstract class ServiceMessage(val service:Service)
case class ServiceStart(override val service:Service) extends ServiceMessage(service)
case class ServiceStop(override val service:Service) extends ServiceMessage(service)
A case class implicitly treats each of its parameters as a val, but because we need the override modifier here we write override val in full: current Scala compilers reject a modifier on a class parameter that is not followed by val or var.

We can simplify our case classes a bit by changing the base class val from a constructor parameter to an abstract val, like this:
abstract class Service
abstract class ServiceMessage { val service:Service }
case class ServiceStart(service:Service) extends ServiceMessage
case class ServiceStop(service:Service) extends ServiceMessage
Not only have we dropped the override keyword, but we are also not passing the service parameter to the superclass. The implied val keyword on the case class parameters creates a concrete instance of the service parameter that overrides the abstract value defined in the base class.

Type Parameters

Just as Scala has value parameters, concrete value members and abstract value members, it likewise has type parameters, concrete type members and abstract type members. The approach used above on values can generally be applied to types as well: rather than defining a class with a type parameter, you can often define that class with a type member. If the type must be supplied by the extending class, make the type member abstract. The analogy with vals is not complete, however: a concrete type member is an alias, and a subclass cannot override it with a different type, so if you want subclasses to be able to choose the type, leave the member abstract (optionally with a bound) rather than giving it a default.
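As a small sketch of the two styles side by side, the following defines a box once with a type parameter and once with an abstract type member. All of the names (TypeMembers, ParamBox, Box, Elem, IntBox) are invented for this illustration.

```scala
object TypeMembers {
  // Type-parameter style: the element type is supplied at each use site.
  class ParamBox[T](val value: T)

  // Type-member style: the element type is a member that subclasses define.
  abstract class Box {
    type Elem          // abstract type member (required)
    val value: Elem    // abstract value member (required)
    def describe: String = "Box(" + value + ")"
  }

  // The subclass fixes Elem, and its constructor val implements value.
  class IntBox(val value: Int) extends Box { type Elem = Int }

  val p = new ParamBox[Int](42)
  val b = new IntBox(42)
  val doubled: Int = b.value * 2   // b.value is known to be an Int
}
```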

Bill Venners has a nice blog post where he discusses the question of when to use a type parameter and when to use an abstract type member, with a reference to an interview with Martin Odersky where he talks about abstract type members in comparison to instance variables.

Caveats

Although in many ways you are free to choose between using a constructor parameter versus a class member, they are not entirely equivalent. In particular, once you start building up class hierarchies using abstract and concrete members with overrides, you have to be careful that the initialization order is what you expect. In the Early Definition section above I gave one example of how values can fail to initialize correctly due to ordering issues. That one is pretty easy to understand, but such ordering problems can sometimes be far more subtle and hard to spot.

One thing you can do that will sometimes fix such problems is to use the lazy keyword on your value members in order to get lazy initialization. This causes initialization of the value to be delayed until the first time it is used, rather than being eagerly initialized when the class is initialized. Note that if you declare a concrete val as lazy, then an overriding val must also be declared as lazy; if the original concrete val is not lazy, the overriding val cannot be lazy.
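To make the effect of lazy concrete, here is a cut-down sketch of the Touring example with an eager and a lazy version of maxTowWeight. The trait names EagerTouring and LazyTouring and the wrapper object LazyInit are invented for this comparison. Car is reduced to a bare constructor because its toString is not needed here.

```scala
object LazyInit {
  class Car(model: String, color: String)

  trait EagerTouring {
    val hasTowHitch = false
    val hasExtraSuspension = false
    // Evaluated while the trait initializes, before any overriding val is set.
    val maxTowWeight: Int =
      if (!hasTowHitch) 0 else if (hasExtraSuspension) 1500 else 1000
  }

  trait LazyTouring {
    val hasTowHitch = false
    val hasExtraSuspension = false
    // Evaluated on first access, after the whole object has been constructed.
    lazy val maxTowWeight: Int =
      if (!hasTowHitch) 0 else if (hasExtraSuspension) 1500 else 1000
  }

  val eagerCar = new Car("Honda", "white") with EagerTouring {
    override val hasTowHitch = true
  }
  val lazyCar = new Car("Honda", "white") with LazyTouring {
    override val hasTowHitch = true
  }
  // eagerCar.maxTowWeight is 0, lazyCar.maxTowWeight is 1000
}
```

The lazy version only helps because nothing reads maxTowWeight during construction. If another eager val in the trait depended on it, the ordering problem would come back.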

Note that overriding a val in Scala is not the same as declaring a variable of the same name in a subclass in Java. Consider this Java test program Test.java:
public class Test {
    public static void main(String[] args) {
        (new Test1()).test1();
        (new Test2()).test1();
        (new Test2()).test2();
    }
}

class Test1 {
    public int t = 1;

    public void test1() {
        System.out.println("t="+t);
    }
    public void test2() {
        System.out.println("t="+t);
    }
}

class Test2 extends Test1 {
    public int t = 2;

    public void test2() {
        System.out.println("t="+t);
    }
}
and the apparently equivalent Scala test program Test.scala (where I have used Java-like syntax where possible so that you can run "diff" on the two files):
object Test {
    def main(args: Array[String]) {
        (new Test1()).test1();
        (new Test2()).test1();
        (new Test2()).test2();
    }
}

class Test1 {
    val t = 1

    def test1() {
        System.out.println("t="+t);
    }
    def test2() {
        System.out.println("t="+t);
    }
}

class Test2 extends Test1 {
    override val t = 2

    override def test2() {
        System.out.println("t="+t);
    }
}
Copy these out to Test.java and Test.scala, then compile and run each one (don't try to compile both and then run both in the same directory, as the class files will collide). The Java test prints this out:
t=1
t=1
t=2
The Scala test prints this out:
t=1
t=2
t=2
Note the difference in the middle line, where we have called Test2.test1(). The Java program prints 1, but the Scala program prints 2. This is because the declaration of t in Test2 in Java does not override the value in Test1, it shadows it. The Test1 value of t is still there, and it is used by any method in Test1 that refers to that variable.

In Scala, by contrast, references to t in Test1 refer to the overridden value provided by Test2. Scala can do this because, consistent with the Uniform Access Principle, a member in Scala is accessed through methods rather than directly as a field: a val through a getter, and a var through a getter and setter pair. When a val is overridden, the getter in the subclass overrides the getter in the base class.
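One consequence of this design, sketched below with invented names (UniformAccess, Base, Sub), is that a subclass can override a parameterless def with a val, because callers cannot tell the two apart.

```scala
object UniformAccess {
  class Base {
    def t: Int = 1                 // a method in the base class
    def show: String = "t=" + t    // goes through the accessor, whatever implements it
  }

  class Sub extends Base {
    override val t: Int = 2        // a val overriding the def
  }

  val shownBase = (new Base).show  // "t=1"
  val shownSub  = (new Sub).show   // "t=2"
}
```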

Sunday, October 11, 2009

Scala Case Statements As Partial Functions

A Scala case statement can be either a Function1 or a PartialFunction depending on the context.

In my previous post I presented a simple Publisher that I used to decouple my Swing actors from their targets. Reader nairb774 pointed out that the standard Scala library includes a Publisher class. In fact, there are two Publisher classes in Scala, scala.collection.mutable.Publisher and scala.swing.Publisher. Although I like my publisher class better, the swing publisher did have one feature that I thought was useful: it accepted as a callback a PartialFunction rather than, as mine did, a Function1. That would mean, I thought, that I could pass in a case statement as a callback.

For example, continuing the Mimprint example from my previous post, if I were only interested in Enabled events published by a particular publisher, rather than explicitly checking this in my callback with an isInstanceOf or a match statement that includes a case _ => clause, I could just use a one-line case statement:
showSingleViewerPublisher.subscribe { case e:Enabled => doSomething() }
My calling code in Publisher would call apply on the PartialFunction callback only if a call to its isDefinedAt method returned true, thus avoiding the MatchError that would occur if I treated it like a Function1 and called its apply method when the value was not Enabled. This seemed like useful functionality, so I decided to add it. I thought it would be easy, but unfortunately it was not.
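The dispatch logic described above might look roughly like the following sketch. This is not the scala.swing.Publisher API nor the Mimprint code. The Publisher class, its subscribe and publish methods, and the Event hierarchy are all stand-ins invented for the example.

```scala
object PfPublisher {
  sealed trait Event
  case class Enabled(on: Boolean) extends Event
  case class Renamed(name: String) extends Event

  class Publisher[E] {
    private var subscribers: List[PartialFunction[E, Unit]] = Nil

    def subscribe(callback: PartialFunction[E, Unit]): Unit =
      subscribers = callback :: subscribers

    // Only invoke callbacks that are defined for this event,
    // so an unmatched event never triggers a MatchError.
    def publish(event: E): Unit =
      for (s <- subscribers if s.isDefinedAt(event)) s(event)
  }
}
```

A subscriber interested only in Enabled events could then write publisher.subscribe { case PfPublisher.Enabled(on) => ... } and simply never be called for anything else.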

Consider the following three definitions that assign a case statement to a partial function, full function, or no explicit function type, respectively:
val pfv:PartialFunction[String,Unit] = { case "x" => println("Got x") }
val ffv:Function1[String,Unit] = { case "x" => println("Got x") }
val nfv = { case "x" => println("Got x") }
For the first line, the variable pfv gets assigned a value which is a PartialFunction representing the case statement. For the second line, you might think that, since PartialFunction extends Function1 and we are assigning the same value to ffv as we did to pfv, the variable ffv would also be assigned a PartialFunction. This is not the case.

The Scala Language Specification (SLS) explicitly states, in section 8.5, that the type of an anonymous function comprised of one or more case statements must be specified as either a FunctionK or a PartialFunction, and that the value generated by the compiler is different depending on that specified target type. So the value that gets assigned to ffv is a Function1, and ffv.isInstanceOf[PartialFunction[_,_]] evaluates to false. Note that we could assign the value pfv to the variable ffv, in which case ffv would have a value which is a PartialFunction and ffv.isInstanceOf[PartialFunction[_,_]] would evaluate to true.
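
The difference can be observed at run time. The following is a small sketch, with the object name TargetTypeDemo and the val names plain, stillPartial, definedAtY and threwMatchError invented for illustration. It checks only the two behaviors that follow directly from the description above: a PartialFunction value keeps its runtime class when assigned to a Function1 variable, and a case block compiled as a plain Function1 throws a MatchError on a value it does not cover.

```scala
object TargetTypeDemo {
  val pfv: PartialFunction[String, Unit] = { case "x" => println("Got x") }

  // Assigning the existing PartialFunction value keeps its runtime class.
  val ffv: Function1[String, Unit] = pfv

  // The same case block, compiled with Function1 as the target type.
  val plain: Function1[String, Unit] = { case "x" => println("Got x") }

  val stillPartial: Boolean = ffv.isInstanceOf[PartialFunction[_, _]]
  val definedAtY: Boolean = pfv.isDefinedAt("y")

  // There is no isDefinedAt to consult on a plain Function1, so calling it
  // with an uncovered value fails with a MatchError.
  val threwMatchError: Boolean =
    try { plain("y"); false } catch { case _: MatchError => true }
}
```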

What happens if you don't specify the type, as in the third line above where we assign the same value to nfv? You might think the compiler could infer the type of the resulting value, but since, as specified in the SLS, the type must be explicitly specified as either a FunctionK or a PartialFunction, our assignment to nfv is actually not a valid statement, and it fails to compile. It would be nice if the error message said something like "You must explicitly specify either a FunctionK or a PartialFunction for a case statement", but instead it gives this relatively unhelpful message:
    <console>:4: error: missing parameter type for expanded function ((x0$1) => x0$1 match {
      case "x" => println("Got x")
    })
           val nfv = { case "x" => println("Got x") }
                     ^
In my case, the situation in which I encountered this message was a little different. Here is an example showing the problem I ran into:
    class PF[T] {   //partial function type
        def sub(x:PartialFunction[T,Unit]) = x
    }
    class FF[T] {   //full function type
        def sub(x:Function[T,Unit]) = x
    }
    class NF[T] {   //no unique function type
        def sub(x:PartialFunction[T,Unit]) = x
        def sub(x:Function[T,Unit]) = x
    }
    val pf = new PF[String]
    val ff = new FF[String]
    val nf = new NF[String]
    pf.sub{ case "x" => println("x") }  //works, result is PartialFunction
    ff.sub{ case "x" => println("x") }  //works, result is Function1
    nf.sub{ case "x" => println("x") }  //fails with compiler error msg
Calling the above method sub with a case statement works when there is only one method of that name, whether it takes a Function1 or a PartialFunction. The compiler has no problem compiling the overloaded pair of methods in NF, but once both exist it can no longer unambiguously determine the target type for the case statement, so it delivers that same error message, "missing parameter type for expanded function".

In my case I was trying to modify the subscribe method in my Publisher class so that I could pass in either a regular function, such as println(_), or a PartialFunction, in particular an in-line case statement. The three options I tried are essentially classes PF, FF and NF listed above. When I used approach NF I was unable to directly pass in a case statement, but instead would get the compiler error mentioned above. When I used approach PF I could pass in a case statement as a PartialFunction, but I could not pass in a regular function. When I used approach FF I could pass in a regular function, and could pass in and properly deal with a PartialFunction, since it extends Function1, but when I used an in-line case statement it would get compiled as a Function1 rather than a PartialFunction, which would cause execution to fail when a value was passed to that case statement that it did not cover (since it was not a PartialFunction and thus did not have an isDefinedAt method to call first).

I don't like option FF because it would allow code (specifically, an in-line case statement) to compile but then not execute as expected. Options PF and NF are not very useful as is, since neither directly supports both case statements and full functions.

In a mailing list response to someone who was attempting to use option NF in his application, Paul Phillips suggested using option FF with a helper function pf that accepts a PartialFunction and returns the same value, then wrapping any case statements inside a call to that helper function; or, alternatively, assigning the case statement to a val declared as a PartialFunction before passing it to method sub. Unfortunately, if the user forgets to use either of these techniques on a case statement and just passes it directly to method sub in option FF, it will be handled as a Function1 rather than a PartialFunction, so it will compile but not behave as expected.
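
Both techniques can be sketched as follows. This is my reading of the suggestion rather than Paul's actual code; the object name PfHelperDemo and the val names wrapped and viaVal are made up, and the type arguments to pf are written out explicitly so the sketch does not depend on type inference.

```scala
object PfHelperDemo {
  // Identity helper: its parameter type gives a case block a
  // PartialFunction target type.
  def pf[A, B](f: PartialFunction[A, B]): PartialFunction[A, B] = f

  // Option FF from above: sub accepts any Function1.
  class FF[T] {
    def sub(x: Function1[T, Unit]): Function1[T, Unit] = x
  }

  val ff = new FF[String]

  // Technique 1: wrap the case block in a call to the helper.
  val wrapped: Function1[String, Unit] =
    ff.sub(pf[String, Unit] { case "x" => println("x") })

  // Technique 2: assign the case block to a PartialFunction val first.
  val viaVal: Function1[String, Unit] = {
    val p: PartialFunction[String, Unit] = { case "x" => println("x") }
    ff.sub(p)
  }
}
```

In both cases the value that reaches sub is a real PartialFunction, so code inside the publisher can test for that and call isDefinedAt before apply.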

Paul's suggestion would also work in option NF (and in option PF, although in that case it would be redundant), which would behave much the same as option FF from the user's perspective except that passing a bare case statement to the overloaded method sub would not compile, so we would no longer have the undesirable situation of something that compiles but behaves unexpectedly.

As an alternative to Paul's pf helper function, I could write a helper function ff that takes a Function1 and turns it into a PartialFunction with an isDefinedAt method that always returns true. I would then use this with option PF. This would allow me to directly pass in case statements, but I would have to wrap all regular functions in a call to ff.
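
A minimal sketch of such an ff helper, used with option PF, might look like this. The object name FfHelperDemo, the calls counter and the two sample subscribers are assumptions for illustration only.

```scala
object FfHelperDemo {
  // Wrap a total function as a PartialFunction that is defined everywhere.
  def ff[A, B](f: A => B): PartialFunction[A, B] =
    new PartialFunction[A, B] {
      def isDefinedAt(x: A): Boolean = true
      def apply(x: A): B = f(x)
    }

  // Option PF from above: sub accepts only a PartialFunction.
  class PF[T] {
    def sub(x: PartialFunction[T, Unit]): PartialFunction[T, Unit] = x
  }

  val pf = new PF[String]
  var calls = 0

  // A bare case block can be passed directly.
  val fromCase = pf.sub { case "x" => calls += 1 }

  // A regular function has to be wrapped in a call to ff.
  val fromFunction = pf.sub(ff[String, Unit](s => calls += s.length))
}
```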

I have not yet made any changes to my Publisher class, since I don't particularly like either of the options and I don't currently really need the ability to use in-line case statements. Meanwhile, if I get the compiler error "missing parameter type for expanded function" while trying to use an in-line case statement, at least I now know one more thing to check for.

Wednesday, October 7, 2009

A Simple Publish/Subscribe Example in Scala

Here is an example where using a simple publish/subscribe mechanism allowed me to clean up some of my early Scala code.

My Mimprint program (now also on github) was originally written in Java, then ported to Scala soon after I first started learning that language. As such, much of that original ported code was "Java written in Scala". As I have continued to internalize the Scala approach I have gone back and modified various parts of the program to make it cleaner.

In one part of the program I set up a collection of menu checkboxes to allow the user to enable or disable various features. As those features are enabled or disabled, the states of other screen components change; sometimes a component is enabled or disabled, sometimes a component is hidden or made visible.

My original Java-ish Scala code to do this looked something like this (with irrelevant parts omitted):
    class ViewListGroup ... {
        ...
        private var singleComp:Component = _
        private var mShowFileInfo:SCheckBoxMenuItem = _
        private var mShowFileIcons:SCheckBoxMenuItem = _
        private var mShowDirDates:SCheckBoxMenuItem = _
        private var mShowSingleViewer:SCheckBoxMenuItem = _

        def getComponent():Component = {
            ...
            singleComp = playViewSingle.getComponent()
            ...
            //Add our menu items
            mShowFileInfo = new SCheckBoxMenuItem(
                    viewer,"menu.List.ShowFileInfo")(
                    showFileInfo(mShowFileInfo.getState))
            mShowFileInfo.setState(true)
            m.add(mShowFileInfo)

            mShowFileIcons = new SCheckBoxMenuItem(
                    viewer,"menu.List.ShowFileIcons")(
                    showFileIcons(mShowFileIcons.getState))
            mShowFileIcons.setState(false)
            m.add(mShowFileIcons)

            mShowDirDates = new SCheckBoxMenuItem(
                    viewer,"menu.List.ShowDirDates")(
                    showDirDates(mShowDirDates.getState))
            mShowDirDates.setState(playViewList.includeDirectoryDates)
            ...
            m.add(mShowDirDates)

            mShowSingleViewer = new SCheckBoxMenuItem(
                    viewer,"menu.List.ShowSingleViewer")(
                    showSingleViewer(mShowSingleViewer.getState))
            mShowSingleViewer.setState(true)
            m.add(mShowSingleViewer)
            showSingleViewer(mShowSingleViewer.getState)
                //make sure window state is in sync with menu item state
            ...
        }
        ...
        def showFileInfo(b:Boolean) {
            playViewList.showFileInfo(b)
            mShowFileInfo.setState(b)
            mShowFileIcons.setEnabled(b)
            mShowDirDates.setEnabled(b)
        }

        def showFileIcons(b:Boolean) {
            playViewList.showFileIcons(b)
            playViewList.redisplayList()
        }

        def showDirDates(b:Boolean) {
            playViewList.includeDirectoryDates = b
            playViewList.redisplayList()
        }

        def showSingleViewer(b:Boolean) {
            singleComp.setVisible(b)
            singleComp.getParent.asInstanceOf[JSplitPane].resetToPreferredSizes()
            mShowSingleViewer.setState(b)
            playViewList.requestSelect
        }
        ...
    }
There were two things about this code that I didn't like:
  1. Mutable instance variables using var, particularly since they were not really variable. These values were being assigned once, not at construction time, but had to be available to other methods.
  2. The close binding between the different UI components, since the action method called by one component directly modified attributes of possibly a number of other components.
After a recent conversation with a friend I realized that I could probably improve this code by using a publish/subscribe mechanism to loosen the coupling between the components. Mimprint already had an ActorPublisher class, where each subscriber is an Actor that accepts messages of the published object type, but in this case I wanted a lighter weight implementation, since I knew the subscriber actions would be quick. Also, this being Swing, the subscriber actions that update screen state should run in the Swing event thread, and the events being published are also coming from the event thread, so the simple thing to do is to run the subscriber actions directly from the publish method.

Writing a publish/subscribe handler in Scala is pretty easy, and for me it was even simpler, as I already had one. I grabbed my ListenerManager and modified it to use the publish/subscribe terminology. I also added synchronization to make it multi-thread safe, although for this app I don't really need it. It now looks like this:
    package net.jimmc.util

    /** Manage a subscriber list.
     * There are no guarantees on the order of subscribers in the list.
     * This code is a slightly modified version of ListenerManager
     * as published to my blog in April 2009.
     */
    trait Publisher[E] {
        type S = (E) => Unit
        private var subscribers: List[S] = Nil
        private object lock
            //By using lock.synchronized rather than this.synchronized we reduce
            //the scope of our lock from the extending object (which might be
            //mixing us in with other classes) to just this trait.

        /** True if the subscriber is already in our list. */
        def isSubscribed(subscriber:S) = {
            val subs = lock.synchronized { subscribers }
            subs.exists(_==subscriber)
        }

        /** Add a subscriber to our list if it is not already there. */
        def subscribe(subscriber:S) = lock.synchronized {
            if (!isSubscribed(subscriber))
                subscribers = subscriber :: subscribers
        }

        /** Remove a subscriber from our list. If not in the list, ignored. */
        def unsubscribe(subscriber:S):Unit = lock.synchronized {
            subscribers = subscribers.filter(_!=subscriber)
        }

        /** Publish an event to all subscribers on the list. */
        def publish(event:E) = {
            val subs = lock.synchronized { subscribers }
            subs.foreach(_.apply(event))
        }
    }
For each menu checkbox I would like to set up a publisher. In every case, I just need to publish whether that checkbox has just been enabled or disabled. I defined a simple case class hierarchy to represent the Enabled and Disabled messages:
    sealed abstract class Abled
    case object Enabled extends Abled
    case object Disabled extends Abled
I then created a publisher class that uses that event type:
    class AbledPublisher extends Publisher[Abled]
I want to easily publish the Enabled or Disabled object based on the current state of a checkbox, so I added an AbledPublisher companion object with an apply method to do that:
    object AbledPublisher {
        object Abled {
            def apply(b:Boolean) = if (b) Enabled else Disabled
        }
    }
Conversely, upon receiving an Abled event in a subscriber for a UI component I want to be able to enable or disable that component. I could use a match statement with cases for Enabled and Disabled, but a simpler way is to modify the Abled case class hierarchy to encode a boolean state value into the Abled case object to allow easy translation from an Abled object back to a state:
    sealed abstract class Abled { val state:Boolean }
    case object Enabled extends Abled { override val state = true }
    case object Disabled extends Abled { override val state = false }
Finally, I packaged up the case class hierarchy inside the AbledPublisher object to control scoping. The final AbledPublisher file looks like this:
    package net.jimmc.util

    //For subscribers of things that turn on and off
    class AbledPublisher extends Publisher[AbledPublisher.Abled]

    // use "import AbledPublisher._" to pick up these definitions
    object AbledPublisher {
        sealed abstract class Abled { val state:Boolean }
        case object Enabled extends Abled { override val state = true }
        case object Disabled extends Abled { override val state = false }

        object Abled {
            def apply(b:Boolean) = if (b) Enabled else Disabled
        }
    }
Given the above AbledPublisher class and object, I modified my code so that the action method called by each menu checkbox publishes an Enabled or Disabled event that matches the new state of the checkbox, and for each place in the old code where an action method called a state-changing method on another component I set up that target component as a subscriber to the appropriate publisher that, when it receives a published event, takes appropriate action on itself.

With the above changes, and a slight change to my SCheckBoxMenuItem class so that it passes itself to the action callback, the code now looks like this:
    import net.jimmc.util.AbledPublisher
    import net.jimmc.util.AbledPublisher._

    class ViewListGroup ... {
        vlg:ViewListGroup =>
        ...
        private val showFileInfoPublisher = new AbledPublisher
        private val showSingleViewerPublisher = new AbledPublisher
        private val showDirectoriesPublisher = new AbledPublisher
        ...
        def getComponent():Component = {
            ...
            val singleComp = playViewSingle.getComponent()
            showSingleViewerPublisher.subscribe((ev)=> {
                singleComp.setVisible(ev.state)
                singleComp.getParent.asInstanceOf[JSplitPane].resetToPreferredSizes()
            })
            ...
            //Add our menu items
            val mShowFileInfo = new SCheckBoxMenuItem(
                    viewer,"menu.List.ShowFileInfo")((cb)=>
                    showFileInfo(cb.getState))
            mShowFileInfo.setState(true)
            showFileInfoPublisher.subscribe((ev)=>
                mShowFileInfo.setState(ev.state)
            )
            m.add(mShowFileInfo)

            val mShowFileIcons = new SCheckBoxMenuItem(
                    viewer,"menu.List.ShowFileIcons")((cb)=>
                    showFileIcons(cb.getState))
            mShowFileIcons.setState(false)
            showFileInfoPublisher.subscribe((ev)=>
                mShowFileIcons.setState(ev.state)
            )
            m.add(mShowFileIcons)

            val mShowDirDates = new SCheckBoxMenuItem(
                    viewer,"menu.List.ShowDirDates")((cb)=>
                    showDirDates(cb.getState))
            mShowDirDates.setState(playViewList.includeDirectoryDates)
            mShowDirDates.setVisible(includeDirectories)
            showFileInfoPublisher.subscribe((ev)=>
                mShowDirDates.setState(ev.state)
            )
            showDirectoriesPublisher.subscribe((ev)=>
                mShowDirDates.setVisible(ev.state)
            )
            m.add(mShowDirDates)

            val mShowSingleViewer:SCheckBoxMenuItem = new SCheckBoxMenuItem(
                    viewer,"menu.List.ShowSingleViewer")((cb)=>
                    showSingleViewer(cb.getState))
            mShowSingleViewer.setState(true)
            showSingleViewerPublisher.subscribe((ev)=>
                mShowSingleViewer.setState(ev.state)
            )
            m.add(mShowSingleViewer)
            showSingleViewer(mShowSingleViewer.getState)
                //make sure window state is in sync with menu item state
            ...
        }
        ...
        def showFileInfo(b:Boolean) {
            playViewList.showFileInfo(b)
            showFileInfoPublisher.publish(Abled(b))
        }

        def showFileIcons(b:Boolean) {
            playViewList.showFileIcons(b)
            playViewList.redisplayList()
        }

        def showDirDates(b:Boolean) {
            playViewList.includeDirectoryDates = b
            playViewList.redisplayList()
        }

        def showSingleViewer(b:Boolean) {
            showSingleViewerPublisher.publish(Abled(b))
            playViewList.requestSelect
        }
        ...
    }
The total number of lines of code in ViewListGroup is actually a bit more than before, but I find the code a little easier to understand because all of the code that acts on a UI component is now localized in one place in the source file. All of the vars that held pointers to those components are now gone, replaced by a few vals for the publishers. The publishers use vars to maintain internal state, but that state is simple and easily understood, well encapsulated and multi-thread safe.

There is still more cleanup work to be done in Mimprint. For example, in the above code the checkbox action methods such as showFileInfo and showFileIcons call methods on the playViewList object as well as publishing an Abled event. Instead, I could set up playViewList as a listener on each of the published events, then make the menu checkbox actions directly publish an event and get rid of the showXXX methods. I will leave that for another round of cleanup.
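
To give an idea of what that next round might look like, here is a stripped-down sketch. It is not code from Mimprint: ListViewStub is a hypothetical stand-in for playViewList, onShowFileInfoToggled stands in for the checkbox action, and the Publisher here is a simplified, unsynchronized version of the trait shown above so that the example is self-contained.

```scala
object DirectPublishDemo {
  // Simplified, unsynchronized stand-in for the Publisher trait shown earlier.
  trait Publisher[E] {
    private var subscribers: List[E => Unit] = Nil
    def subscribe(s: E => Unit): Unit = subscribers = s :: subscribers
    def publish(event: E): Unit = subscribers.foreach(_.apply(event))
  }

  sealed abstract class Abled { val state: Boolean }
  case object Enabled extends Abled { override val state = true }
  case object Disabled extends Abled { override val state = false }
  object Abled { def apply(b: Boolean): Abled = if (b) Enabled else Disabled }

  class AbledPublisher extends Publisher[Abled]

  // Hypothetical stand-in for playViewList; it only records what it was told.
  class ListViewStub {
    var fileInfoShown = false
    def showFileInfo(b: Boolean): Unit = fileInfoShown = b
  }

  val showFileInfoPublisher = new AbledPublisher
  val playViewList = new ListViewStub

  // The list view subscribes itself, just like the menu items do.
  showFileInfoPublisher.subscribe(ev => playViewList.showFileInfo(ev.state))

  // The checkbox action then only publishes; no showFileInfo method is needed.
  def onShowFileInfoToggled(checked: Boolean): Unit =
    showFileInfoPublisher.publish(Abled(checked))
}
```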

Thursday, October 1, 2009

Initializing Immutable Variables in Scala

One of the guidelines I picked up when I learned Scala is to use immutable variables as much as possible. Besides the trivial but satisfying detail of making the declaration of an immutable variable (val) take no more characters than a mutable one (var), Scala also provides some interesting ways to set the values into those immutable variables.

In Scala, immutable variables are identified by declaring them using the val keyword rather than var. In Java, immutable variables are identified by adding the final qualifier to the variable declaration. But a Java final variable has slightly different semantics than a Scala val: in Java, you can declare a final variable without specifying a value for it, then fill in the value later. Java allows the variable to be assigned once, after which it cannot be assigned again. In Scala, a concrete val must have its value assigned as part of the definition.

Consider this sample Java class, Interval, which represents an interval on the real number line. We want to allow the constructor to be called with endpoints in either order, but we want to store them internally in sorted order.
    //Java code
    public class Interval {
        final double start;
        final double end;       //invariant: end>=start

        public Interval(double x1, double x2) {
            if (x1>x2) {
                start = x2;
                end = x1;
            } else {
                start = x1;
                end = x2;
            }
        }
        //other methods that use start and end go here
    }
If you try this idiom in Scala, by replacing each final variable with a val but continuing to use the same initialization construct, you will get a compiler error "reassignment to val". When using a concrete val in Scala, you must supply the value in the statement where you declare the val.

For relatively simple cases, as in this example, we can take advantage of the fact that Scala allows us to build expressions with if in them, so we can express the same functionality as in the above Java code as follows:
    class Interval(x1:Double, x2:Double) {
        val start = if (x2>x1) x1 else x2
        val end = if (x2>x1) x2 else x1
        //other methods that use start and end go here
    }
Sometimes the logic to calculate the values for the immutable variables is much more complicated than this and more expensive to calculate. Perhaps, as in our Java example, we don't want to recalculate that condition over again for each variable. We might also be more comfortable building up our values using mutable variables. We could take the easy and straightforward way and just use var rather than val for our variables, but it is worth a bit of effort to retain the immutability of our variables. Here is an approach I sometimes take:
    class Interval(x1:Double, x2:Double) {
        val (start, end) = {
            def intervalNeedsReversing(a:Double,b:Double) = (a>b)
            if (intervalNeedsReversing(x1,x2))
                (x2, x1)
            else
                (x1, x2)
        }
        //other methods that use start and end go here
    }
In the above approach, we have a block of code that calculates our values. Though not needed in this case, the intervalNeedsReversing function is an example of how you can define functions within a block in order to refactor that code or better organize it. The value of the block is a tuple, which we then assign using a tuple-assignment to our immutable variables start and end.

A tuple-assignment is a pattern-matching operation that pulls apart the tuple data and stores each piece into the separate variables. It looks like the second line in this example:
    val t2 = (123, "abc")   //the type of t2 is Tuple2[Int, java.lang.String]
    val (n, s) = t2         //assigns n=123, s="abc"
You can use any expression in place of t2 that has the same type, including a function call, a variable, a literal tuple, or a code block.
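
As a quick illustration of each of those four sources (the object name TupleSourcesDemo, the lookup function and the sample values are made up for this example):

```scala
object TupleSourcesDemo {
  def lookup(): (Int, String) = (1, "from a function call")
  val t2 = (2, "from a variable")

  val (n1, s1) = lookup()                    // a function call
  val (n2, s2) = t2                          // a variable
  val (n3, s3) = (3, "from a literal tuple") // a literal tuple
  val (n4, s4) = {                           // a code block
    val base = 2
    (base * 2, "from a code block")
  }
}
```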

You can include a type on each variable name; if the types of the assigned variables don't match the corresponding types of the value on the right hand side, you will get a compiler error.
    val (n:Int, s:String) = t2      //ok
    val (s:String, n:Int) = t2      //error
The tuple syntax of parentheses around a comma-separated list of values is actually a shorthand for the TupleN class. For each pair of lines below, the first line is a shorthand ("syntactic sugar") for the second.
    (a, b)
    Tuple2(a, b)

    (1, "x", "y")
    Tuple3(1, "x", "y")

    val (n, s) = t2
    val Tuple2(n, s) = t2
The last of the three examples above is a pattern-matching assignment statement.

You can use the List pattern in an assignment as well:
    val a :: b :: c = List(1,2,3,4)
    //This assigns a:Int=1, b:Int=2, c:List[Int]=List(3,4)
The List and Tuple classes can be used in a pattern-matching assignment like this because they each have an extractor defined by the unapply method in their companion object. You can use any extractor (that is, any declared object that includes an unapply method) in this way. For example, a case class can be used:
    case class Foo(num:Int, str:String)
    val f = Foo(42,"ok")
    val Foo(n,s) = f    //assigns n:Int=42, s:String="ok"
This works even if the case class happens to use mutable fields: the values at the time of the pattern match assignment are set into the new variables, which are immutable.
    case class Bar(var num:Int, var str:String)
    val b = Bar(42,"ok")
    b.num += 1
    b.str = "no"
    val Bar(n,s) = b    //assigns n:Int=43, s:String="no"
    b.num += 1          //does not change n
For example, if you have a large number of values to set at once, you could declare a case class to represent them, and match on that to assign the values:
    class AnotherExample {
        case class MyArgs(var name:String, var pathPart:String, var someNumber:Int)
        val MyArgs(path, part, num) = {
            val m = MyArgs("/path/foo/bar", "partX", 123)
            //change values of fields in m as desired
            m
        }
    }
You thus get the benefit of having immutable variables for use in your constructed object, but you can use mutable private data within the block to make it easier to do your construction.

You can use this technique to initialize immutable variables within a method as well. Effectively, you are using mutable variables only for the limited scope in which they are desired. By enclosing them in a block you prevent code outside that block from modifying those mutable values.
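
Here is a small sketch of the same idea inside a method. The object name LocalInitDemo, the summarize method and its variable names are invented for illustration.

```scala
object LocalInitDemo {
  // Returns (smallest, largest, mean) of a non-empty list.
  def summarize(xs: List[Double]): (Double, Double, Double) = {
    val (lo, hi, sum) = {
      // Mutable variables are visible only inside this block.
      var min = xs.head
      var max = xs.head
      var total = 0.0
      for (x <- xs) {
        if (x < min) min = x
        if (x > max) max = x
        total += x
      }
      (min, max, total)
    }
    // From here on, only the immutable lo, hi and sum are in scope.
    (lo, hi, sum / xs.length)
  }
}
```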

Since this technique is based on pattern matching, you can use it with any legal pattern. Pattern matching is typically used in the case clauses of match statements.
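
For instance, a hand-written extractor can be used in both places. In this sketch the KeyValue extractor and the sample strings are hypothetical. Note that, unlike a match statement, a val definition has no fallback case, so a pattern that fails to match throws a MatchError at run time, and newer compilers may warn about a pattern like this one that is not guaranteed to match.

```scala
object ExtractorDemo {
  // Hypothetical extractor that splits "key=value" strings.
  object KeyValue {
    def unapply(s: String): Option[(String, String)] = {
      val i = s.indexOf('=')
      if (i < 0) None else Some((s.substring(0, i), s.substring(i + 1)))
    }
  }

  // The pattern used in a case clause of a match statement...
  def describe(s: String): String = s match {
    case KeyValue(k, v) => k + " is " + v
    case _ => "no match"
  }

  // ...and the same pattern used in a val definition.
  val KeyValue(key, value) = "size=large"
}
```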

Patterns can include nested constructs, which allows you to pull out values from deep within a structure when that structure is known. By using the @ operator within a pattern you can extract the value of an entire subpattern:
    case class Foo(n:Int, var s:String)
    case class Baz(f:Foo, b:Option[Baz])
    val data = 123 :: Baz(Foo(3,"c"),Some(Baz(Foo(4,"d"),None))) :: 456 :: Nil
    val _ :: Baz(Foo(_,a),Some(b @ Baz(c @ Foo(d,e),_))) :: f :: _ = data
    // The above val statement assigns these values:
    //   a = "c"
    //   b = Baz(Foo(4,"d"),None)
    //   c = Foo(4,"d")
    //   d = 4
    //   e = "d"
    //   f = 456
The underscore indicates a placeholder for a part of the pattern whose value we don't care about and don't want assigned to anything.

Note that the variable c refers to the same object as the Foo object that appears in variable b. We defined Foo with a var for s. If we change the value of the Foo object referenced by variable c, then we will see that change when we ask for the value of variable b:
    scala> b
    res0: Baz = Baz(Foo(4,d),None)

    scala> c
    res1: Foo = Foo(4,d)

    scala> c.s = "x"

    scala> c
    res2: Foo = Foo(4,x)

    scala> b
    res3: Baz = Baz(Foo(4,x),None)
Although b and c are themselves immutable variables, if they point to the same mutable object then changes made to that object through one variable will be visible through the other variable.

As you learn Scala and see examples of case statements, remember that any syntax that is valid as the pattern match in a case statement is also valid as a pattern match in a val assignment.