Sunday, December 28, 2008

Understanding Monads

Monads are not trivial. If they were, there would not be so many tutorials and articles explaining them.

On the other hand, as with many concepts that we may learn with some difficulty and later think simple, they are not that complicated, if you understand a few other things first. As an analogy, it is hard to understand and use multiplication if you have not yet mastered the concept of addition, but most of us now are pretty comfortable with multiplication.

This is not intended to be another monad tutorial. As with learning addition before multiplication, I believe there are some essential concepts that are necessary before one can be comfortable with monads. In this post I list the concepts I found useful on the path to my understanding of monads (such as it currently is). I set out these details to help me solidify that understanding. Other people may be able to understand monads without understanding all of these concepts, or may require learning other concepts in addition to some of these, but perhaps there will be a few people who find these notes useful.

I came across monads when investigating Haskell after programming in C and Java for many years. The path I took reflects the fact that I have been used to thinking in terms of impure imperative languages. Monads are a more natural fit to functional languages, and I believe part of the task in understanding them is to think more in the functional style.

Below is my list of concepts I found helpful. Don't worry about trying to see how they all relate to monads, just read them. Later, when you read one of those monad tutorials, perhaps it will make more sense.


The Option Class in Scala

In Java, null is often used as a marker to indicate a missing value for an object. For example, System.getProperty returns null if the property is not defined. The Option class in Scala is a nice way to avoid using nulls this way, which makes for cleaner code. You can read about Scala's Option class in my post about avoiding nulls, and you should use the Option class a bit to get to know it.

Option is a monad, so once you know how to use it, you know how to use at least one monad. See how easy it is to start using monads?
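As a quick taste, here is a small sketch of Option in action. The findProperty function is made up for illustration, not part of any library:

```scala
// A made-up lookup that may or may not find a value.
def findProperty(name: String): Option[String] =
  if (name == "user.home") Some("/home/jim") else None

// getOrElse supplies a default instead of a null check.
val home = findProperty("user.home").getOrElse("(unset)")     // "/home/jim"
val other = findProperty("no.such.key").getOrElse("(unset)")  // "(unset)"
```

No nulls, and the type system forces the caller to deal with the missing-value case.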

Parser Combinators

Scala provides parser combinators as part of the standard library. This makes it easy to build simple parsers. You can see how this is done by reading my post about parser generators where I build a simple four-function expression parser.

When using parser combinators, parsers are built up by composing (combining) other parsers using various combining operators (combinators). The result of one of these combining operators is another parser. Eventually, you build up a top-level parser that can parse your entire grammar. Then you invoke that parser on your input stream to parse the input.

These parser combinators are monads. See "Monadic Parser Combinators".
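To give a flavor of the combining style, here is a minimal sketch (not the parser from that post) that sums numbers separated by plus signs. It assumes the parser combinator library that ships with the Scala distribution:

```scala
import scala.util.parsing.combinator.JavaTokenParsers

// A tiny grammar: expr ::= wholeNumber { "+" wholeNumber }
object Sum extends JavaTokenParsers {
  def expr: Parser[Int] =
    wholeNumber ~ rep("+" ~> wholeNumber) ^^ {
      case first ~ rest => rest.foldLeft(first.toInt)(_ + _.toInt)
    }
  def apply(in: String): Int = parseAll(expr, in).get
}
```

The ~ combinator sequences two parsers, ~> keeps only the right-hand result, rep repeats a parser, and ^^ transforms the parsed result into a value.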

Uniform Container Methods in Scala

The Option class, mentioned above, defines methods for map, flatMap and filter. All of the container classes in Scala, including Array, List and the Map trait, define these same three methods. The container classes also implement foreach, forall and exists. Option implements foreach, even though it never has more than one item. This makes more sense if you think of Option like a List that can have only zero or one element in it.

Having this same set of methods available for all of the collection classes makes it easier for the programmer to use any of them, but it also makes it possible to write a package of higher-level control code that can accept any of these classes. With such a higher-level package, you can add your own container class that will work with that package as long as your container class supplies the appropriate set of methods.
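For example, the same calls work whether the container is an Option or a List:

```scala
val opt: Option[Int] = Some(3)
val lst: List[Int] = List(1, 2, 3)

opt.map(x => x * x)        // Some(9)
lst.map(x => x * x)        // List(1, 4, 9)
opt.filter(_ % 2 == 0)     // None: 3 is odd, so the value is dropped
lst.filter(_ % 2 == 0)     // List(2)
```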

The container classes in Scala (including Option) are all monads. Because all monad classes supply a specific set of methods (informally, the monad interface), a monad library of higher-level control classes can be written against that interface. In addition to supplying specific methods, there are a few consistency rules that constrain how those monad methods must behave. This is no different from, for example, the contract in Java relating the behavior of the equals and hashCode methods for a class. As with any other interface definition and contract, if you write your own class that implements the monad methods and follows the monad contract, then you can use that class with a monad library.
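Those consistency rules are usually called the monad laws. Here is a sketch of them checked against Option, with Some playing the role of the monad's "unit" operation and flatMap its "bind":

```scala
def unit[A](a: A): Option[A] = Some(a)
val f = (x: Int) => if (x > 0) Some(x * 2) else None
val g = (x: Int) => Some(x + 1)
val m: Option[Int] = Some(5)

// Left identity:  unit(a) flatMap f  ==  f(a)
assert((unit(5) flatMap f) == f(5))
// Right identity: m flatMap unit  ==  m
assert((m flatMap (x => unit(x))) == m)
// Associativity:  how the flatMap calls are grouped does not matter
assert(((m flatMap f) flatMap g) == (m flatMap (x => f(x) flatMap g)))
```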

Functional Languages

Functional languages stress some things that are not typically discussed in imperative languages such as Java:
  • Referential transparency
  • Immutable data
  • Recursion rather than loops
  • Lazy evaluation
These concepts fit together nicely in a functional language. Using immutable data goes hand-in-hand with referential transparency. Lazy evaluation is an efficient way to implement delayed computations, and is sometimes what makes them possible at all.

Referential Transparency

A pure functional language, such as Haskell, has no side effects. This means that calling a function returns a value and does nothing else. In particular, a function cannot set state. This is a very different way of thinking from imperative languages such as Java, where there are objects all over the place with state that gets changed by method calls.

Referential transparency is a phrase from the functional programming world which means, basically, "no side effects". Side effects include reading any state that is not passed in as an argument and setting any state that is not returned as part of the result. If a function is referentially transparent, then a call to that function with a specific set of arguments will always return exactly the same value.
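A small sketch of the difference:

```scala
// Referentially transparent: depends only on its arguments,
// so add(2, 3) is 5 every time it is called.
def add(a: Int, b: Int): Int = a + b

// Not referentially transparent: reads and writes state outside itself,
// so two calls with the same (zero) arguments return different values.
var counter = 0
def next(): Int = { counter += 1; counter }
```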

When functions are composed into larger functions, it is easier to reason about them when they are referentially transparent and there are no side effects to worry about. Of course, in the real world it can be very useful to be able to store some state data, so how does one implement this functionality in a pure functional language?

The answer is monads. This is described pretty well in a 1992 paper by Philip Wadler called "Monads for functional programming". Reading this paper helped me understand the motivation for monads. Monads provide a way to collect all those side effects into known locations in the program. When all of the side effects are collected into monads, the rest of the program (all of the non-monad parts) is still referentially transparent, so it remains easier to compose. The side effects are still there, but because they are encapsulated by the monads, they are easier to deal with.

Group Theory

When you read about monads, at some point you will come across a mention of Category Theory as the source of monads. Category Theory is pretty abstract, and you don't need to know it to understand monads. But monads are a mathematical concept, so digging into the math background of the concept could help solidify your understanding.

After reading a bunch of stuff about Category Theory, I went back to Group Theory, which for our purposes you can think of as a special case of Category Theory. While reviewing groups, rings and fields, I stumbled across monoids, a term I did not recall from my readings in the field years ago. A monoid is not quite the same thing as a monad, but because Group Theory is focused on transformations, I started thinking of monads as transformations, which I think helped my understanding of them.

If you know your basic math, like addition and multiplication, you know something about Group Theory. Here's a little sampler:
  • The integers, when taken with the operation of addition, are a group. This is because addition of integers satisfies the following rules:
    1. There is a unique identity value (zero).
    2. The set (of integers) is closed under the operation (of addition).
    3. There is an inverse (the negative) for every value.
    4. The operation (addition) is associative.
  • The integers, when taken with the operations of addition and multiplication, are a ring. This is because they satisfy all of the above group rules (for addition), plus the following rules:
    1. There is a unique identity value (1) for the second operation (multiplication).
    2. The set (of integers) is closed under the second operation (multiplication).
    3. The second operation (multiplication) is associative.
    4. The second operation (multiplication) distributes over the first operation (addition).
  • The rational numbers, when taken with the operators of addition and multiplication, are a field. This is because they satisfy all of the above ring rules, plus the following:
    1. There is an inverse (the reciprocal) under the second operation (multiplication) for every value (except zero).
A monoid, in case you are interested, is like a group but without the requirement of having an inverse. Thus, the natural numbers with the addition operation are a monoid.
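A monoid translates directly into code: an identity value plus an associative operation. The trait below and its instances are my own illustration, not a standard library type:

```scala
trait Monoid[A] {
  def identity: A          // the identity value for the operation
  def op(a: A, b: A): A    // an associative operation on the set
}

// The integers under addition, with identity zero.
val intAddition = new Monoid[Int] {
  def identity = 0
  def op(a: Int, b: Int) = a + b
}

// Strings under concatenation are also a monoid: there is an identity
// (the empty string) but no inverses, so they do not form a group.
val stringConcat = new Monoid[String] {
  def identity = ""
  def op(a: String, b: String) = a + b
}
```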

An item of interest related to monads: the value zero is a special value under multiplication in the ring of integers, because any value multiplied by zero is still zero. This is similar to the None value in an Option in that the standard operations such as map and filter have no effect on None. None is thus a "zero" value for an Option. In the same way, an empty List is a "zero" value for a List.
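This "zero" behavior is easy to see:

```scala
val none: Option[Int] = None
none.map(_ * 2)                // still None
none.filter(_ > 0)             // still None
(Nil: List[Int]).map(_ * 2)    // still Nil
```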

Monads as Delayed Computations

Back in 2007, when I had just started looking at Haskell and had first run across monads, I had the good fortune to be sitting next to Conal Elliott on a long plane flight. He was on his way to deliver a paper at a Haskell conference, but at the time I had no idea who he was. Since he obviously knew a lot about Haskell, I asked him to explain monads to me, which he cheerfully attempted. Given how little I knew at the time about all of the pieces I list in this post, he did a very good job - but I still struggled with it.

My recollection of my interpretation of his explanation is this: In a pure functional program, there are no side effects, so the program must do the same thing every time. Clearly this does not work when there is user input, because you want the program to do different things based on different user input each time it is executed. So think of the program execution as being in two phases: first, create an execution tree that represents the program with all possible choices based on user input; then evaluate that execution tree, using the user input when required to prune off parts of the tree and select other parts.

Of course the actual tree with all possible choices might be infinitely large, and certainly will be too large to fully instantiate, so you don't really separate the above two phases and try to build the whole tree at once. Instead, you build it only as you need it (lazy evaluation). So you build a bit of the tree, then you evaluate that bit, which prunes out a bunch of potential branches and directs you to the next part of the tree that needs to be built. Repeat until you have completed the evaluation, in which case the program terminates.

In the above model, the decision points that are based on user input are the monads. I visualized the monad as a node in the tree with a bunch of children together with a decision function based on user input that determined which of those children to evaluate.


My conclusion: don't worry about understanding monads, just keep reading about them and learning more about functional programming. Some day it will all make sense and seem easy.

Update 2009-02-18: Change typo "with" to "without" as pointed out by Joseph.


Joseph Wofford said...

"A monoid, in case you are interested, is like a group but with the requirement of having an inverse."

I believe you meant "without the requirement of having an inverse."

Jim McBeath said...

Joseph: Thanks. Fixed.