Contents

  Introduction
  Example Parser
  Calling Individual Parsers
  Tracing
  Updated Example
Introduction
In a recent comment on my 2008 blog post about Scala's parser combinators, a reader asked how one might go about debugging such a parser. As one post says, "Debugging a parser implemented with the help of a combinator library has its special challenges." You may have trouble setting breakpoints, and stack traces can be difficult to interpret. The two techniques I show here may not provide you with the kind of visibility you might be used to when single-stepping through problem code, but I hope they provide at least a little more visibility than you might otherwise have.
Example Parser
As an example parser I will use an integer-only version of the four-function arithmetic parser I built for my 2008 parser combinator post. The code consists of a set of case classes to represent the parsed results and a parser class that contains the parsing rules and a few helper methods. You can copy this code into a file and either compile it or load it into the Scala REPL.

import scala.util.parsing.combinator.syntactical.StandardTokenParsers

sealed abstract class Expr {
  def eval():Int
}
case class EConst(value:Int) extends Expr {
  def eval():Int = value
}
case class EAdd(left:Expr, right:Expr) extends Expr {
  def eval():Int = left.eval + right.eval
}
case class ESub(left:Expr, right:Expr) extends Expr {
  def eval():Int = left.eval - right.eval
}
case class EMul(left:Expr, right:Expr) extends Expr {
  def eval():Int = left.eval * right.eval
}
case class EDiv(left:Expr, right:Expr) extends Expr {
  def eval():Int = left.eval / right.eval
}
case class EUMinus(e:Expr) extends Expr {
  def eval():Int = -e.eval
}

object ExprParser extends StandardTokenParsers {
  lexical.delimiters ++= List("+","-","*","/","(",")")

  def value = numericLit ^^ { s => EConst(s.toInt) }
  def parens:Parser[Expr] = "(" ~> expr <~ ")"
  def unaryMinus:Parser[EUMinus] = "-" ~> term ^^ { EUMinus(_) }
  def term = ( value | parens | unaryMinus )
  def binaryOp(level:Int):Parser[((Expr,Expr)=>Expr)] = {
    level match {
      case 1 =>
        "+" ^^^ { (a:Expr, b:Expr) => EAdd(a,b) } |
        "-" ^^^ { (a:Expr, b:Expr) => ESub(a,b) }
      case 2 =>
        "*" ^^^ { (a:Expr, b:Expr) => EMul(a,b) } |
        "/" ^^^ { (a:Expr, b:Expr) => EDiv(a,b) }
      case _ => throw new RuntimeException("bad precedence level "+level)
    }
  }
  val minPrec = 1
  val maxPrec = 2

  def binary(level:Int):Parser[Expr] =
    if (level>maxPrec) term
    else binary(level+1) * binaryOp(level)
  def expr = ( binary(minPrec) | term )

  def parse(s:String) = {
    val tokens = new lexical.Scanner(s)
    phrase(expr)(tokens)
  }

  def apply(s:String):Expr = {
    parse(s) match {
      case Success(tree, _) => tree
      case e: NoSuccess =>
        throw new IllegalArgumentException("Bad syntax: "+s)
    }
  }

  def test(exprstr: String) = {
    parse(exprstr) match {
      case Success(tree, _) =>
        println("Tree: "+tree)
        val v = tree.eval()
        println("Eval: "+v)
      case e: NoSuccess => Console.err.println(e)
    }
  }

  //A main method for testing
  def main(args: Array[String]) = test(args(0))
}

In the ExprParser class, the lines up to and including the definition of the expr method define the parsing rules, whereas the methods from parse onwards are helper methods.
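One detail worth calling out: the * combinator used in binary is the chainl1-style repetition from scala.util.parsing.combinator.Parsers, which combines the operands left-associatively. The snippet below is a quick sketch of what that means for the tree shape; the expected values shown in the comments follow from the rules above.

//A quick check of left-associativity, using the apply helper defined above.
//"8-2-1" should group as (8-2)-1 because * combines results left-associatively.
val tree = ExprParser("8-2-1")
println(tree)         //expected: ESub(ESub(EConst(8),EConst(2)),EConst(1))
println(tree.eval())  //expected: 5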
Calling Individual Parsers
In our example parser we can easily ask it to parse a string by calling our ExprParser.test method, which parses the string using our parse method, prints the resulting parse tree, and (if the parse was successful) evaluates the tree and prints that value. The last line of parse parses a string using our expression parser:

phrase(expr)(tokens)

phrase is a method in StandardTokenParsers that parses an input stream using the specified parser. The only thing special about our expr method is that we happen to have selected it as our top-level parser - but we could just as easily have picked one of our other parsers as our top-level parser.
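In fact, any of these parsers can be run by hand against a token stream. Here is a minimal sketch using the parens rule; this is exactly the mechanism we will package into a helper method below.

//Run one specific parser over a token stream, without any helper method.
val tokens = new ExprParser.lexical.Scanner("(1+2)")
println(ExprParser.phrase(ExprParser.parens)(tokens))
//prints the ParseResult, e.g. a Success containing EAdd(EConst(1),EConst(2))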
Let's add another version of the test method that lets us specify which parser to use as the top-level parser. We want to print out the results in the same way as the existing test method, so we first refactor that existing method:
def test(exprstr: String) = printParseResult(parse(exprstr))

def printParseResult(pr:ParseResult[Expr]) = {
  pr match {
    case Success(tree, _) =>
      println("Tree: "+tree)
      val v = tree.eval()
      println("Eval: "+v)
    case e: NoSuccess => Console.err.println(e)
  }
}

Now we add a new parse method that accepts a parser as an argument, and we call that from our new test method:
def parse(p:Parser[Expr], s:String) = {
  val tokens = new lexical.Scanner(s)
  phrase(p)(tokens)
}

def test(p:Parser[Expr], exprstr: String) = printParseResult(parse(p,exprstr))

We can run the Scala REPL, load our modified file using the ":load" command, then manually call the top-level parser by calling our test method.
To reduce typing, we import everything from ExprParser. In the examples below, the text following each scala> prompt is what we type; everything else is printed by the REPL.
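The load step, assuming the code was saved in a file named ExprParser.scala (the filename here is just an assumption), looks like this:

scala> :load ExprParser.scala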
scala> import ExprParser._
import ExprParser._

scala> test("1+2")
Tree: EAdd(EConst(1),EConst(2))
Eval: 3

scala> test("1+2*3")
Tree: EAdd(EConst(1),EMul(EConst(2),EConst(3)))
Eval: 7

scala> test("(1+2)*3")
Tree: EMul(EAdd(EConst(1),EConst(2)),EConst(3))
Eval: 9

We can also call the test method that takes a parser as an argument, allowing us to test one particular parsing rule at a time. If we pass in expr as the parser, we will get the same results as above; but if we pass in a different parser, we may get different results.

scala> test(expr,"1+2*3")
Tree: EAdd(EConst(1),EMul(EConst(2),EConst(3)))
Eval: 7

scala> test(binary(1),"1+2*3")
Tree: EAdd(EConst(1),EMul(EConst(2),EConst(3)))
Eval: 7

scala> test(binary(2),"1+2*3")
[1.2] failure: ``/'' expected but `+' found

1+2*3
 ^

scala> test(parens,"1+2")
[1.1] failure: ``('' expected but 1 found

1+2
^

scala> test(parens,"(1+2)")
Tree: EAdd(EConst(1),EConst(2))
Eval: 3

scala> test(parens,"(1+2)*3")
[1.6] failure: end of input expected

(1+2)*3
     ^
Tracing
If you have a larger parser that is not behaving and you are not quite sure where the problem lies, it can be tedious to call individual parsers directly until you find the one that is misbehaving. Being able to trace the progress of the whole parser running on an input known to cause the problem can be helpful, but sprinkling println statements throughout your parser can be tricky. This section provides an approach that allows you to do some tracing with minimal changes to your code. The output can get pretty verbose, but at least it will give you a starting point from which you may be able to devise your own improved debugging.
The idea behind this approach is to wrap some or all of the individual parsers in a debugging parser that delegates its apply action to the wrapped parser, but also prints out some debugging information. The apply action is called during the act of parsing.
Note: this code relies on the fact that the various combinators in the Parser class of Scala's StandardTokenParsers (Parser is implemented as an inner class in scala.util.parsing.combinator.Parsers) interact with a parser only through its apply method, so our wrapper needs to override nothing but apply. This code could be added directly to ExprParser, but it is presented here as a separate trait to make it easier to reuse. Add this DebugStandardTokenParsers trait to the file containing ExprParser.
trait DebugStandardTokenParsers extends StandardTokenParsers {
  class Wrap[+T](name:String, parser:Parser[T]) extends Parser[T] {
    def apply(in: Input): ParseResult[T] = {
      val first = in.first
      val pos = in.pos
      val offset = in.offset
      val t = parser.apply(in)
      println(name+".apply for token "+first+
          " at position "+pos+" offset "+offset+" returns "+t)
      t
    }
  }
}

The Wrap class provides the hook into the apply method that we need in order to print out our trace information as the parser runs.
Once this trait is in place, we modify ExprParser to inherit from it rather than from StandardTokenParsers:
object ExprParser extends DebugStandardTokenParsers { ... }

So far we have not changed the behavior of the parser, since we have not yet wired in the Wrap class. To do so, we can take any of the existing parsers and wrap it in a new Wrap. For example, with the top-level expr parser we could do this (the added code is the new Wrap("expr", ...) wrapper):
def expr = new Wrap("expr", ( binary(minPrec) | term ) )We can make this a bit easier to edit and read by using implicits. In
DebugStandardTokenParsers
we add this method:
implicit def toWrapped(name:String) = new {
  def !!![T](p:Parser[T]) = new Wrap(name,p)
}

Now we can wrap our expr method like this:
def expr = "expr" !!! ( binary(minPrec) | term )If you don't like using
!!!
as an operator, you are free
to pick something more to your taste, or you can leave out the implicit
and just use the new Wrap
approach.
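As a small sketch of that (the name traced is arbitrary and not part of the original code), the implicit could use a plain word instead of !!!:

implicit def toTraced(name:String) = new {
  def traced[T](p:Parser[T]) = new Wrap(name,p)  //same Wrap class as above
}

//which would then be used like this:
def expr = "expr" traced ( binary(minPrec) | term )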
At this point you must modify your source code by adding the above syntax to each parsing rule that you want to trace. You can go through and do them all, or you can just pick out the ones you think are the most likely culprits and wrap those. Note that you can wrap any parser this way, including those that appear as pieces in the middle of other parsers. The following example shows how some of the parsers in the term and binaryOp methods can be wrapped:
def term = "term" !!! ( value | "term-parens" !!! parens | unaryMinus ) def binaryOp(level:Int):Parser[((Expr,Expr)=>Expr)] = { level match { case 1 => "add" !!! "+" ^^^ { (a:Expr, b:Expr) => EAdd(a,b) } | "sub" !!! "-" ^^^ { (a:Expr, b:Expr) => ESub(a,b) } case 2 => "mul" !!! "*" ^^^ { (a:Expr, b:Expr) => EMul(a,b) } | "div" !!! "/" ^^^ { (a:Expr, b:Expr) => EDiv(a,b) } case _ => throw new RuntimeException("bad precedence level "+level) } }Assuming we have wrapped the
expr
, term
and
binaryOp
methods as in the above examples, here is what the
output looks like for a few tests.
As in the previous REPL example, user input is in bold.
If you are using the REPL and reload the file, remember to
run import ExprParser._
again to pick up the
newer definitions.
scala> test("1") term.apply for token 1 at position 1.1 offset 0 returns [1.2] parsed: EConst(1) plus.apply for token EOF at position 1.2 offset 1 returns [1.2] failure: ``+'' expected but EOF found 1 ^ minus.apply for token EOF at position 1.2 offset 1 returns [1.2] failure: ``-'' expected but EOF found 1 ^ expr.apply for token 1 at position 1.1 offset 0 returns [1.2] parsed: EConst(1) Tree: EConst(1) Eval: 1 scala> test("(1+2)*3") term.apply for token 1 at position 1.2 offset 1 returns [1.3] parsed: EConst(1) plus.apply for token `+' at position 1.3 offset 2 returns [1.4] parsed: + term.apply for token 2 at position 1.4 offset 3 returns [1.5] parsed: EConst(2) plus.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``+'' expected but `)' found (1+2)*3 ^ minus.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``-'' expected but `)' found (1+2)*3 ^ expr.apply for token 1 at position 1.2 offset 1 returns [1.5] parsed: EAdd(EConst(1),EConst(2)) term-parens.apply for token `(' at position 1.1 offset 0 returns [1.6] parsed: EAdd(EConst(1),EConst(2)) term.apply for token `(' at position 1.1 offset 0 returns [1.6] parsed: EAdd(EConst(1),EConst(2)) term.apply for token 3 at position 1.7 offset 6 returns [1.8] parsed: EConst(3) plus.apply for token EOF at position 1.8 offset 7 returns [1.8] failure: ``+'' expected but EOF found (1+2)*3 ^ minus.apply for token EOF at position 1.8 offset 7 returns [1.8] failure: ``-'' expected but EOF found (1+2)*3 ^ expr.apply for token `(' at position 1.1 offset 0 returns [1.8] parsed: EMul(EAdd(EConst(1),EConst(2)),EConst(3)) Tree: EMul(EAdd(EConst(1),EConst(2)),EConst(3)) Eval: 9 scala> test(parens,"(1+2)") term.apply for token 1 at position 1.2 offset 1 returns [1.3] parsed: EConst(1) mul.apply for token `+' at position 1.3 offset 2 returns [1.3] failure: ``*'' expected but `+' found (1+2) ^ div.apply for token `+' at position 1.3 offset 2 returns [1.3] failure: ``/'' expected but `+' found (1+2) ^ add.apply for token `+' at position 1.3 offset 2 returns [1.4] parsed: + term.apply for token 2 at position 1.4 offset 3 returns [1.5] parsed: EConst(2) mul.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``*'' expected but `)' found (1+2) ^ div.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``/'' expected but `)' found (1+2) ^ add.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``+'' expected but `)' found (1+2) ^ sub.apply for token `)' at position 1.5 offset 4 returns [1.5] failure: ``-'' expected but `)' found (1+2) ^ expr.apply for token 1 at position 1.2 offset 1 returns [1.5] parsed: EAdd(EConst(1),EConst(2)) Tree: EAdd(EConst(1),EConst(2)) Eval: 3As you can see, even for these very short input strings the output is pretty verbose. It does, however, show you what token it is trying to parse and where in the input stream that token is, so by paying attention to the position and offset numbers you can see where it is backtracking.
When you have found the problem and are done debugging, you can remove the DebugStandardTokenParsers trait and take out all of the !!! wrapping operations, or you can leave everything in place and disable the wrapper output by changing the definition of the implicit !!! operator to this:
def !!![T](p:Parser[T]) = p

Or, if you want to make it possible to enable debugging output later, change !!! to return either p or new Wrap(name,p) depending on some debugging configuration value.
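As a minimal sketch of that (the debugParsers flag name is just an assumption; wire it to whatever configuration mechanism you prefer):

var debugParsers = true  //hypothetical switch; set to false to silence the trace

implicit def toWrapped(name:String) = new {
  def !!![T](p:Parser[T]):Parser[T] =
    if (debugParsers) new Wrap(name,p) else p
}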
Updated Example
Below is the complete program with all of the above changes.

import scala.util.parsing.combinator.syntactical.StandardTokenParsers

sealed abstract class Expr {
  def eval():Int
}
case class EConst(value:Int) extends Expr {
  def eval():Int = value
}
case class EAdd(left:Expr, right:Expr) extends Expr {
  def eval():Int = left.eval + right.eval
}
case class ESub(left:Expr, right:Expr) extends Expr {
  def eval():Int = left.eval - right.eval
}
case class EMul(left:Expr, right:Expr) extends Expr {
  def eval():Int = left.eval * right.eval
}
case class EDiv(left:Expr, right:Expr) extends Expr {
  def eval():Int = left.eval / right.eval
}
case class EUMinus(e:Expr) extends Expr {
  def eval():Int = -e.eval
}

trait DebugStandardTokenParsers extends StandardTokenParsers {
  class Wrap[+T](name:String, parser:Parser[T]) extends Parser[T] {
    def apply(in: Input): ParseResult[T] = {
      val first = in.first
      val pos = in.pos
      val offset = in.offset
      val t = parser.apply(in)
      println(name+".apply for token "+first+
          " at position "+pos+" offset "+offset+" returns "+t)
      t
    }
  }

  implicit def toWrapped(name:String) = new {
    def !!![T](p:Parser[T]) = new Wrap(name,p) //for debugging
    //def !!![T](p:Parser[T]) = p              //for production
  }
}

object ExprParser extends DebugStandardTokenParsers {
  lexical.delimiters ++= List("+","-","*","/","(",")")

  def value = numericLit ^^ { s => EConst(s.toInt) }
  def parens:Parser[Expr] = "(" ~> expr <~ ")"
  def unaryMinus:Parser[EUMinus] = "-" ~> term ^^ { EUMinus(_) }
  def term = "term" !!! ( value | "term-parens" !!! parens | unaryMinus )
  def binaryOp(level:Int):Parser[((Expr,Expr)=>Expr)] = {
    level match {
      case 1 =>
        "add" !!! "+" ^^^ { (a:Expr, b:Expr) => EAdd(a,b) } |
        "sub" !!! "-" ^^^ { (a:Expr, b:Expr) => ESub(a,b) }
      case 2 =>
        "mul" !!! "*" ^^^ { (a:Expr, b:Expr) => EMul(a,b) } |
        "div" !!! "/" ^^^ { (a:Expr, b:Expr) => EDiv(a,b) }
      case _ => throw new RuntimeException("bad precedence level "+level)
    }
  }
  val minPrec = 1
  val maxPrec = 2

  def binary(level:Int):Parser[Expr] =
    if (level>maxPrec) term
    else binary(level+1) * binaryOp(level)
  def expr = "expr" !!! ( binary(minPrec) | term )

  def parse(s:String) = {
    val tokens = new lexical.Scanner(s)
    phrase(expr)(tokens)
  }

  def parse(p:Parser[Expr], s:String) = {
    val tokens = new lexical.Scanner(s)
    phrase(p)(tokens)
  }

  def apply(s:String):Expr = {
    parse(s) match {
      case Success(tree, _) => tree
      case e: NoSuccess =>
        throw new IllegalArgumentException("Bad syntax: "+s)
    }
  }

  def test(exprstr: String) = printParseResult(parse(exprstr))

  def test(p:Parser[Expr], exprstr: String) = printParseResult(parse(p,exprstr))

  def printParseResult(pr:ParseResult[Expr]) = {
    pr match {
      case Success(tree, _) =>
        println("Tree: "+tree)
        val v = tree.eval()
        println("Eval: "+v)
      case e: NoSuccess => Console.err.println(e)
    }
  }

  //A main method for testing
  def main(args: Array[String]) = test(args(0))
}
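Since ExprParser has a main method, you can also compile and run it from the command line. A quick sketch, assuming the code is saved as ExprParser.scala (the filename is just an assumption):

$ scalac ExprParser.scala
$ scala ExprParser "(1+2)*3"

This should print the trace lines followed by the Tree and Eval lines, just as in the REPL session above.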
Comments
Thanks. Actually the fact that the toString() method on failure outputs something useful was very good. You could consider changing the parse method so that when it throws IllegalArgumentException it includes e.toString in the message.

Turns out my issue was related to using rep1sep. A warning for those who follow: rep1sep will still match even if the separator is not present at all; it just returns a one-element list.
Thus if you are using rep1sep with | to provide alternatives, your alternatives will never match.
For example, rep1sep(term, "*") ... | rep1sep(term, "/") will never check for a "/" if it finds a single term.
The solution is to make your own version of rep1sep that requires at least one instance of the separator and put term as the fallback.
Thanks. I'm just learning Scala and one of the first things I want to do is port an existing parser I wrote in F#. This blog post should help a lot.
That's a thoughtful post and you've apparently spent quite some time writing Scala parsers, but I'm surprised you haven't mentioned the "log" parser - its output has helped me on more than one occasion:
http://stackoverflow.com/questions/2387892/parser-combinator-not-terminating-how-to-log-what-is-going-on
Vassil Dichev: An embarrassing oversight on my part! Thanks for pointing it out.
Readers, if you are happy with the format of the standard log parser, you can throw out the entire DebugStandardTokenParsers class and just add this little method to your parser class (ExprParser in my example) to enable the use of the !!! method:
implicit def toLogged(name:String) = new {
  def !!![T](p:Parser[T]) = log(p)(name)
}
My approach with DebugStandardTokenParsers allows for a bit of customization of the logging output, but in most situations I suspect that would not be necessary, and the much simpler expedient of adding the above implicit (or just using the log parser directly) would be satisfactory and far simpler.
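For completeness, the log combinator can also be used directly on any rule without an implicit; a minimal sketch against the example parser:

//log prints a message when the parser is tried and another with its result,
//similar in spirit to the Wrap class above
def expr = log(binary(minPrec) | term)("expr")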