|
Home |
Download |
News |
Wiki |
About ANTLR |
Feedback |
Support |
Bugs
|
|
|
Latest version is 2.7.7. Download now! » |
|
|
Debugging and Testing Grammars With Parse Trees and Derivations Terence Parr University of San Francisco parrt@cs.usfca.edu November 30, 2003 ANTLR 2.7.3 now includes the classes described here
Introduction and problem definitionDebugging big grammars is very difficult because the Human mind cannot easily trace the highly nested and recursive nature of parsing. When the wrong thing happens during translation, you first have to figure out what actions are being executed in the grammar. To figure that out, you have to know what rules are being executed. Large grammars with large input can make this pretty difficult without a lot of experience. ANTLR provides a simple traceIn/traceOut facility so that you can watch the rule method entry/exit events, but there is so much output and it is so deeply nested that it's hard to figure out what's going on; particularly when the parser is guessing. You can use a debugger to step through the parser, but you have a similar Human brain limitation problem. Let's look at a real problem. Consider some simple input:
for a tiny C grammar like this:
When you turn on trace option
You see this:
Not particularly useful. This purpose of this article is to describe
a new trivial mechanism that will build parse trees and most
importantly derivation sequences; all without modifying ANTLR! For
input "int i;", you want:
which is very useful. For each rule invocation, you can see how the input is gradually matched in a leftmost rule-replacement fashion. Automatically generating parse treesIn ANTLR 2.7.2 I noticed, to my embarrassment, that the superclass
feature was broken, which I've fixed for 2.7.3. You will be able to
simply change your parser spec to include the
ParseTreeDebugParser class as the
immediate superclass:
For now, you'll have to cut/paste the definitions of the match() method variants and traceIn()/traceOut() into a grammar action to override the defaults inherited from antlr.LLkParser as I've done in the example grammar. Either way, when you use the -traceParser command-line option, the
parser will build parse trees with nodes of type ParseTreeToken and
ParseTreeRule. You can access the parse tree as follows:
Running the main program with input "int i;" will yield:
To get the full leftmost (that is, replacing the leftmost rule at each
step) derivation, add the following:
For input:
you'll see a derivation sequence like this:
If you only want the ith derivation step (counting from 0), use
The creation of these parse trees does not affect normal AST tree building routines. This mechanism is a strict augmentation of current functionality. Derivations as testing harnessMaking changes to a big grammar is pretty scary as you can't always predict the affect you'll have. Changing one rule can break a lot of test cases. Nomrally, a parser returns a yes or no answer as to whether the input was matched successfully, which is not a very good testing mechanism since an empty main program always returns true. Loring Craymer pointed out that once you have the derivation steps (or parse tree since they are equivalent), you can easily build unit tests that exercise your grammar providing an excellent means of detecting changes in the way input is matched. More importantly, it will tell you how the input is matched differently. You can go into the grammar and then see the new path implied by the new derivation. So, to build unit tests, run your grammar on several input sequences and record the derivations it spits out. Future runs of the same input sequence should give you an exact (to the character) replica of the derivation. If String.equals() fails, you know you've screwed up the grammar and where. By saving every modification to a grammar (as Monty Zukowski does) in a revision control system (like Perforce), you can easily back up to previous version to determine how you've changed/broken the grammar. ImplementationBy overriding parser matching infrastructure like match(token-type) and tracking rule method entry and exit events, you are able to build parse trees without modifying ANTLR itself. The parse tree construction is turned on and off at the ANTLR command line via the -traceParser option. Parse trees record the sequence of rules used to match a particular input sequence. They differ from ASTs in that parse trees are sensitive to grammar changes because they are a trace of the rule method call stack. Further, ASTs consist entirely of token nodes whereas parse trees have token nodes at the leaves and all internal nonleaf nodes are rule nodes.
As a final note, technically you could employ this exact implementation trick to build parse trees for lexers as well. The leaf nodes would be characters instead of tokens, that's all. |
|||||||||||||||||||||||||||||||||||||||||||||||