
Symbolic Differentiation Part I Terry Brugger
[email: zow@acm.org] This entry in the field guide presents the Tutorial from Terence's Book ported into ANTLR 2. It is written from my experience as a developer who was already familiar with Java and parsing. It assumes that you have a copy of Terence's Book, as I will refer directly to the pages therein. You will note as you go through the tutorial, that ANTLR just seems to glob the parser and the lexer together. This has changed in ANTLR 2. The two are much better defined and separated now. While the approach used in ANTLR 1 had merit for its simplicity, the separation in ANTLR 2 allows for the use of a lexer which acts on LL(k) grammars, which means it's much more powerful than the DLG lexer from ANTLR 1. We start, after the introduction, with the rules. The poly : term ( "+" term )* ; notice that the + is no longer escaped, as it is no longer a control character inside a string. The other rules remain unchanged as well. (This is easy so far, isn't it?) On page 44, we begin to get into the bigger changes: term : coefficient ( reg ( "^" exp )? )? // merged alts 1 and 4  reg ( "^" exp )? // merged alts 2 and 3 ; as
It should be noted that the first four are treated just as strings. The parser detects that they are strings and passes them into the lexer's literal table. When the lexer encounters one of these, it does a lookup on said table, just as it would for a keyword like "if" or "for". While this is not the most efficient way to do it, Terence is working on allowing single characters in the parser, which will translate into individual tokens in the lexer. In the meantime, you can emulate this behaviour by writing rules like: Poly : term ( PLUS term )* ; and then defining PLUS in the lexer: PLUS : '+'; For simplicity, I have opted not to write the rules in this manner. Instead, I have included a rule encapsulating all the operators in the lexer. This will allow the lexer to recognize the operators; once it does, the lookup and match will be performed. The rules is: OPERATORS : '='  ';'  '+'  ';' ; Our final grammar looks like: // Poly.g // Defines a grammar for a Polynomial Parser class PolyParser extends Parser; // The entry point for an interpreter interp : ( ID "=" poly ";" )+ ; // A polynomial poly : term ( "+" term )* ; // A term in the polynomial term : coefficient ( reg ( "^" exp )? )?  reg ( "^" exp )? ; // The coefficient of the term coefficient : FLOAT ; // A register or variable reg : ID ; // An exponent in a term exp : reg  FLOAT ; class PolyLex extends Lexer; // An ID for a register ID : 'a'..'z' ; // A floating point number FLOAT : ( '0'..'9' )+ ( '.' ( '0'..'9' )* )? ; // The operators so that they are recognized OPERATORS : '='  '^'  '+'  ';' ; // Eat the white space WS : ( ' '  '\t'  '\r' '\n' { newline(); }  '\n' { newline(); } ) /* All the characters are in a subrule so that this action will apply to them: */ { $setType(Token.SKIP); } // ignore this token ; And in another file: // PolyMain.java // constructs and executes our Polynomial Parser import java.awt.*; import java.io.*; class PolyMain { public static void main(String[] args) { try { PolyLex lexer = new PolyLex(new DataInputStream(System.in)); PolyParser parser = new PolyParser(lexer); parser.interp(); } catch(Exception exception) { System.err.println("exception: "+exception); } // end try } // end main() } // end PolyMain Then from the command line, create the Parser and Lexer classes: > java antlr.Tool Poly.g Compile everything: > javac *.java And you can run it: > java PolyMain Enter any input and press <Ctrl>Z when you're done. Note that all processing occurs after you press <Ctrl>Z, so you won't see each line parsed after you hit <Enter>.
Semantic ActionsThis is where more changes start to pop up. Starting with the new poly returns [double ret] { double dbl=0.0; ret = 2.0; } : ret=term ( "+" dbl=term { ret += dbl; } )* ; Instead of using '>' to denote a return variable, ANTLR 2 uses the keyword "returns". The float has changed to a double because Java is nicer to doubles than floats. This will become particularly important when we call the exponentiation function because it is defined in interp { double ret=1.0; } : ( lhs:ID "=" ret=poly ";" { regs.store(lhs.getText(), ret); } )+ ; Rather than noting a change here, it should be pointed out that the ":" token assignment syntax has not changed. Keep in mind that now both the "=" and ":" assignments use lhs is assigned to and rhs is assigned from. The only notable change here that has not already been discussed is that the store method is in an object named "regs" this will be discussed momentarily. The rule for a coefficient is: coefficient returns [double ret] { ret = 4.0; } : flt:FLOAT { ret = (new Double(flt.getText())).doubleValue(); } ; The main change here is in the way that the token value is converted into a double. We construct a new Double using the text of the FLOAT the lexer read in. We then store this Double in the ret, which is just a double, using the doubleValue() method. This demonstrates one of my few frustrations with Java and something that newer Java programmers should watch out for: an over dependence on case sensitivity. The next two rules are simple: reg returns [double ret] { ret = 5.0; } : id:ID { ret = regs.value(id.getText()); } ; exp returns [double ret] { ret = 6.0; } : ret=reg  flt:FLOAT { ret = (new Double(flt.getText())).doubleValue(); } ; Any complexity in term returns [double ret] { double dbl=0.0, pow=0.0, coeff=0.0, var=0.0; ret = 3.0; } : coeff=coefficient ( var=reg ( "^" pow=exp { ret = coeff * java.lang.Math.pow(var, pow); }  { ret = coeff * var; } )  { ret = coeff; } )  dbl=reg ( "^" pow=exp { ret = java.lang.Math.pow(dbl, pow); }  { ret = dbl; } ) ; We can, however, vastly simplify this rule through the use of default values for the variables (that is our local variables in java, not the variables in our grammar): term returns [double ret] { double pow=1.0, coeff=0.0, var=1.0; ret = 3.0; } : coeff=coefficient ( var=reg ( "^" pow=exp )? )? { ret = coeff * java.lang.Math.pow(var, pow); }  var=reg ( "^" pow=exp )? { ret = java.lang.Math.pow(var, pow); } ; Here is where our // DblStorage.java // Defines an interface for objects which allow the user to store doubles by a one // character name interface DblStorage { void store(String reg, double val); double value(String reg); } And here is that object: // Registers.java // implements the store and value behaviour functions using an array import java.io.*; class Registers implements DblStorage { private double array[]; public Registers() { int size='z''a'+1; array = new double[size]; for(int lcv=0; lcv<size; lcv++) array[lcv] = 0.0; } // end default constructor public void store(String reg, double val) { array[reg.charAt(0)'a'] = val; System.out.println("Storing " + val + " in " + reg); } // end store() public double value(String reg) { return array[reg.charAt(0)'a']; } // end value() } // end class Registers We now add a constructor to class PolyParser extends Parser; { private DblStorage regs; PolyParser(TokenBuffer tokenBuf, DblStorage regs) { this(tokenBuf); this.regs = regs; } // end specialized constructor } We must now modify the main program to use this new object: // PolyMain.java // constructs and executes our Polynomial Parser import java.awt.*; import java.io.*; class PolyMain { public static void main(String[] args) { try { Registers regs = new Registers(); PolyLex lexer = new PolyLex(new DataInputStream(System.in)); PolyParser parser = new PolyParser(lexer, regs); parser.interp(); } catch(Exception exception) { System.err.println("exception: "+exception); } // end try } // end main() } // end PolyMain Here is the complete grammar: // Poly.g // Defines a grammar for a Polynomial Parser class PolyParser extends Parser; { private DblStorage regs; PolyParser(Tokenizer tokenBuf, DblStorage regs) { this(tokenBuf); this.regs = regs; } // end specialized constructor } // The entry point for an interpreter interp { double ret=1.0; } : ( lhs:ID "=" ret=poly ";" { regs.store(lhs.getText(), ret); } )+ ; // A polynomial poly returns [double ret] { double dbl=0.0; ret = 2.0; } : ret=term ( "+" dbl=term { ret += dbl; } )* ; // A term in the polynomial term returns [double ret] { double pow=1.0, coeff=0.0, var=1.0; ret = 3.0; } : coeff=coefficient ( var=reg ( "^" pow=exp )? )? { ret = coeff * java.lang.Math.pow(var, pow); }  var=reg ( "^" pow=exp )? { ret = java.lang.Math.pow(var, pow); } ; // The coefficient of the term coefficient returns [double ret] { ret = 4.0; } : flt:FLOAT { ret = (new Double(flt.getText())).doubleValue(); } ; // A register or variable reg returns [double ret] { ret = 5.0; } : id:ID { ret = regs.value(id.getText()); } ; // An exponent in a term exp returns [double ret] { ret = 6.0; } : ret=reg  flt:FLOAT { ret = (new Double(flt.getText())).doubleValue(); } ; class PolyLex extends Lexer; // An ID for a register ID : 'a'..'z' ; // A floating point number FLOAT : ( '0'..'9' )+ ( '.' ( '0'..'9' )* )? ; // The operators so that they are recognized OPERATORS : '='  '^'  '+'  ';' ; // Eat the white space WS : ( ' '  '\t'  '\r' '\n' { newline(); }  '\n' { newline(); } ) /* All the characters are in a subrule so that this action will apply to them: */ { $setType(Token.SKIP); } // ignore this token ; 