Symbolic Differentiation Part I

Terry Brugger

This entry in the field guide presents the Tutorial from Terence's Book ported into ANTLR 2. It is written from my experience as a developer who was already familiar with Java and parsing. It assumes that you have a copy of Terence's Book, as I will refer directly to the pages therein.

You will note as you go through the tutorial, that ANTLR just seems to glob the parser and the lexer together. This has changed in ANTLR 2. The two are much better defined and separated now. While the approach used in ANTLR 1 had merit for its simplicity, the separation in ANTLR 2 allows for the use of a lexer which acts on LL(k) grammars, which means it's much more powerful than the DLG lexer from ANTLR 1.

We start, after the introduction, with the rules. The interp rule remains unchanged. The poly rule has only a minor modification:

poly
	: term ( "+" term )*
	;

notice that the + is no longer escaped, as it is no longer a control character inside a string. The other rules remain unchanged as well. (This is easy so far, isn't it?) On page 44, we begin to get into the bigger changes:

term
	: coefficient ( reg ( "^" exp )? )?	// merged alts 1 and 4
	| reg ( "^" exp )?			// merged alts 2 and 3
	;

as (..)? has replaced {..} for defining optional subrules. We now have seven tokens:

Regular Expression	Description
"="	The equals sign
";"	Semicolon
"+"	Plus sign
"^"	Exponentiation operator
( '0'..'9' )+ ( '.' ( '0'..'9' )* )?	FLOAT - Note that I took the liberty to allow numbers to be specified as "123." with a trailing decimal point and no numbers behind it.
'a'..'z'	ID
' ' \| '\t' \| '\r' '\n' \| '\n'	white space

It should be noted that the first four are treated just as strings. The parser detects that they are strings and passes them into the lexer's literal table. When the lexer encounters one of these, it does a lookup on said table, just as it would for a keyword like "if" or "for". While this is not the most efficient way to do it, Terence is working on allowing single characters in the parser, which will translate into individual tokens in the lexer. In the meantime, you can emulate this behaviour by writing rules like:

Poly
	: term ( PLUS term )*
	;

and then defining PLUS in the lexer:

PLUS : '+';

For simplicity, I have opted not to write the rules in this manner. Instead, I have included a rule encapsulating all the operators in the lexer. This will allow the lexer to recognize the operators; once it does, the lookup and match will be performed. The rules is:

OPERATORS
	: '='
	| ';'
	| '+'
	| ';'
	;

Our final grammar looks like:

// Poly.g
// Defines a grammar for a Polynomial Parser
class PolyParser extends Parser;

// The entry point for an interpreter
interp
	: ( ID "=" poly ";" )+
	;

// A polynomial
poly
	: term ( "+" term )*
	;

// A term in the polynomial
term
	: coefficient ( reg ( "^" exp )? )?
	| reg ( "^" exp )?
	;

// The coefficient of the term
coefficient
	: FLOAT
	;

// A register or variable
reg
	: ID
	;

// An exponent in a term
exp
	: reg
	| FLOAT
	;

class PolyLex extends Lexer;

// An ID for a register
ID
	: 'a'..'z'
	;

// A floating point number
FLOAT
	: ( '0'..'9' )+ ( '.' ( '0'..'9' )* )?
	;

// The operators so that they are recognized
OPERATORS
	: '='
	| '^'
	| '+'
	| ';'
	;

// Eat the white space
WS
	: ( ' '
	  | '\t'
	  | '\r' '\n'	{ newline(); }
	  | '\n'	{ newline(); }
	  )
		/* All the characters are in a subrule so that this
		   action will apply to them: */
		{ $setType(Token.SKIP); } // ignore this token
	;

And in another file:

// PolyMain.java
// constructs and executes our Polynomial Parser
import java.awt.*;
import java.io.*;

class PolyMain
{
	public static void main(String[] args)
	{
		try {
			PolyLex lexer = new PolyLex(new DataInputStream(System.in));
			PolyParser parser = new PolyParser(lexer);
			parser.interp();
		} catch(Exception exception) {
			System.err.println("exception: "+exception);
		} // end try
	} // end main()
} // end PolyMain

Then from the command line, create the Parser and Lexer classes:
> java antlr.Tool Poly.g

Compile everything:

> javac *.java

And you can run it:

> java PolyMain

Enter any input and press <Ctrl>-Z when you're done. Note that all processing occurs after you press <Ctrl>-Z, so you won't see each line parsed after you hit <Enter>.

Semantic Actions

This is where more changes start to pop up. Starting with the new poly rule on p. 47:

poly returns [double ret]
	{ double dbl=0.0; ret = -2.0; }
	: ret=term ( "+" dbl=term { ret += dbl; } )*
	;

Instead of using '>' to denote a return variable, ANTLR 2 uses the keyword "returns". The float has changed to a double because Java is nicer to doubles than floats. This will become particularly important when we call the exponentiation function because it is defined in java.lang.Math as double pow(double, double) ; and of course, when dealing with exponentiation, numbers can get very big very quickly. Both doubles are initialized to suppress Java's warnings about using uninitialized variables (apparently because they are initialized by ANTLR inside of an exception handler. I initialized ret to -2 to aid in debugging because it should always be set to a positive number first thing in the rule. If I see a -2 pop up somewhere, I know I have a problem and where to start looking for it. Instead of using f or d as the variable name, I have chosen ret and dbl, simply due to an aversion of one character variable names. The next point to note is that the initialization statements are now enclosed in curly braces instead of double angle brackets; furthermore, this init-action is located before the first ":" rather than after it. Inside the rule itself, the syntax has been vastly improved. Instead of redirection arrows, square brackets and "$"s (reminds me of perl) for return values, we have simple "=" operators and use the variable names directly. The caveat of this is that the variable being assigned to is now on the left hand side (consistent with Java syntax). Again, actions are in curly braces and not angled brackets. We continue now with the rule for interp:

interp
	{ double ret=-1.0; }
	: ( lhs:ID "=" ret=poly ";"
		{ regs.store(lhs.getText(), ret); }
	  )+
	;

Rather than noting a change here, it should be pointed out that the ":" token assignment syntax has not changed. Keep in mind that now both the "=" and ":" assignments use lhs is assigned to and rhs is assigned from. The only notable change here that has not already been discussed is that the store method is in an object named "regs" this will be discussed momentarily. The rule for a coefficient is:

coefficient returns [double ret]
	{ ret = -4.0; }
	: flt:FLOAT
		{ ret = (new Double(flt.getText())).doubleValue(); }
	;

The main change here is in the way that the token value is converted into a double. We construct a new Double using the text of the FLOAT the lexer read in. We then store this Double in the ret, which is just a double, using the doubleValue() method. This demonstrates one of my few frustrations with Java and something that newer Java programmers should watch out for: an over dependence on case sensitivity. The next two rules are simple:

reg returns [double ret]
	{ ret = -5.0; }
	: id:ID
		{ ret = regs.value(id.getText()); }
	;

exp returns [double ret]
	{ ret = -6.0; }
	: ret=reg
	| flt:FLOAT
		{ ret = (new Double(flt.getText())).doubleValue(); }
	;

Any complexity in term is covered on page 48. The conversion from ANTLR 1 to 2 introduces no complexities which we have not already discussed:

term returns [double ret]
	{ double dbl=0.0, pow=0.0, coeff=0.0, var=0.0; ret = -3.0; }
	: coeff=coefficient
	  ( var=reg
	    ( "^" pow=exp	{ ret = coeff * java.lang.Math.pow(var, pow); }
	    |			{ ret = coeff * var; }
	    )
	  |			{ ret = coeff; }
	  )
	| dbl=reg
	  ( "^" pow=exp		{ ret = java.lang.Math.pow(dbl, pow); }
	  |			{ ret = dbl; }
	  )
	;

We can, however, vastly simplify this rule through the use of default values for the variables (that is our local variables in java, not the variables in our grammar):

term returns [double ret]
	{ double pow=1.0, coeff=0.0, var=1.0; ret = -3.0; }
	: coeff=coefficient ( var=reg ( "^" pow=exp )? )?
		{ ret = coeff * java.lang.Math.pow(var, pow); }
	| var=reg ( "^" pow=exp )?
		{ ret = java.lang.Math.pow(var, pow); }
	;

Here is where our regs object comes in useful. Instead of implementing the store and value functions inside our parser, we implement them inside an object which implements the DblStorage interface:

// DblStorage.java
// Defines an interface for objects which allow the user to store doubles by a one
// character name
interface DblStorage
{
	void store(String reg, double val);
	double value(String reg);
}

And here is that object:

// Registers.java
// implements the store and value behaviour functions using an array

import java.io.*;

class Registers implements DblStorage
{
	private double array[];

	public Registers()
	{
		int size='z'-'a'+1;
		array = new double[size];
		for(int lcv=0; lcv<size; lcv++) array[lcv] = 0.0;
	} // end default constructor

	public void store(String reg, double val)
	{
		array[reg.charAt(0)-'a'] = val;
		System.out.println("Storing " + val + " in " + reg);
	} // end store()

	public double value(String reg)
	{
		return array[reg.charAt(0)-'a'];
	} // end value()
} // end class Registers

We now add a constructor to PolyParser which initializes the regs member variable to an object of the DblStorage interface:

class PolyParser extends Parser;
{
	private DblStorage regs;
	PolyParser(TokenBuffer tokenBuf, DblStorage regs)
	{
		this(tokenBuf);
		this.regs = regs;
	} // end specialized constructor
}

We must now modify the main program to use this new object:

// PolyMain.java
// constructs and executes our Polynomial Parser
import java.awt.*;
import java.io.*;

class PolyMain
{
	public static void main(String[] args)
	{
		try {
			Registers regs = new Registers();
			PolyLex lexer = new PolyLex(new DataInputStream(System.in));
			PolyParser parser = new PolyParser(lexer, regs);
			parser.interp();
		} catch(Exception exception) {
			System.err.println("exception: "+exception);
		} // end try
	} // end main()
} // end PolyMain

Here is the complete grammar:

// Poly.g
// Defines a grammar for a Polynomial Parser
class PolyParser extends Parser;
{
	private DblStorage regs;
	PolyParser(Tokenizer tokenBuf, DblStorage regs)
	{
		this(tokenBuf);
		this.regs = regs;
	} // end specialized constructor
}

// The entry point for an interpreter
interp
	{ double ret=-1.0; }
	: ( lhs:ID "=" ret=poly ";"
		{ regs.store(lhs.getText(), ret); }
	  )+
	;

// A polynomial
poly returns [double ret]
	{ double dbl=0.0; ret = -2.0; }
	: ret=term ( "+" dbl=term { ret += dbl; } )*
	;

// A term in the polynomial
term returns [double ret]
	{ double pow=1.0, coeff=0.0, var=1.0; ret = -3.0; }
	: coeff=coefficient ( var=reg ( "^" pow=exp )? )?
		{ ret = coeff * java.lang.Math.pow(var, pow); }
	| var=reg ( "^" pow=exp )?
		{ ret = java.lang.Math.pow(var, pow); }
	;

// The coefficient of the term
coefficient returns [double ret]
	{ ret = -4.0; }
	: flt:FLOAT
		{ ret = (new Double(flt.getText())).doubleValue(); }
	;

// A register or variable
reg returns [double ret]
	{ ret = -5.0; }
	: id:ID
		{ ret = regs.value(id.getText()); }
	;

// An exponent in a term
exp returns [double ret]
	{ ret = -6.0; }
	: ret=reg
	| flt:FLOAT
		{ ret = (new Double(flt.getText())).doubleValue(); }
	;

class PolyLex extends Lexer;

// An ID for a register
ID
	: 'a'..'z'
	;

// A floating point number
FLOAT
	: ( '0'..'9' )+ ( '.' ( '0'..'9' )* )?
	;

// The operators so that they are recognized
OPERATORS
	: '='
	| '^'
	| '+'
	| ';'
	;

// Eat the white space
WS
	: ( ' '
	  | '\t'
	  | '\r' '\n'	{ newline(); }
	  | '\n'	{ newline(); }
	  )
		/* All the characters are in a subrule so that this
		   action will apply to them: */
		{ $setType(Token.SKIP); } // ignore this token
	;