Home | Download | News | Wiki | About ANTLR | Feedback | Support | Bugs


Latest version is 2.7.7.
Download now! »

Download
» Home
» Download
» News
»Using ANTLR
» Documentation
» Wiki
» FAQ
» Articles
» Grammars
» File Sharing
» Code API
» Tech Support
» Bug Tracking
»About ANTLR
» What is ANTLR
» Why use ANTLR
» Showcase
» Testimonials
» Getting Started
» Software License
» ANTLR WebLogs
» ANTLR Workshops
»StringTemplate
»TML
»PCCTS
»Feedback
»Credits
»Contact


Support StringTemplate, ANTLR Project by making a donation! Terence often pays for things like the antlr.org server, conference travel, and this site design (that alone cost US$1000). Buy him a beer and pizza remotely ;)

Search



Grammar Reuse

RSS Feed

May 15, 2007

I'm starting to think about how to support reusing grammars. ANTLR v2 had an inheritance model, but it had to fundamental flaws:

  1. inheritance was implemented as a preprocessor and so the relationship between the expanded grammar and the original source grammar was not very clear.
  2. the generated code included all rules from a superclass as well as the subclass, which resulted in some truly huge output files.

Both of these made it very difficult to debug grammars. In many cases the generated parsers were too big to debug. For example, one user told me that there generated Oracle 10g parser was 129k lines!

The primary advantages of reusing grammars are the same advantages obtained from reusing code. You get single point of change and you can describe a new problem only as it differs from the previous solution. Using the inheritance model of ANTLR v2, one company gave me the following information:

  • Oracle8.1.6 grammar, 9400 lines
  • 9i subgrammar, 4000 lines
  • 10g subgrammar, 2000 lines

There is a great deal of rule sharing as you can see. Inheritance works great when you are adding new constructs. You just can either add to or override rules from the supergrammar. The gets much harder when you try to alter alternatives within a rule or even the middle of an alternative. Depending on how you structure your super grammar initially, you may have to modify the super grammar extensively. Doing this, risks breaking a lot of things in a complicated grammar. Sometimes it is better to cut and paste rather than munge supergrammar.

Tree grammar versus parser grammar reuse. The company only used inheritance for the parser, not the tree parser. They had one tree parser per vendor. All versions of the SQL were normalized by the parser into a single tree structure per vendor, hence, one tree grammar per vendor.

Actions. The company did not have many actions in the parser, preferring to simply build a tree. All of the actions are in the tree parsers.

Fixing inheritance model. One of the most obvious solutions is simply to make grammar inheritance rely on class inheritance so that some grammars result in generated code commensurate with their size. This is in contrast to the previous solution that treated inheritance like a glorified include. Another obvious thing to add is the ability to add an alternative to an existing rule without having to cut and paste the entire rule:

grammar MyJava : Java; // inherit from Java grammar

primitiveType += 'imaginary' ; // add imag. numbers

or, perhaps:

primitiveType
	:	super
	|	'imaginary'
	;

Sometimesyou need to replace a single alternative from the superclass. Rather than altering the superclass to reference a rule so that you can override that rule, perhaps naming the alternatives is a good idea like Rats! does.

grammar T;
a : blah
  |<label> blort
  | ick
  ;

Then a some grammar can override that alternative:

grammar U : T;
a.label : newVersionOfBlort ;

Hmm...that ':' as the inheritance operator is bad because it overloads the rule definition operator. Anyway, and you could also remove alternatives

a -= <label> ; // delete this alternative.

Delegation model. Perhaps a better and more general solution is to use delegation. This would work like multiple inheritance really. Any rule not defined in the current grammar would be looked up in the delegates.

The delegation model is nice because it allows you to break up a big SQL supergrammar into chunks so you can share the common stuff among more vendors and override what you need. This is more like the grammar composition from tools like Rats!. For example, in SQL there are logical chunks. You could delegate on expr, predicates, conditions, data manipulation language statements, data definition language statements, etc.

grammar Oracle10g;

import OracleExpressions;
import OracleDataDefinition;
import OracleConditions;
import OracleStatements;

...rules...

Hmm...I like the word delegate better than import, but import makes a little more sense. The import statement makes all of the rules within the specified grammar visible to the rules within the current grammar. Looking up symbols would be such that ambiguous rule references would be illegal. You would need to fully qualify them.

grammar A;
a : ... ;
c : ... ;
grammar B;
a : ... ;
b : ... ;
grammar T;

import A;
import B;

x : A.a B.a ; // reference rules from two imported grammars
y : b ;       // reference unambiguous rule B.b
c : ... ;     // override rule A.c
z : c ;       // c refers to overridden rule T.c

Note that dynamic binding would ensure that things work as you expect:

grammar A;

a : b ;
b : B ;
grammar B;
import A;
b : C ; // override A.b

Here, if you call rule 'a' in grammar B, it will grab the definition from A which we'll call rule 'b'. Because you invoked rule 'a' from a B grammar, rule 'a' will invoke B.b not A.b.

Lexer grammar reuse. lexers can be shared 90% or more.

Macros. What about having macros for expressions and so on? Nah, seems and unnecessary complication at this point.

Misc. Having a nice way to make changes other than at the course rule level granularity, would be great. Apparently the changes between Oracle SQL 8, 9, 10 are not additions. They are changes, about 15% between versions.

We need to be careful that we do not mark up the subgrammars so much that they are impossible to read.