History

Updated January 28, 2005

The PCCTS project began as a parser-generator project for a graduate course at Purdue University in the Fall of 1988 taught by Hank Dietz--“translator-writing systems”. Under the guidance of Professor Dietz, the parser generator, ANTLR (originally called YUCC), continued after the termination of the course and eventually became the subject of Terence Parr’s Master’s thesis. Originally, lexical analysis was performed via a simple scanner generator which was soon replaced by Will Cohen’s DLG in the Fall of 1989 (DFA-based lexical-analyzer generator, also an offshoot of the graduate translation course).

The alpha version of ANTLR was totally rewritten, resulting in 1.00B. Version 1.00B was released via an internet newsgroup (comp.compilers) posting in February of 1990 and quickly gathered a large following. 1.00B generated only LL(1) parsers, but allowed the merged description of lexical and syntactic analysis. It had rudimentary attribute handling similar to that of YACC and did not incorporate rule parameters or return values; downward inheritance was very awkward. 1.00B-generated parsers terminated upon the first syntax error. Lexical classes (modes) were not allowed and DLG did not have an interactive mode.

Upon starting his Ph.D. at Purdue in the Fall of 1990, Terence Parr began the second total rewrite of ANTLR. The method by which grammars may be practically analyzed to generate LL(k) lookahead information was discovered in August of 1990 just before Terence’s return to Purdue. Version 1.00 incorporated this algorithm and included the AST mechanism, lexical classes, error classes, and automatic error recovery; code quality and portability were higher. In February of 1992, version 1.00 was released via an article in SIGPLAN Notices. Peter Dahl, then a Ph.D. candidate, and Professor Matt O’Keefe (both at the University of Minnesota) tested this version extensively. Dana Hoggatt (Micro Data Base Systems, Inc.) tested 1.00 heavily.

Version 1.06 was released in December 1992 and represented a large feature enhancement over 1.00. For example, rudimentary semantic predicates were introduced, error messages were significantly improved for k>1 lookahead and ANTLR parsers could indicate that lookahead fetches were to occur only when necessary for the parse (normally, the lookahead “pipe” was constantly full). Russell Quong joined the project in the Spring of 1992 to aid in the semantic predicate design. Beginning and advanced tutorials were created and released as well. A makefile generator was included that sets up dependencies and such correctly for ANTLR and DLG. Very few 1.00 incompatibilities were introduced (1.00 was quite different from 1.00B in some areas).

Version 1.10 was released on August 31, 1993 after Terence’s release from Purdue and incorporated bug fixes, a few feature enhancements and a major new capability--an arbitrary lookahead operator (syntactic predicate), “(a)?b”. This feature was codesigned with Professor Russell Quong, also at Purdue. To support infinite lookahead, a preprocessor flag, ZZINF_LOOK, was created that forced the ANTLR() macro to tokenize all input prior to parsing. Hence, at any moment, an action or predicate could see the entire input sentence. The predicate mechanism of 1.06 was extended to allow multiple predicates to be hoisted; the syntactic context of a predicate could also be moved along with the predicate.

In February of 1994, SORCERER was released. This tool allowed the user to parse child-sibling trees by specifying a grammar rather than building a recursive-descent tree walker by hand. Aaron Sawdey at The University of Minnesota became a second author of SORCERER after the initial release. On April 1, 1994, PCCTS 1.20 was released. This was the first version to actively support C++ output. It also included important fixes regarding semantic predicates and (..)+ subrules. This version also introduced token classes, the “not” operator, and token ranges.

On June 19, 1994, SORCERER 1.00B9 was released. Gary Funck of Intrepid Technology joined the SORCERER team and provided very valuable suggestions regarding the “transform” mode of SORCERER.

On August 8, 1994, PCCTS 1.21 was released. It mainly cleaned up the C++ output and included a number of bug fixes.

From the 1.21 release forward, the maintenance and support of all PCCTS tools was picked up by Parr Research Corporation.

A sophisticated error handling mechanism called “parser exception handling” was released for version 1.30. 1.31 fixed a few bugs.

Release 1.33 is the version corresponding to the initial book release.

ANTLR 2.0.0 came out around May 1997 and was partially funded so Terence hired John Lilley, a maniac coder and serious ANTLR hacker, to build much of the initial version. Terence did the grammar analyzer, naturally.

John Mitchell, Jim Coker, Scott Stanchfield, and Monty Zukowski donated lots of brain power to ANTLR 2.xx in general.

ANTLR 2.1.0, July 1997, mainly improved parsing performance, decreased parser memory requirements, and added a lot of cool lexer features including a case-insensitivity option.

ANTLR 2.2.0, December 1997, saw the introduction of the new http://www.antlr2.org website. This release also added grammar inheritance, enhanced AST support, and enhanced lexical translation support (each lexical rule now was considered to return a Token object even when referenced by another lexical rule).

ANTLR 2.3.0, June 1998, was the first version to have Peter Wells's C++ code generator.

ANTLR 2.4.0, September 1998, introduced the ParseView parser debugger by Scott Stanchfield. This version also had a semi-functional -html option to generate HTML from your grammar for reading purposes. Scott and Terence updated the file I/O to be JDK 1.1.

ANTLR 2.5.0, November 1998, introduced the filter option for the lexer that lets ANTLR behave like SED or AWK.

ANTLR 2.6.0, March 1999, introduced token streams. Chapman Flack, a Purdue graduate student, pounded me at the right moment about streams, nudging me in the right direction.

MageLang Institute currently provides support and continues development of ANTLR.

MageLang becomes jGuru.com as we quit doing Java training and start building the jGuru Java developer's website.

2.7.0 released January 19, 2000 had the following enhancements:

Nongreedy subrules

Heterogeneous trees

Element options. To support heterogeneous trees, elements such as token references may now include options.

Exception hierarchy redesign

XML serialization

Improved C++ code generator

New Sather code generator

2.7.1 released October 1, 2000 had the following enhancements:

ANTLR now allows UNICODE characters because Terence made case-statement expressions more efficient ;) See the unicode example in the distribution and the brief blurb in the documentation.
Massively improved C++ code generator (Thanks to Ric Klaren).
Added automatic column setting support.
Ter added throws to tree and regular parsers.

2.7.2 released January 19, 2003 was mainly a bug fix release but also included a C# code generator by Micheal Jordan, Kunle Odutola and Anthony Oguntimehin. :) I added an antlr.build.Tool 'cause I hate ANT. This release does UNICODE properly now. Added limited lexical lookahead hoisting. Sather code generator disappears. Source changes for Eclipse and NetBeans by Marco van Meegen and Brian Smith.

2.7.3 released March 22, 2004 was mainly a bug fix release, but included the parse-tree/derivation code to aid in debugging plus the cool TokenStreamRewriteEngine that makes rewriting or tweaking input files particularly easy.

2.7.4 released May 9, 2004 was mainly a bug fix release for C++ and C# generators.

2.7.5 released January 28, 2005 was mainly a release for the Python code generator and provided a number of bug fixes. Wolfgang Häfelinger and Marq Kole joined the project to handle the Python!

Terence is working fiendishly on the 3.0 version of ANTLR, a complete rewrite with a powerful new parsing engine called LL(*) that provides the efficiency of fixed lookahead but can throttle up automatically to arbitrary lookahead when needed. It is extremely clean in both the source code and the grammar meta-language. The code generation is extremely flexible (based upon StringTemplate) and makes retargeting trivial. Ric Klaren built a C code generator in 2 days without having seen either the new ANTLR or StringTemplate before. :) Expect an early release program in Spring 2005. Jean Bovet, a graduate student in CS here at the University of San Francisco, is working on a GUI IDE.