My observations at the Source-to-Source 2004 workshop
This invitation-only workshop was held Oct 24, 2004 in Vancouver as
part of GPCE / OOPSLA. Around 20 people were there and we all did
10-minute position presentations. Here are the presentations.
I read the 2-page position papers on the plane up and decided to throw
out my slides. I was just going to talk about StringTemplate, but
decided instead to point out how differently I viewed the world. I am
definitely the "lunatic fringe" in this group. Except for a nice
presentation from a student at Berkeley (working with Susan Graham),
the goal of everybody's research seemed to be to create the system
that can do the most amazing transformations with the smallest spec,
ignoring all other concerns. That is to say, almost no one was
building a system targeted at the average programmer. Actually, I'm
updating this paragraph after having lunch with some of the graduate
students working on these systems. They indicate that a major goal
is to provide reusable transformations. Anyway, this was definitely a day of
inbreeding. ;) Researchers are all trying to impress each other.
Naturally, I didn't make any friends by pointing this out.
My main impression from this workshop is that researchers in this area
absolutely do not worry about how easy their systems are to
understand. I pointed out that I worry not about the way things ought
to be (many claim we should all be doing functional languages and
using these complicated systems); rather, I worry about how things
are. Hence, I am constrained to build systems that are easy to use.
My strategy is to automate and formalize what I see programmers (and
myself) doing. My tools are a direct result of building something
else.
Rewriting Systems
These translation systems are all about providing a series of
pattern/translation pairs, predicated on a "database" of "facts" (you
can think of the database as a symbol table). I'm unconvinced that a
large set of rules (like a large XSLT program) is very
understandable--you have to imagine the emergent behavior of the
thing. This is equally true for translators written in ANTLR with
tree parsers and so on, but at least you can step through with a
debugger. There is no mystery "black box" to wonder about. Some
(such as Stratego) split up the rules into a series of modules that
can be "composed" to perform complicated transitions. Might work.
The person who invented "reference attribute grammars" has a Java 1.4
compiler completely done with such rewrites; she said I could take a
look. :)
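To make the pattern/translation idea concrete, here is a minimal sketch in Python. It is not the notation of Stratego or any other system from the workshop; the tuple tree encoding, the two rules, and the `FACTS` dictionary are all made up for illustration. Each rule is a pattern plus a translation, and one of them only fires when the "database" says an identifier is a known constant.

```python
# Trees are nested tuples: (operator, child, ...). Leaves are ints or
# identifier strings. All rule and fact names below are hypothetical.

def fold_constant_add(tree, facts):
    """Pattern ("+", int, int) => the sum. Ignores the facts."""
    if (isinstance(tree, tuple) and len(tree) == 3 and tree[0] == "+"
            and isinstance(tree[1], int) and isinstance(tree[2], int)):
        return tree[1] + tree[2]
    return None  # pattern did not match

def inline_constant(tree, facts):
    """Pattern: an identifier. Predicate: the facts say it is a known constant."""
    if isinstance(tree, str) and tree in facts["constants"]:
        return facts["constants"][tree]
    return None

def rewrite(tree, rules, facts):
    """Rewrite children first, then apply the first rule that fires, and repeat."""
    if isinstance(tree, tuple):
        tree = (tree[0],) + tuple(rewrite(c, rules, facts) for c in tree[1:])
    for rule in rules:
        result = rule(tree, facts)
        if result is not None:
            return rewrite(result, rules, facts)
    return tree

RULES = [fold_constant_add, inline_constant]
FACTS = {"constants": {"x": 3}}  # the "database"; think symbol table

simplified = rewrite(("*", ("+", "x", 4), "y"), RULES, FACTS)
print(simplified)
```

Even with two rules, the result depends on the traversal order and on which rule gets tried first, which is the emergent behavior I mean. With a few hundred rules you can no longer read the answer off the spec.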
These systems all use parse trees because they work on patterns given
in the source and target language. [I should look at ACE by Gosling
again; it did syntax-based vs. text-based macros for C.] Because of this,
and because the tree is never really exposed to the programmer, using
parse trees instead of ASTs is probably fine. Naturally, if you have to
build a tree grammar or walker yourself, ASTs are better--no "noise" from
rule nodes in your tree and it's insensitive to grammar fluctuations.
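A small Python sketch of the parse tree vs AST point, under made-up assumptions: rule nodes are tuples starting with a rule name, tokens are strings, and the toy grammar is expr : term ('+' term)* ; term : atom ('*' atom)* ; atom : INT. The second parse tree comes from a hypothetical revision of the grammar that adds a "unary" rule between term and atom.

```python
def to_ast(node):
    """Collapse a parse tree into an AST: drop rule nodes, hoist operators."""
    if isinstance(node, str):
        return node  # a token
    _rule_name, *children = node
    kids = [to_ast(c) for c in children]
    ast = kids[0]
    # Remaining children alternate: operator token, operand.
    for op, operand in zip(kids[1::2], kids[2::2]):
        ast = (op, ast, operand)
    return ast

def size(tree):
    """Count nodes; the first tuple element is the node's own label."""
    if isinstance(tree, tuple):
        return 1 + sum(size(c) for c in tree[1:])
    return 1

# Parse tree for 1+2*3 under the original grammar.
parse_v1 = ("expr",
            ("term", ("atom", "1")),
            "+",
            ("term", ("atom", "2"), "*", ("atom", "3")))

# Same input after adding a "unary" rule to the grammar.
parse_v2 = ("expr",
            ("term", ("unary", ("atom", "1"))),
            "+",
            ("term", ("unary", ("atom", "2")), "*", ("unary", ("atom", "3"))))

ast_v1 = to_ast(parse_v1)
ast_v2 = to_ast(parse_v2)
print(ast_v1, size(parse_v1), size(parse_v2), size(ast_v1))
```

The two parse trees differ in shape and size, but both collapse to the same AST, so a walker written against the AST survives the grammar change.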
These systems are beasts, measured in tens of megabytes. Definitely
designed to take an input file and dump an output file. Further, they
assume that a programmer is content with using their data structures
and has none of his/her own really. They view the world backwards: they
see their system as the center, whereas programmers are looking for
little pieces to stick into their own apps. Integrating a
program written in these systems into a programmer's application is
just not going to happen.
I note that they all have very powerful parsing strategies; the
problem of course is that they are still too slow for most
programmers' taste. Further, and more importantly, programmers don't
understand those mechanisms. On the one hand, it's easy to just write
out a natural grammar; on the other hand, some of these systems allow
ambiguous grammars, which is anathema to translation: for computer
languages you have to know exactly what you have. You only want one meaning.
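A quick Python illustration of why ambiguity hurts, using a deliberately ambiguous toy grammar of my choosing, e : e '-' e | INT. The sketch enumerates every tree that grammar allows for an input instead of using a real generalized parser; the function names are hypothetical.

```python
def all_parses(nums):
    """Every tree that  e : e '-' e | INT  allows for nums[0] - nums[1] - ..."""
    if len(nums) == 1:
        return [nums[0]]
    trees = []
    for split in range(1, len(nums)):
        for left in all_parses(nums[:split]):
            for right in all_parses(nums[split:]):
                trees.append(("-", left, right))
    return trees

def evaluate(tree):
    """Interpret a tree; a translator would face the same choice of meaning."""
    if isinstance(tree, tuple):
        return evaluate(tree[1]) - evaluate(tree[2])
    return tree

trees = all_parses([1, 2, 3])  # the input 1-2-3
values = sorted(evaluate(t) for t in trees)
print(len(trees), values)
```

The input 1-2-3 has two trees, (1-2)-3 and 1-(2-3), and they evaluate to different numbers. A translator has to commit to one of them.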
Miscellaneous
I learned a lot at dinner about some features for ANTLR 3.0. For
example, somebody mentioned that Uwe Kastens came up with an "including" feature very much
like our "dynamic scoping" for attributes. I'll investigate.
For the 3.0 code generator, I have a problem to solve still: how can I
have a Java.stg template file and then allow options such as
"debugging", "build AST", "build parse tree", and so on? Inheritance
isn't quite right... I need something like composition that will pipe
output of one template rule to the same template rule down the pipe so
I can wrap, for example, matchToken in one that does tree
construction. A paper at OOPSLA on Wednesday will apparently be on
something similar called "nested inheritance" by Nystrom, Ch(a|o)ng,
and Myers. I'll poke around.
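Here is roughly the kind of composition I have in mind, sketched in plain Python instead of StringTemplate. None of this is StringTemplate or ANTLR API: a "group" is just a dictionary from rule name to function, the option layers and the generated code strings are invented, and `compose` is a hypothetical helper. Each layer's rule receives the output of the same-named rule below it, plus the original arguments.

```python
def wrap(wrapper, inner):
    """Pipe the output of `inner` into `wrapper`, passing the same arguments."""
    def rule(*args):
        return wrapper(inner(*args), *args)
    return rule

def compose(base, *layers):
    """New group where each layer's rule wraps the same-named rule beneath it.

    A layer may only name rules that already exist in the base group.
    """
    group = dict(base)
    for layer in layers:
        for name, wrapper in layer.items():
            group[name] = wrap(wrapper, group[name])
    return group

# Base "Java" group: rule name -> function producing target code text.
base = {
    "matchToken": lambda tok: f"match({tok});",
}

# Option layers (hypothetical output): each sees the lower layer's text.
build_ast = {
    "matchToken": lambda out, tok: f"{out} addChild({tok});",
}
debugging = {
    "matchToken": lambda out, tok: f"trace({tok}); {out}",
}

plain = compose(base)
with_options = compose(base, build_ast, debugging)
print(with_options["matchToken"]("ID"))
```

Unlike inheritance, the options stack in whatever order and combination you ask for, and the base group never has to know about them.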