Another parser than ANTLR

Tue Oct 14 08:03:50 PDT 2008

Hi Rémi,

Your work on your grammar sounds interesting.

The compiler-grammar project currently has two goals:
1) to develop a version of javac with a "better" parser that
is more maintainable and less fragile than the current
parser.
2) to build a unified grammar for JLS that can be demonstrated
and verified within javac.

For the first goal, there are a number of constraints, not the least
of which is performance. Currently, the ANTLR parser is somewhat
slower than the standard javac parser.  Yang and others are
looking at performance;  for any parser to become standard, it
would have to have acceptable performance, whatever that might
mean. Another issue is the quality of errors and error recovery.
The errors detected need not be 1-1 identical, but (for example)
a parser with one message ("syntax error") would fall below the
quality bar.

The second goal is about having a better grammar in JLS, one that
is as consistent as possible between the body of the book
and the summary in chapter 18.  "Better" means something along
the lines of "clear, concise, and strongly related to the grammar
in javac".  The thought is to be able to take the ANTLR grammar
from javac and post-process it into something similar to JLS
format.

Note that these goals need not have identical grammars; the grammar
for a production compiler might have more rules for handling errors,
handling multiple source versions, and so on.  However, I am keen
that anything that claims to be a Java grammar should be demonstrable
within a Java compiler.

Now for your question about whether the compiler-grammar project is
open to more parsers.  This has some flavor of the KSL discussion.
OpenJDK projects have specific deliverables, and are not intended
for open-ended work such as "integrate any parsers that come along
into javac". If you wanted to create a sibling project on OpenJDK, that
would be possible, or we could investigate the possibility of having a
separate repository within the context of the compiler-grammar project,
I guess.  Note that in the Mercurial world, you create new repositories
the same way you would create a branch within centralized SCMs like
Subversion. Also note that if you put work into OpenJDK, contributors
have to sign the SCA etc.

It seems to me there is a better possibility we could work towards.
We already took a step in the right direction with the ANTLR work,
in that the ANTLR parser does not replace the existing parser --
it coexists with it, and is currently selected by a hidden compiler
option. To make that happen, we refactored some of the code in
JavaCompiler and the parser.* classes so that there was a cleaner
interface that is implemented by both the standard parser and the
new ANTLR parser. Currently, the option to select the parser is
still clunky, but if we could improve that for/with you, we might be
able to make it so that you can just drop a jar file in the classpath
and have your Tatoo-based parser available in javac.
[Note the basic refactoring is or should be available in the main
OpenJDK repositories, as it is not specific to ANTLR.]
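
As a rough illustration of the arrangement described above, here is a
minimal sketch of two parsers behind one interface, selected by an
option value. All names in it (SourceParser, ParserRegistry, the
"standard" and "antlr" option values) are hypothetical and are not
the actual javac classes or options; the parse method is a stand-in
that only reports which parser ran.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParserSelectionSketch {

    /** Hypothetical common interface; NOT the real javac parser API. */
    interface SourceParser {
        String name();
        /** Stand-in for real parsing: just reports which parser ran. */
        String parse(CharSequence source);
    }

    /** Placeholder for the existing hand-written parser. */
    static final class StandardParser implements SourceParser {
        public String name() { return "standard"; }
        public String parse(CharSequence source) {
            return "standard parser read " + source.length() + " chars";
        }
    }

    /** Placeholder for a generated parser that coexists with the first. */
    static final class AntlrParser implements SourceParser {
        public String name() { return "antlr"; }
        public String parse(CharSequence source) {
            return "antlr parser read " + source.length() + " chars";
        }
    }

    /** Maps an option value to a parser; falls back to the default. */
    static final class ParserRegistry {
        private final Map<String, SourceParser> parsers = new LinkedHashMap<>();
        private final SourceParser defaultParser;

        ParserRegistry(SourceParser defaultParser) {
            this.defaultParser = defaultParser;
            register(defaultParser);
        }

        void register(SourceParser parser) {
            parsers.put(parser.name(), parser);
        }

        SourceParser select(String optionValue) {
            if (optionValue == null) {
                return defaultParser; // no option given: keep existing behavior
            }
            SourceParser parser = parsers.get(optionValue);
            if (parser == null) {
                throw new IllegalArgumentException("unknown parser: " + optionValue);
            }
            return parser;
        }
    }

    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry(new StandardParser());
        registry.register(new AntlrParser());
        String option = args.length > 0 ? args[0] : null;
        System.out.println(registry.select(option).parse("class A {}"));
    }
}
```

For the "drop a jar file in the classpath" idea, the explicit
register calls could in principle be replaced by discovery through
java.util.ServiceLoader, with each parser jar listing its
implementation under META-INF/services. That is one possible
mechanism, not something the current code is known to do.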

Finally, Yang has reminded me that the compiler-grammar web
pages are still the placeholders that Mark created when he created
the project for us. It is "on the list" to update these pages.  Adding
links to "Related Projects" such as Tatoo is an obvious improvement.
And, in a similar vein, I think it is reasonable to post information,
announcements etc regarding your project here on the compiler-grammar
list.

-- Jon

On Oct 12, 2008, at 12:47 PM, Rémi Forax wrote:

> Hi, jon, hi yang, hi all,
> For the past 4 years, I have been developing an LR parser generator
> with two colleagues. It has several good features, among them:
>
>   * separate specifications for lexer (regular expression based
>     rules), parser (grammar) and semantics (java interface
>     implementation);
>   * push analyzer: the characters are fed to the analyzer so it allows
>     asynchronous usage (for instance with thread pool and selectors);
>   * automated lexer rule selection according to tokens expected by the
>     parser (this lets users give their variables the same names as
>     some keywords);
>   * production of shared parser tables for different versions of the
>     language to simplify backward compatibility or allow version
>     change during parsing;
>
> I think it could be interesting to try to use it.
> So here is my question: is the compiler-grammar workspace open to
> contributions other than an ANTLR parser?
>
> regards,
> Remi
>