javac lexer parser rewrite

Brian Goetz brian.goetz at oracle.com
Sun Feb 19 20:27:16 PST 2012


A while ago, a project was undertaken to create an ANTLR 'reference' 
grammar for Java.  It had some success (it passed a tree-for-tree 
comparison with javac on a reasonable corpus) but has since decayed.  
Performance-wise, I think it was about 10x slower than the existing 
javac parser.

I think it would be fantastic for someone to pick up the work that was 
done and bring it up to date.  A ready-to-use ANTLR grammar would be 
great for anyone building tools that need to ingest Java source code, 
even if javac didn't use it.  Because so many people know ANTLR, an 
ANTLR grammar would probably be more generally useful than one written 
for a less popular tool, since more people would be able to customize 
it for their own purposes.

Are you interested in working on this?  I'm sure we could dig up the 
existing work and test suites as a starting point.

On 2/7/2012 4:57 AM, leszekp at safe-mail.net wrote:
> Hello
>
> The javac scanner and parser are currently handwritten. The code, especially in the parser, is quite messy and
> hard to read and modify.
> It would be possible to rewrite the lexer and parser using some kind of Java parser generator.
> That would improve readability and allow for easier modifications.
>
> There is a 'compiler grammar' project (which seems dormant), in which the Java lexer and parser were rewritten
> using ANTLR. But ANTLR-generated parsers are very slow.
>
> Many lexer and parser generators exist that can process 'classic' regular expressions for the lexer or
> context-free grammars for the parser and produce fast code (e.g. JFlex, Beaver, the Jikes parser generator, and more).
>
> What do you think about it? Is there a need for such a thing? Is it worth the effort?
>
> Regards
> Leszek Piotrowicz


