javac lexer parser rewrite

Wed Feb 8 08:55:33 PST 2012

On 02/08/2012 04:52 PM, leszekp at safe-mail.net wrote:
> Both a hand-coded parser and a generated one have advantages and disadvantages.
> Of course it is good to have plain Java code and the possibility to debug it.
>
> But as the level of complication rises, at some point a hand-written parser becomes unmanageable anyway.

Have you taken a look at the source code?

The parser is readable mostly because most of the methods correspond to
an LR state, and the dotted productions of that state
are available in the documentation of the methods.
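
To illustrate the idea of documenting each method with the grammar it
handles, here is a minimal sketch. It is not javac code: the class
ExprParser, its method names and its tiny arithmetic grammar are all made up
for this example, and the comments show plain grammar rules rather than LR
dotted items.

```java
/** Hypothetical mini-parser (not javac code): one method per grammar rule. */
final class ExprParser {
    private final String input;
    private int pos = 0;

    private ExprParser(String input) { this.input = input; }

    /** Parses and evaluates a whole expression; rejects trailing input. */
    static int evaluate(String text) {
        ExprParser p = new ExprParser(text.replace(" ", ""));
        int value = p.expr();
        if (p.pos != p.input.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return value;
    }

    /** Expr = Term { ("+" | "-") Term } */
    private int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = next();
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    /** Term = Factor { "*" Factor } */
    private int term() {
        int value = factor();
        while (peek() == '*') {
            next();
            value *= factor();
        }
        return value;
    }

    /** Factor = Number | "(" Expr ")" */
    private int factor() {
        if (peek() == '(') {
            next();
            int value = expr();
            if (next() != ')') throw new IllegalArgumentException("expected ')'");
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) next();
        if (start == pos) throw new IllegalArgumentException("expected number at " + pos);
        return Integer.parseInt(input.substring(start, pos));
    }

    /** Returns the current character, or '\0' at end of input. */
    private char peek() { return pos < input.length() ? input.charAt(pos) : '\0'; }

    /** Consumes and returns the current character. */
    private char next() { char c = peek(); pos++; return c; }
}
```

Someone reading expr() or term() can check the code against the rule in its
comment, which is the property that keeps a hand-written parser debuggable.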

> For example, the Pascal language was designed to be LL(1), and a hand-written recursive descent
> parser for that language is probably quite understandable. But Java wasn't designed that way,
> and its hand-written parser has a lot of quirks which make it complicated to understand.

Sorry, Java 1.0 was designed to be LALR(1); there is some discussion of
that in the JLS 1.

The main issue is more that the grammar in the spec was not updated
correctly after that.
This had some painful drawbacks: for instance, some parts of the grammar
(such as the one for allocating a generic inner class) were missing from
the parser when JDK 1.5 was released.

Also, while I know that the grammar of Java 5 is LALR, I have no idea
whether that of the upcoming Java 8 is still LALR.

>
> I am experimenting with a JFlex-generated Java lexer. It is very fast, comparable
> to the original javac Scanner, so it is promising.

The problem is more the parser than the lexer.

By the way, a good project would be to write a parser generator that
takes the Java grammar and generates the same code as the existing parser.

>
> regards
> Leszek

cheers,
Rémi