javac lexer parser rewrite
Rémi Forax
forax at univ-mlv.fr
Wed Feb 8 08:55:33 PST 2012
On 02/08/2012 04:52 PM, leszekp at safe-mail.net wrote:
> Both a hand-coded parser and a generated one have advantages and disadvantages.
> Of course, it is good to have plain Java code and the possibility to debug it.
>
> But as the complexity rises, at some point a hand-written parser becomes unmanageable anyway.
Have you taken a look at the source code?
The parser is readable mostly because most of the methods correspond to
an LR state, and the dotted productions of that state
are given in the documentation comment of each method.
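To illustrate the style described above, here is a hypothetical recursive-ascent sketch (not code taken from javac). Each method is one LR(0) state of an assumed toy grammar (S -> E $, E -> E + NUM | NUM), and its doc comment lists the dotted items of that state. Shifting a token calls the successor state; a reduction returns a Result whose depth says how many more method frames must be unwound before reaching the state that performs the goto on E. The names RecursiveAscent, Result and eval are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration, not javac code: one method per LR(0) state.
 * Assumed toy grammar:  S -> E $    E -> E + NUM    E -> NUM
 * The semantic value of E is the sum of its numbers.
 */
final class RecursiveAscent {
    enum Kind { NUM, PLUS, EOF }

    record Token(Kind kind, int value) {}

    /** Accept, or a reduction to E that must still unwind `depth` more method frames. */
    record Result(boolean accept, int depth, int value) {}

    private final List<Token> tokens;
    private int pos;

    private RecursiveAscent(List<Token> tokens) { this.tokens = tokens; }

    /** Parses and evaluates a sum such as "1 + 2 + 3". */
    static int eval(String src) {
        return new RecursiveAscent(lex(src)).state0().value();
    }

    /**
     * State 0:
     *   S -> . E $
     *   E -> . E + NUM
     *   E -> . NUM
     */
    private Result state0() {
        Result r = state2(expect(Kind.NUM).value());   // shift NUM
        while (!r.accept()) {                          // a reduction to E finished unwinding here,
            r = state1(r.value());                     // so take the goto on E
        }
        return r;
    }

    /**
     * State 1:
     *   S -> E . $
     *   E -> E . + NUM
     */
    private Result state1(int e) {
        Token t = tokens.get(pos);
        if (t.kind() == Kind.EOF) return new Result(true, 0, e);    // accept
        if (t.kind() != Kind.PLUS) throw error("'+' or end of input", t);
        pos++;                                                      // shift '+'
        Result r = state3(e);
        return new Result(false, r.depth() - 1, r.value());         // returning unwinds the E frame
    }

    /** State 2:  E -> NUM .   (reduce 1 symbol: only this frame is unwound) */
    private Result state2(int n) { return new Result(false, 0, n); }

    /** State 3:  E -> E + . NUM */
    private Result state3(int e) {
        Result r = state4(e, expect(Kind.NUM).value());             // shift NUM
        return new Result(false, r.depth() - 1, r.value());         // returning unwinds the '+' frame
    }

    /** State 4:  E -> E + NUM .   (reduce 3 symbols: 2 more frames after this one) */
    private Result state4(int e, int n) { return new Result(false, 2, e + n); }

    private Token expect(Kind kind) {
        Token t = tokens.get(pos);
        if (t.kind() != kind) throw error(kind.toString(), t);
        pos++;
        return t;
    }

    private IllegalArgumentException error(String expected, Token got) {
        return new IllegalArgumentException(
                "expected " + expected + " but found " + got.kind() + " at token " + pos);
    }

    /** Tiny hand-written lexer for the toy language: integers, '+', whitespace. */
    static List<Token> lex(String src) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '+') {
                out.add(new Token(Kind.PLUS, 0));
                i++;
            } else if (Character.isDigit(c)) {
                int j = i;
                while (j < src.length() && Character.isDigit(src.charAt(j))) j++;
                out.add(new Token(Kind.NUM, Integer.parseInt(src.substring(i, j))));
                i = j;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "'");
            }
        }
        out.add(new Token(Kind.EOF, 0));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(eval("1 + 2 + 3"));   // prints 6
    }
}
```

Input such as "1 +" fails in state 3 with an IllegalArgumentException. Because every state is an ordinary method, a debugger's call stack mirrors the LR stack, which keeps the "plain Java code you can debug" property.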
> For example, the Pascal language was designed to be LL(1), and a hand-written recursive-descent
> parser for it is probably quite understandable. But Java wasn't designed that way,
> and its hand-written parser has a lot of quirks that make it hard to understand.
Sorry, but Java 1.0 was designed to be LALR(1); there is some discussion of
that in the first edition of the JLS.
The main issue is rather that the grammar in the spec was not updated
correctly afterwards.
This had some painful consequences: for example, some parts of the grammar
(such as the one for allocating a generic inner class) were missing from
the parser when JDK 1.5 was released.
Also, while I know that the grammar of Java 5 is LALR, I have no idea whether
the grammar of the upcoming Java 8 is still LALR.
>
> I am experimenting with a JFlex-generated Java lexer. It is very fast, comparable
> to the original javac Scanner; it is promising.
The problem lies more with the parser than with the lexer.
By the way, a good project would be to write a parser generator that takes the Java
grammar and generates the same code as the existing parser.
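As a minimal sketch of where such a generator could start, assuming a toy grammar and plain LR(0) states rather than the LALR(1) lookaheads Java would actually need, the class below builds the canonical LR(0) collection with the usual closure and goto operations, then emits one method skeleton per state with the state's dotted items in its doc comment. Lr0Skeletons and the emitted method shape are hypothetical; semantic actions and lookahead computation are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch: builds the LR(0) states of a toy grammar and emits a
 * doc-commented method skeleton per state. Lookaheads and actions are omitted.
 */
final class Lr0Skeletons {
    record Prod(String lhs, List<String> rhs) {}
    record Item(int prod, int dot) {}

    // Assumed toy grammar; production 0 is the augmented start rule.
    static final List<Prod> GRAMMAR = List.of(
            new Prod("S", List.of("E", "$")),
            new Prod("E", List.of("E", "+", "NUM")),
            new Prod("E", List.of("NUM")));

    static boolean isNonterminal(String sym) {
        return GRAMMAR.stream().anyMatch(p -> p.lhs().equals(sym));
    }

    /** Symbol right after the dot, or null if the dot is at the end. */
    static String next(Item it) {
        List<String> rhs = GRAMMAR.get(it.prod()).rhs();
        return it.dot() < rhs.size() ? rhs.get(it.dot()) : null;
    }

    /** Adds X -> . gamma for every item whose dot stands before a nonterminal X. */
    static Set<Item> closure(Set<Item> kernel) {
        Set<Item> result = new LinkedHashSet<>(kernel);
        Deque<Item> work = new ArrayDeque<>(kernel);
        while (!work.isEmpty()) {
            String sym = next(work.pop());
            if (sym == null || !isNonterminal(sym)) continue;
            for (int i = 0; i < GRAMMAR.size(); i++) {
                Item fresh = new Item(i, 0);
                if (GRAMMAR.get(i).lhs().equals(sym) && result.add(fresh)) work.push(fresh);
            }
        }
        return result;
    }

    /** Moves the dot over `sym` in every item that allows it, then closes the kernel. */
    static Set<Item> goTo(Set<Item> state, String sym) {
        Set<Item> kernel = new LinkedHashSet<>();
        for (Item it : state) {
            if (sym.equals(next(it))) kernel.add(new Item(it.prod(), it.dot() + 1));
        }
        return closure(kernel);
    }

    /** Canonical LR(0) collection; "$" means accept, so no state is created for it. */
    static List<Set<Item>> states() {
        List<Set<Item>> states = new ArrayList<>();
        states.add(closure(Set.of(new Item(0, 0))));
        for (int i = 0; i < states.size(); i++) {
            Set<String> symbols = new LinkedHashSet<>();
            for (Item it : states.get(i)) {
                String sym = next(it);
                if (sym != null && !sym.equals("$")) symbols.add(sym);
            }
            for (String sym : symbols) {
                Set<Item> target = goTo(states.get(i), sym);
                if (!states.contains(target)) states.add(target);
            }
        }
        return states;
    }

    /** Renders an item as a dotted production, e.g. "E -> E . + NUM". */
    static String render(Item it) {
        Prod p = GRAMMAR.get(it.prod());
        List<String> syms = new ArrayList<>(p.rhs());
        syms.add(it.dot(), ".");
        return p.lhs() + " -> " + String.join(" ", syms);
    }

    /** Emits one doc-commented method skeleton per state. */
    static String skeletons() {
        StringBuilder out = new StringBuilder();
        List<Set<Item>> states = states();
        for (int i = 0; i < states.size(); i++) {
            out.append("/**\n * State ").append(i).append(":\n");
            for (Item it : states.get(i)) out.append(" *   ").append(render(it)).append('\n');
            out.append(" */\nprivate Result state").append(i).append("() { /* actions */ }\n\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(skeletons());
    }
}
```

On this grammar it produces five states; state 0, for instance, is documented with S -> . E $, E -> . E + NUM and E -> . NUM. Filling in the method bodies (shift calls, reductions, gotos) is the part a real generator would add.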
>
> regards
> Leszek
cheers,
Rémi
More information about the compiler-dev mailing list