javac lexer parser rewrite
Maurizio Cimadamore
maurizio.cimadamore at oracle.com
Tue Feb 7 02:27:10 PST 2012
Hi
let me start by saying that I agree with you - the current parser/lexer
architecture is messy and it represents a barrier for other people to
chime in and start contributing. However, when I was working on a
parser improvement related to lambda expressions (I added lookahead
support), I was surprised to see how fast the javac lexer/parser actually
are. Here are some 'unofficial' numbers taken on my machine (each run
corresponds to lexing the 'jdk/src' folder of the JDK 8 repo):
Run1: 0m6.501s
Run2: 0m6.205s
Run3: 0m6.936s
AVG: 6.547s
TOTAL FILES: 7846
AVG TIME/FILE: 0.83 * 10^-3 s (about 0.83 ms)
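As a quick check of the arithmetic, here is a minimal sketch that recomputes the average run time and the per-file time from the three runs and the file count listed above. The class and method names (LexTiming, average, perFile) are made up for illustration; only the input numbers come from this message.

```java
import java.util.Locale;

public class LexTiming {

    /** Arithmetic mean of the wall-clock times, in seconds. */
    static double average(double[] runsSeconds) {
        double sum = 0.0;
        for (double r : runsSeconds) {
            sum += r;
        }
        return sum / runsSeconds.length;
    }

    /** Average time spent per file, in seconds. */
    static double perFile(double avgSeconds, int fileCount) {
        return avgSeconds / fileCount;
    }

    public static void main(String[] args) {
        double[] runs = {6.501, 6.205, 6.936}; // Run1..Run3 from the message
        int files = 7846;                      // TOTAL FILES from the message

        double avg = average(runs);
        double secondsPerFile = perFile(avg, files);

        System.out.printf(Locale.ROOT, "avg run: %.3f s%n", avg);
        System.out.printf(Locale.ROOT, "per file: %.2f ms%n", secondsPerFile * 1000.0);
    }
}
```

With these inputs the per-file figure works out to a little under one millisecond.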
So, is it messy? Sure - is it fast? Yes, like hell. So, to summarise, I
think that any effort to try to improve our parser/lexer architecture is
definitely welcome - however, anyone embarking on such a project
should keep the above numbers in mind - if you can achieve the same
speed (well, even marginally slower would be acceptable) then it'd be an
option well worth considering.
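For anyone who wants a baseline to compare a replacement against, a rough timing harness might look like the sketch below. This is an assumption-laden illustration, not necessarily how the numbers above were obtained: it uses the public javax.tools and com.sun.source APIs, it times lexing and parsing together (JavacTask.parse()) rather than lexing alone, and the class and method names (ParseTimer, javaFiles, parseAll) are hypothetical.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParseTimer {

    /** Collects every .java file under the given root directory. */
    static List<File> javaFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(p -> p.toString().endsWith(".java"))
                        .map(Path::toFile)
                        .collect(Collectors.toList());
        }
    }

    /** Lexes and parses the files (no attribution, no code generation); returns the unit count. */
    static int parseAll(List<File> files) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            // Happens when running on a runtime that does not ship the compiler.
            throw new IllegalStateException("no system Java compiler available");
        }
        try (StandardJavaFileManager fm = compiler.getStandardFileManager(null, null, null)) {
            Iterable<? extends JavaFileObject> units = fm.getJavaFileObjectsFromFiles(files);
            // Diagnostics are ignored here; only front-end speed is of interest.
            JavacTask task = (JavacTask) compiler.getTask(null, fm, d -> { }, null, null, units);
            int count = 0;
            for (CompilationUnitTree ignored : task.parse()) {
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws IOException {
        List<File> files = javaFiles(Paths.get(args[0]));
        long start = System.nanoTime();
        int parsed = parseAll(files);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d files in %.3f s (%.3f ms/file)%n",
                parsed, seconds, 1000.0 * seconds / Math.max(parsed, 1));
    }
}
```

Run it with the source root as the only argument, and repeat the run a few times: the first pass also pays for JVM warm-up and cold file caches, so a single measurement is not very meaningful.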
Maurizio
On 07/02/12 09:57, leszekp at safe-mail.net wrote:
> Hello
>
> The javac scanner and parser are currently handwritten. The code, especially in the parser, is quite messy and
> hard to read and modify.
> It is possible to rewrite the lexer and parser using some kind of Java parser generator.
> It would improve readability and allow for easier modifications.
>
> There is a project 'compiler grammar' (which seems dormant). The Java lexer and parser were rewritten
> using antlr. But antlr-generated parsers are very slow.
>
> Many lexer and parser generators exist which are able to process 'classic' regular expressions for the lexer or
> context-free grammars for the parser and produce fast code (e.g. jflex, beaver, jikes parser generator and more)
>
> What do you think about it? Is there a need for such thing? Is it worth the effort?
>
> Regards
> Leszek Piotrowicz