javac lexer parser rewrite

Maurizio Cimadamore maurizio.cimadamore at oracle.com
Tue Feb 7 02:27:10 PST 2012


Hi,
let me start by saying that I agree with you - the current parser/lexer
architecture is messy and it represents a barrier for other people who
want to chime in and start contributing. However, when I was working on a
parser improvement related to lambda expressions (I added lookahead
support), I was surprised to see how fast the javac lexer/parser actually
are. Here are some 'unofficial' numbers taken on my machine (each run
corresponds to lexing the 'jdk/src' folder of the JDK 8 repo):

Run1: 0m6.501s
Run2: 0m6.205s
Run3: 0m6.936s

AVG: 6.547s
TOTAL FILES: 7846
AVG TIME/FILE: 0.83 * 10^-3 s (about 0.83 ms)
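
For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the average and the per-file time from the three runs and the file count listed above. The class and method names (LexTiming, average, perFileSeconds) are made up for illustration. It should give an average of about 6.547 s and roughly 0.83 ms per file.

```java
public class LexTiming {

    /** Arithmetic mean of the measured run times, in seconds. */
    static double average(double[] runsSeconds) {
        double sum = 0.0;
        for (double r : runsSeconds) {
            sum += r;
        }
        return sum / runsSeconds.length;
    }

    /** Average time spent on a single file, in seconds. */
    static double perFileSeconds(double totalSeconds, int fileCount) {
        return totalSeconds / fileCount;
    }

    public static void main(String[] args) {
        // Run times and file count copied from the figures above.
        double[] runs = {6.501, 6.205, 6.936};
        int totalFiles = 7846;

        double avg = average(runs);
        double perFile = perFileSeconds(avg, totalFiles);

        System.out.printf("AVG: %.3f s%n", avg);
        // Convert seconds to milliseconds for readability.
        System.out.printf("AVG TIME/FILE: %.2f ms%n", perFile * 1e3);
    }
}
```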

So, is it messy? Sure - is it fast? Yes, like hell. So, to summarise, I
think that any effort to improve our parser/lexer architecture is
definitely welcome - however, anyone embarking on such a project
should keep the above numbers in mind - if you can achieve the same
speed (well, even marginally slower would be acceptable) then it'd be an
option well worth considering.
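
As a starting point for that kind of comparison, here is a rough sketch of a timing harness. It is not the setup used for the numbers above: those measured lexing only, whereas this sketch goes through the public javax.tools and com.sun.source.util.JavacTask API and so times lexing plus parsing together. The class name ParseTimer and its Result holder are hypothetical, and a single wall-clock measurement like this ignores JIT warm-up and disk caching, so treat the output as indicative at best.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ParseTimer {

    /** Number of compilation units parsed and elapsed wall-clock time. */
    static final class Result {
        final int files;
        final double seconds;

        Result(int files, double seconds) {
            this.files = files;
            this.seconds = seconds;
        }
    }

    /** Parses every .java file under root (no attribution, no code generation). */
    static Result timeParse(Path root) throws IOException {
        List<File> sources;
        try (Stream<Path> walk = Files.walk(root)) {
            sources = walk.filter(Files::isRegularFile)
                          .filter(p -> p.toString().endsWith(".java"))
                          .map(Path::toFile)
                          .collect(Collectors.toList());
        }

        // Returns null when running on a runtime that ships without javac.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        try (StandardJavaFileManager fm = compiler.getStandardFileManager(null, null, null)) {
            Iterable<? extends JavaFileObject> units = fm.getJavaFileObjectsFromFiles(sources);
            JavacTask task = (JavacTask) compiler.getTask(null, fm, null, null, null, units);

            long start = System.nanoTime();
            int count = 0;
            // parse() runs the lexer and parser only and yields one tree per source file.
            for (CompilationUnitTree ignored : task.parse()) {
                count++;
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            return new Result(count, seconds);
        }
    }

    public static void main(String[] args) throws IOException {
        Result r = timeParse(Paths.get(args[0]));
        System.out.printf("%d files in %.3f s (%.3f ms/file)%n",
                r.files, r.seconds, r.seconds * 1e3 / r.files);
    }
}
```

Running it several times against the same source tree and averaging the results, as done above, gives a fairer figure than any single run.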

Maurizio

On 07/02/12 09:57, leszekp at safe-mail.net wrote:
> Hello
>
> The javac scanner and parser are currently handwritten. The code, especially in the parser, is quite messy and
> hard to read and modify.
> It is possible to rewrite the lexer and parser using some kind of Java parser generator.
> It would improve readability and allow for easier modifications.
>
> There is a 'compiler grammar' project (which seems dormant), in which the Java lexer and parser were rewritten
> using ANTLR. But ANTLR-generated parsers are very slow.
>
> Many lexer and parser generators exist which are able to process 'classic' regular expressions for the lexer or
> context-free grammars for the parser and produce fast code (e.g. JFlex, Beaver, the Jikes parser generator and more).
>
> What do you think about it? Is there a need for such a thing? Is it worth the effort?
>
> Regards
> Leszek Piotrowicz



