javac lexer parser rewrite

leszekp at Safe-mail.net leszekp at Safe-mail.net
Tue Feb 7 03:29:58 PST 2012


Hello,
I would gladly experiment with lexing/parsing to see whether machine-generated code is of comparable speed.
I wonder how you measure pure lexing time. Do you have some 'special' wrapper around javac to do it?
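
A measurement of this kind could be sketched as follows. This is only an illustration under stated assumptions, not how javac is actually timed: java.io.StreamTokenizer stands in for the lexer under test, the sources are assumed to be in memory already so that file I/O is excluded, and the names LexTimer, countTokens, timeLexingNanos and secondsPerFile are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

class LexTimer {
    // Stand-in lexer: counts tokens with java.io.StreamTokenizer.
    // This is NOT javac's scanner; replace it with the lexer being measured.
    static long countTokens(String source) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        long tokens = 0;
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                tokens++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    // Times tokenizing only: the sources are already loaded, so disk reads are excluded.
    static long timeLexingNanos(List<String> sources) {
        long start = System.nanoTime();
        long tokens = 0;
        for (String s : sources) {
            tokens += countTokens(s);
        }
        long elapsed = System.nanoTime() - start;
        if (tokens < 0) {
            throw new IllegalStateException("unreachable; keeps the token count in use");
        }
        return elapsed;
    }

    // Average seconds per file for a given total run time and file count.
    static double secondsPerFile(double totalSeconds, int fileCount) {
        return totalSeconds / fileCount;
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList("int x = 1;", "class A { }");
        long nanos = timeLexingNanos(sources);
        System.out.printf("lexed %d sources in %d ns%n", sources.size(), nanos);
        // Figures taken from the quoted runs: 6.547 s average over 7846 files.
        System.out.printf("%.2e s/file%n", secondsPerFile(6.547, 7846));
    }
}
```

A real comparison would also need several warm-up passes before timing, since the JIT compiler distorts the first iterations.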

Leszek

-------- Original Message --------
From: Maurizio Cimadamore <maurizio.cimadamore at oracle.com>
To: leszekp at safe-mail.net
Cc: compiler-dev at openjdk.java.net
Subject: Re: javac lexer parser rewrite
Date: Tue, 07 Feb 2012 10:27:10 +0000

> Hi
> let me start by saying that I agree with you - the current parser/lexer 
> architecture is messy and it represents a barrier for other people to 
> chime in and start to contribute. However, when I was working on a 
> parser improvement related to lambda expressions (I added lookahead 
> support), I was surprised to see how fast the javac lexer/parser actually 
> are. Here are some 'unofficial' numbers taken on my machine (each run 
> corresponds to lexing the 'jdk/src' folder of the JDK 8 repo):
> 
> Run1: 0m6.501s
> Run2: 0m6.205s
> Run3: 0m6.936s
> 
> AVG: 6.547s
> TOTAL FILES: 7846
> AVG TIME/FILE: 0.83 * 10^-3 s
> 
> So, is it messy? Sure - is it fast? Yes, like hell. So, to summarise, I 
> think that any effort to try to improve our parser/lexer architecture is 
> definitely welcome - however, anyone embarking on such a project 
> should keep the above numbers in mind - if you can achieve the same 
> speed (well, even marginally slower would be acceptable) then it'd be an 
> option well worth considering.
> 
> Maurizio
> 
> On 07/02/12 09:57, leszekp at safe-mail.net wrote:
> > Hello
> >
> > Javac's scanner and parser are currently handwritten. The code, especially in the parser, is quite messy and
> > hard to read and modify.
> > It is possible to rewrite the lexer and parser using some kind of Java parser generator.
> > It would improve readability and allow for easier modifications.
> >
> > There is a 'compiler grammar' project (which seems dormant), in which the Java lexer and parser were rewritten
> > using ANTLR. But ANTLR-generated parsers are very slow.
> >
> > Many lexer and parser generators exist that can process 'classic' regular expressions for the lexer or
> > context-free grammars for the parser and produce fast code (e.g. JFlex, Beaver, the Jikes parser generator and more).
> >
> > What do you think about it? Is there a need for such thing? Is it worth the effort?
> >
> > Regards
> > Leszek Piotrowicz
