javac lexer parser rewrite

Tue Feb 7 05:33:29 PST 2012

On 07/02/12 11:29, leszekp at safe-mail.net wrote:
> Hello
> I would gladly experiment with lexing/parsing to see whether machine-generated code is of comparable speed.
> I wonder how you measure pure lexing time. Do you have some 'special' wrapper around javac to do it?

There are various ways to do that. One way is to use a custom 
ScannerFactory that returns your own scanner, and register it in the 
javac context. By doing so you will be able to plug a new lexer 
into javac (see [1] for an example). Alternatively, a simpler way 
to measure pure lexing time is to create your own lexer and call 
'nextToken' until the end of the file is reached. But if you go for the 
micro-benchmark approach, remember to do something with the result of 
nextToken() [e.g. compute the hash of the token and add it to a shared 
counter]: if the return value of nextToken is unused, HotSpot 
optimizations will kick in and there's a chance that the loop will be 
optimized away ;-).
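
A minimal sketch of the micro-benchmark approach, assuming a hypothetical `Lexer` interface whose `nextToken()` returns a `String`, or `null` at end of input (javac's actual Scanner API differs). The toy `WhitespaceLexer` is likewise a hypothetical stand-in for a real lexer. The point is the loop: every token's hash is folded into an accumulator that is finally written to a shared `volatile` field, so the JIT cannot treat the lexing work as dead code.

```java
import java.util.List;

public class LexerBench {

    /** Hypothetical minimal lexer interface; javac's real Scanner API differs. */
    interface Lexer {
        /** Returns the next token, or null at end of input. */
        String nextToken();
    }

    /** Hypothetical toy lexer that splits on whitespace, standing in for a real one. */
    static final class WhitespaceLexer implements Lexer {
        private final String src;
        private int pos = 0;

        WhitespaceLexer(String src) {
            this.src = src;
        }

        @Override
        public String nextToken() {
            // Skip leading whitespace.
            while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
                pos++;
            }
            if (pos >= src.length()) {
                return null; // end of input
            }
            int start = pos;
            while (pos < src.length() && !Character.isWhitespace(src.charAt(pos))) {
                pos++;
            }
            return src.substring(start, pos);
        }
    }

    // Shared sink: publishing the accumulated hash here keeps the JIT from
    // eliminating the lexing loop as dead code.
    static volatile long sink;

    /** Calls nextToken() until end of input, folding every token into a hash. */
    static long lexAll(Lexer lexer) {
        long acc = 0;
        for (String tok = lexer.nextToken(); tok != null; tok = lexer.nextToken()) {
            acc = 31 * acc + tok.hashCode();
        }
        return acc;
    }

    public static void main(String[] args) {
        // Stand-in inputs; a real run would read every source file under a tree.
        List<String> files = List.of("class A { int x; }", "class B { void m() {} }");
        long start = System.nanoTime();
        long total = 0;
        for (String src : files) {
            total += lexAll(new WhitespaceLexer(src));
        }
        sink = total; // make the result observable so the work cannot be dropped
        long elapsed = System.nanoTime() - start;
        System.out.printf("lexed %d files in %d ns%n", files.size(), elapsed);
    }
}
```

A single timed pass like this includes JIT warm-up, so for anything beyond a rough number a harness such as JMH is a better fit; its `Blackhole` serves exactly the purpose of the `sink` field here.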

[1] - 
http://hg.openjdk.java.net/jdk8/tl/langtools/file/tip/test/tools/javac/api/TestJavacTaskScanner.java

Maurizio
>
> Leszek
>
> -------- Original Message --------
> From: Maurizio Cimadamore<maurizio.cimadamore at oracle.com>
> To: leszekp at safe-mail.net
> Cc: compiler-dev at openjdk.java.net
> Subject: Re: javac lexer parser rewrite
> Date: Tue, 07 Feb 2012 10:27:10 +0000
>
>> Hi
>> let me start by saying that I agree with you: the current parser/lexer
>> architecture is messy and it represents a barrier for other people to
>> chime in and start to contribute. However, when I was working on a
>> parser improvement related to lambda expressions (I added lookahead
>> support), I was surprised to see how fast the javac lexer and parser actually
>> are. Here are some 'unofficial' numbers taken on my machine (each run
>> corresponds to lexing the 'jdk/src' folder of the JDK 8 repo):
>>
>> Run1: 0m6.501s
>> Run2: 0m6.205s
>> Run3: 0m6.936s
>>
>> AVG: 6.547 s
>> TOTAL FILES: 7846
>> AVG TIME/FILE: 0.83 * 10^-3 s (about 0.83 ms)
>>
>> So, is it messy? Sure. Is it fast? Yes, like hell. So, to summarise, I
>> think that any effort to improve our parser/lexer architecture is
>> definitely welcome; however, anyone embarking on such a project
>> should keep the above numbers in mind. If you can achieve the same
>> speed (well, even marginally slower would be acceptable) then it'd be an
>> option well worth considering.
>>
>> Maurizio
>>
>> On 07/02/12 09:57, leszekp at safe-mail.net wrote:
>>> Hello
>>>
>>> The javac scanner and parser are currently handwritten. The code, especially in the parser, is quite messy and
>>> hard to read and modify.
>>> It would be possible to rewrite the lexer and parser using a Java parser generator.
>>> This would improve readability and allow for easier modifications.
>>>
>>> There is a project, 'compiler grammar' (which seems dormant), in which the Java lexer and parser were rewritten
>>> using ANTLR. But ANTLR-generated parsers are very slow.
>>>
>>> Many lexer and parser generators exist that can process 'classic' regular expressions (for the lexer) or
>>> context-free grammars (for the parser) and produce fast code (e.g. JFlex, Beaver, the Jikes parser generator and more).
>>>
>>> What do you think about it? Is there a need for such a thing? Is it worth the effort?
>>>
>>> Regards
>>> Leszek Piotrowicz
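
For readers unfamiliar with the generator approach discussed in this thread: tools such as JFlex take a list of regular expressions, one per token kind, and generate a lexer from them. The following is a hand-written sketch of that specification style using java.util.regex; it is not what JFlex generates (JFlex compiles the rules into a DFA and applies longest match with rule order as the tie-break, whereas alternation in java.util.regex takes the first alternative that matches). The token kinds and the tiny Java-like subset are illustrative assumptions, not javac's TokenKind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLexer {

    // Hypothetical token kinds for a tiny Java-like subset (not javac's TokenKind).
    enum Kind { WHITESPACE, KEYWORD, IDENT, INT, OP }

    // One named alternative per token kind, tried left to right. Keywords come
    // before identifiers so "class" is not lexed as IDENT; the \b stops
    // "classy" from matching the keyword prefix.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<WHITESPACE>\\s+)"
            + "|(?<KEYWORD>(?:class|int|void|return)\\b)"
            + "|(?<IDENT>[A-Za-z_$][A-Za-z0-9_$]*)"
            + "|(?<INT>[0-9]+)"
            + "|(?<OP>[{}();=+\\-*/])");

    record Token(Kind kind, String text) {}

    /** Splits src into tokens, dropping whitespace; throws on unknown input. */
    static List<Token> lex(String src) {
        List<Token> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            m.region(pos, src.length());
            if (!m.lookingAt()) { // must match starting exactly at pos
                throw new IllegalArgumentException("unexpected character at " + pos);
            }
            // Find which named group matched to recover the token kind.
            for (Kind k : Kind.values()) {
                if (m.group(k.name()) != null) {
                    if (k != Kind.WHITESPACE) {
                        out.add(new Token(k, m.group()));
                    }
                    break;
                }
            }
            pos = m.end(); // every alternative consumes at least one character
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : lex("class A { int x = 42; }")) {
            System.out.println(t.kind() + " " + t.text());
        }
    }
}
```

Because the pattern is re-entered for every token and java.util.regex is a backtracking engine, this is likely much slower than a generated DFA lexer; it illustrates how the rules are specified, not the performance discussed above.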