javac lexer parser rewrite
leszekp at Safe-mail.net
Mon Feb 20 06:33:51 PST 2012
Brian
I agree with you that a Java grammar would be beneficial for
tool builders, compiler hackers, etc., even if the official javac does not use it.
However, I would rather write a grammar for an LALR parser generator than for an LL one like ANTLR.
We have at least 3 suitable tools: Beaver, Byaccj and Java-cup.
Advantages of an LALR grammar:
1. The grammar would be closer to the 'official' Java grammar,
as it would not need the left factoring required by LL grammars.
Left factoring is the main reason LL grammars are hard to read.
2. The generated parser would be faster than an ANTLR parser, and maybe comparable
to the existing hand-written parser (for Beaver or Byaccj). Java-cup wasn't
written with performance as a top priority.
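To make point 1 concrete, here is a minimal sketch of what left factoring does to a rule. The `member` rule, its token names and the `LeftFactoringDemo` class are made up for illustration; they are not taken from the Java grammar or from any existing ANTLR grammar. The sketch assumes an LL parser with a single token of lookahead (ANTLR can look further ahead, so this shows the general LL issue rather than a specific ANTLR limitation). The comment shows the rule as an LALR grammar could state it, and the hand-written methods follow the left-factored form.

```java
import java.util.Arrays;
import java.util.List;

final class LeftFactoringDemo {
    // Hypothetical rule as an LALR grammar could state it
    // (two alternatives sharing the prefix TYPE ID):
    //     member : TYPE ID ';'              // field
    //            | TYPE ID '(' ')' ';' ;    // method
    //
    // Left-factored form needed with one token of lookahead:
    //     member     : TYPE ID memberRest ;
    //     memberRest : ';' | '(' ')' ';' ;

    private final List<String> tokens;
    private int pos = 0;

    LeftFactoringDemo(List<String> tokens) {
        this.tokens = tokens;
    }

    // Consume and return the next token, failing on end of input.
    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    // Consume the next token and check that it is the expected one.
    private void expect(String expected) {
        String got = next();
        if (!got.equals(expected)) {
            throw new IllegalArgumentException("expected " + expected + " but got " + got);
        }
    }

    // member : TYPE ID memberRest
    String member() {
        String type = next();
        String name = next();
        return memberRest(type, name);
    }

    // memberRest : ';' | '(' ')' ';'
    // The decision between field and method is made only here,
    // after the shared prefix has been consumed.
    private String memberRest(String type, String name) {
        String t = next();
        if (t.equals(";")) {
            return "field " + type + " " + name;
        }
        if (t.equals("(")) {
            expect(")");
            expect(";");
            return "method " + type + " " + name;
        }
        throw new IllegalArgumentException("unexpected token " + t);
    }

    static String parse(String... tokens) {
        return new LeftFactoringDemo(Arrays.asList(tokens)).member();
    }

    public static void main(String[] args) {
        System.out.println(parse("int", "x", ";"));           // field int x
        System.out.println(parse("int", "f", "(", ")", ";")); // method int f
    }
}
```

The extra `memberRest` rule exists only to satisfy the parsing technique; an LALR generator would accept the two-alternative rule from the comment as written.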
Disadvantages:
ANTLR is more popular; more people use it. However, an LALR grammar consists
of grammar rules plus actions, so the tools mentioned above are not that hard to use.
In my opinion it would be better to try to write an LALR grammar than
to bring the ANTLR grammar up to date. I'm interested in doing it; some time ago I experimented
with a Java 5 LALR grammar.
Regards
Leszek
-------- Original Message --------
From: Brian Goetz <brian.goetz at oracle.com>
To: leszekp at safe-mail.net
Cc: compiler-dev at openjdk.java.net
Subject: Re: javac lexer parser rewrite
Date: Sun, 19 Feb 2012 23:27:16 -0500
> There was a project that was undertaken a while ago to create an ANTLR
> 'reference' grammar for Java. This project had some success (it passed
> a tree-for-tree comparison with javac on a reasonable corpus) but has
> since decayed. Performance-wise, I think it was about 10x slower than
> the existing javac parser.
>
> I think it would be fantastic for someone to pick up the work that was
> done, and bring it up to date. A ready-to-use ANTLR grammar would be
> great for anyone building tools that need to ingest java source code,
> even if javac didn't use it. Because so many people know ANTLR, an
> ANTLR grammar would probably be more generally useful than one generated
> with less popular tools, since that would enable more people to be able
> to customize it for their own purposes.
>
> Are you interested in working on this? I'm sure we could dig up the
> existing work and test suites as a starting point.
>
> On 2/7/2012 4:57 AM, leszekp at safe-mail.net wrote:
> > Hello
> >
> > The javac scanner and parser are currently handwritten. The code, especially in the parser, is quite messy and
> > hard to read and modify.
> > It is possible to rewrite the lexer and parser using some kind of Java parser generator.
> > It would improve readability and allow for easier modifications.
> >
> > There is a project, 'compiler grammar' (which seems dormant), in which the Java lexer and parser were rewritten
> > using ANTLR. But ANTLR-generated parsers are very slow.
> >
> > Many lexer and parser generators exist that can process 'classic' regular expressions for the lexer or
> > context-free grammars for the parser and produce fast code (e.g. JFlex, Beaver, the Jikes parser generator, and more).
> >
> > What do you think about it? Is there a need for such thing? Is it worth the effort?
> >
> > Regards
> > Leszek Piotrowicz