Another parser than ANTLR

Jonathan Gibbons Jonathan.Gibbons at Sun.COM
Tue Oct 14 16:59:48 PDT 2008


Rémi,

Just to be clear, the majority of my earlier comments
about the compiler-grammar project were specifically
a discussion about the ongoing ANTLR project, to give
you, and others, the background and status of that
work.  They were not intended to be more generally
applicable, although it was interesting to hear how
Tatoo would measure up to the same criteria.

-- Jon


Rémi Forax wrote:
> Jonathan Gibbons wrote:
>> Hi Rémi,
> Hi Jon,
>>
>> Your work on your grammar sounds interesting.
> It's not only my work; it's joint work with a grammar specialist.
>>
>> The compiler-grammar project currently has two goals:
>> 1) to develop a version of javac with a "better" parser that
>> is more maintainable and less fragile than the current
>> parser.
>> 2) to build a unified grammar for JLS that can be demonstrated
>> and verified within javac.
>>
>> For the first goal, there are a number of constraints, not the least
>> of which is performance. Currently, the ANTLR parser is somewhat
>> slower than the standard javac parser.  Yang and others are
>> looking at performance;  for any parser to become standard, it
>> would have to have acceptable performance, whatever that might
>> mean. 
> I don't think that performance is a problem.
> Tatoo parsers were successfully integrated into a web server
> without suffering any major performance degradation
> compared to a hand-written parser.
>> Another issue is the quality of errors and error recovery.
>> The errors detected need not be 1-1 identical, but (for example)
>> a parser with one message ("syntax error") would fall below the
>> quality bar.
> A Tatoo parser is an LR parser (you can choose LR, SLR or LALR),
> so when an error is detected you can have more than one context.
> Tatoo provides those contexts along with all the terminals that
> don't lead to an error.
> So by default Tatoo is able to print a better message than
> just "syntax error" :)
>
> Unfortunately error recovery is not that simple and
> I'm afraid that it will need some amount of work.
>
>>
>> The second goal is about having a better grammar in JLS, that
>> is consistent as much as possible between the body of the book
>> and the summary in chapter 18.  "Better" means something along
>> the lines of "clear, concise, and strongly related the grammar
>> in javac".  The thought is to be able to take the ANTLR grammar
>> from javac and post-process it into something similar to JLS
>> format.
> Tatoo's input format separates the specification of the grammar
> (an EBNF) from the semantic part, which is specified by implementing
> a Java interface.
> Here is an example of a C-like grammar used directly by Tatoo:
> http://gforgeigm.univ-mlv.fr/scm/viewvc.php/trunk/samples/pseudo/pseudo.ebnf?revision=1798&root=tatoo&view=markup 
>
>
>>
>> Note that these goals need not have identical grammars; the grammar
>> for a production compiler might have more rules for handling errors,
>> handling multiple source versions, and so on.
> Tatoo has special mechanisms to handle grammar versioning
> and terminals that may or may not be identifiers, depending on the context.
>> However, I am keen
>> that anything that claims to be a Java grammar should be demonstrable
>> within a Java compiler.
>>
>> Now for your question about whether the compiler-grammar project is
>> open to more parsers.  This has some flavor of the KSL discussion.
> KSL is SVN-based and uses a pretty old version of the compiler.
> It's perhaps time to migrate to something like kenai.com.
>
>> OpenJDK projects have specific deliverables, and are not intended
>> for open-ended work such as "integrate any parsers that come along
>> into javac". If you wanted to create a sibling project on OpenJDK, that
>> would be possible, or we could investigate the possibility of having a
>> separate repository within the context of the compiler-grammar project,
>> I guess.  Note that in the Mercurial world, you create new repositories
>> the same way you would create a branch within centralized SCMs like
>> Subversion. Also note that if you put work into OpenJDK, contributors
>> have to sign the SCA etc.
> I already have the right to commit changes to another workspace of
> OpenJDK, so the SCA is not a problem.
>>
>> It seems to me there is a better possibility we could work towards.
>> We already took a step in the right direction with the ANTLR work,
>> in that the ANTLR parser does not replace the existing parser --
>> it coexists with it, and is currently selected by a hidden compiler
>> option. To make that happen, we refactored some of the code in
>> JavaCompiler and the parser.* classes so that there was a cleaner
>> interface that is implemented by both the standard parser and the
>> new ANTLR parser. Currently, the option to select the parser is
>> still clunky, but if we could improve that for/with you, we might be
>> able to make it so that you can just drop a jar file in the classpath
>> and have your Tatoo based parser available in javac.
>> [Note the basic refactoring is or should be available in the main
>> OpenJDK repositories, as it is not specific to ANTLR.]
> I'm fine with an -XD-like option.
> If you provide more, I will use it :)
>>
>> Finally, Yang has reminded me that the compiler-grammar web
>> pages are still the placeholders that Mark created when he created
>> the project for us. It is "on the list" to update these pages.  Adding
>> links to "Related Projects" such as Tatoo is an obvious improvement.
> I am not against some public advertisement.
>> And, in a similar vein, I think it is reasonable to post information,
>> announcements etc regarding your project here on the compiler-grammar
>> list.
> ok
>>
>> -- Jon
> Cheers,
> Rémi

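As an illustration of the point quoted above, that an LR parser can report the terminals that do not lead to an error, here is a minimal sketch in Java. It is not Tatoo's API: the class name, the tiny action table and the message format are all hypothetical, chosen only to show how "expected one of ..." can be derived from the actions defined for the current parser state.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ExpectedTerminalsDemo {
    // Hypothetical, hand-written fragment of an LR action table:
    // parser state -> terminals that have a shift or reduce action in that state.
    static final Map<Integer, Set<String>> ACTIONS = Map.of(
        0, Set.of("id", "("),
        1, Set.of("+", "*", ")", "eof"));

    // Build a message that lists every terminal that would NOT be an error
    // in the given state. TreeSet is used only to get a stable ordering.
    static String syntaxError(int state, String found) {
        Set<String> expected = new TreeSet<>(ACTIONS.getOrDefault(state, Set.of()));
        return "syntax error: found '" + found + "', expected one of " + expected;
    }

    public static void main(String[] args) {
        System.out.println(syntaxError(0, "+"));
    }
}
```

A real generated parser would read the expected set from its own tables rather than from a hand-written map; the sketch only shows why such a parser can say more than "syntax error".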

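The "drop a jar file in the classpath" idea discussed above could be sketched with java.util.ServiceLoader. This is only an assumption about how it might work, not a description of the refactored javac code: ParserProvider, StandardParserProvider and ParserSelector are hypothetical names, and parse() returns a string as a stand-in for a syntax tree.

```java
import java.util.ServiceLoader;

// Hypothetical interface; the actual javac parser interface is not shown in this thread.
interface ParserProvider {
    String name();
    String parse(String source);   // stand-in for "produce a syntax tree"
}

// Stand-in for the built-in parser, which is always available.
class StandardParserProvider implements ParserProvider {
    public String name() { return "standard"; }
    public String parse(String source) { return "standard-tree(" + source + ")"; }
}

class ParserSelector {
    // Return the provider whose name matches the requested value.
    // Providers registered by jars on the classpath are searched first;
    // if none matches, fall back to the built-in parser.
    static ParserProvider select(String requested) {
        for (ParserProvider p : ServiceLoader.load(ParserProvider.class)) {
            if (p.name().equals(requested)) {
                return p;
            }
        }
        return new StandardParserProvider();
    }

    public static void main(String[] args) {
        ParserProvider p = select(args.length > 0 ? args[0] : "standard");
        System.out.println(p.name() + ": " + p.parse("class A {}"));
    }
}
```

With this shape, a Tatoo-based jar would only need to ship an implementation of the interface plus a provider-configuration file under META-INF/services; the requested name would come from whatever option ends up selecting the parser.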

