unclear grammar java 20
Gavin Bierman
gavin.bierman at oracle.com
Mon Nov 25 17:35:39 UTC 2024
Hi Ed,
Thanks for your email. Whilst I can imagine that such a thing would be useful, I’m afraid that we make no claims that the grammar in the JLS is suitable for plugging directly into a parser generator or any other tool. Indeed there are places in the JLS where the grammar is used to capture semantic rather than syntactic issues, and places where we impose semantic rather than syntactic restrictions.
Thanks,
Gavin
On 14 Jun 2023, at 10:08, Doorn, Ed van <ed.vandoorn at ou.nl> wrote:
Part of my research is detecting design patterns in Java sources.
Therefore it is necessary to parse Java sources.
I used the grammar specified in https://docs.oracle.com/javase/specs/jls/se20/html/index.html.
First, I adapted the grammar for the bottom-up parser generator CUP (http://www2.cs.tum.edu/projects/cup/).
This resulted in more than 100 shift/reduce and reduce/reduce conflicts.
Reduce/reduce conflicts suggest grammar mistakes.
Second, I adapted the grammar for the top-down parser generator Grammatica (https://grammatica.percederberg.net).
This resulted in many errors because the grammar is not a valid LL grammar.
An example:
The production rules for ClassOrInterfaceType and ClassType are:
ClassOrInterfaceType:
    ClassType
    InterfaceType
ClassType:
    {Annotation} TypeIdentifier [TypeArguments]
    PackageName . {Annotation} TypeIdentifier [TypeArguments]
    ClassOrInterfaceType . {Annotation} TypeIdentifier [TypeArguments]
This results in an infinite loop: ClassOrInterfaceType produces a ClassType, and ClassType in turn produces a
ClassOrInterfaceType, and so on.
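
The loop described above is indirect left recursion, which top-down parser generators typically reject. As a minimal sketch, assuming only the two rules quoted in this message, the following Java program checks whether a nonterminal can reappear in leftmost position of its own derivation. The class name, the LEFTMOST table, and the treatment of every symbol without a listed rule (InterfaceType, PackageName, Annotation, TypeIdentifier) as a terminal are assumptions made for illustration, not part of the JLS or of any tool.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LeftRecursionCheck {

    // For each nonterminal, one entry per alternative, listing the symbols that
    // can appear first in that alternative. The optional {Annotation} prefix is
    // modeled by listing both Annotation and the symbol that follows it.
    // Assumption: symbols without an entry here are treated as terminals.
    static final Map<String, List<List<String>>> LEFTMOST = Map.of(
        "ClassOrInterfaceType", List.of(
            List.of("ClassType"),
            List.of("InterfaceType")),
        "ClassType", List.of(
            List.of("Annotation", "TypeIdentifier"),
            List.of("PackageName"),
            List.of("ClassOrInterfaceType")));

    /** Returns true if 'start' can derive a form that begins with 'start' again. */
    static boolean isLeftRecursive(String start) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String current = work.pop();
            for (List<String> alternative : LEFTMOST.getOrDefault(current, List.of())) {
                for (String symbol : alternative) {
                    if (symbol.equals(start)) {
                        return true; // reached the start symbol in leftmost position
                    }
                    if (seen.add(symbol)) {
                        work.push(symbol); // explore each symbol only once
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String nonterminal : List.of("ClassOrInterfaceType", "ClassType")) {
            System.out.println(nonterminal + " left-recursive: " + isLeftRecursive(nonterminal));
        }
    }
}
```

Under these assumptions the check reports both nonterminals as left-recursive, since each reaches itself through the other in leftmost position.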
Some production rules have more than one alternative that starts with @,
which is not valid LL and cannot be resolved by looking ahead more tokens.
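
One common way to handle alternatives that share an arbitrarily long prefix is left factoring: parse the shared prefix once, then choose the alternative from the token that follows. The sketch below is a hand-written illustration in Java over a hypothetical, heavily simplified grammar (Decl: {Annotation} "class" | {Annotation} "interface", with Annotation: "@" Identifier). The grammar, the class names, and the token representation are assumptions and are not taken from the JLS.

```java
import java.util.ArrayList;
import java.util.List;

public class LeftFactoringSketch {

    /** Parse result: the annotation names seen, then the declaration keyword. */
    static final class Parsed {
        final List<String> annotations;
        final String kind;

        Parsed(List<String> annotations, String kind) {
            this.annotations = annotations;
            this.kind = kind;
        }
    }

    // Hypothetical grammar, already left-factored:
    //   Decl:       {Annotation} ("class" | "interface")
    //   Annotation: "@" Identifier
    static Parsed parseDecl(List<String> tokens) {
        int pos = 0;
        List<String> annotations = new ArrayList<>();
        // Consume the shared annotation prefix once, however long it is.
        while (pos + 1 < tokens.size() && tokens.get(pos).equals("@")) {
            annotations.add(tokens.get(pos + 1));
            pos += 2;
        }
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        // Only now decide between the alternatives, using a single token.
        String next = tokens.get(pos);
        if (next.equals("class") || next.equals("interface")) {
            return new Parsed(annotations, next);
        }
        throw new IllegalArgumentException("unexpected token: " + next);
    }

    public static void main(String[] args) {
        Parsed result = parseDecl(List.of("@", "A", "@", "B", "interface"));
        System.out.println(result.kind + " with annotations " + result.annotations);
    }
}
```

Because the decision is postponed until after the prefix, the number of annotations no longer matters, which is why a fixed lookahead of k tokens is not needed in this simplified case.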
Is a valid bottom-up grammar, e.g. LALR(1), or a top-down LL(k) grammar available?
ANTLR provides an LL(*) grammar for Java 9, which is of course out of date.
With regards,
Ed van Doorn