unclear grammar java 20

Doorn, Ed van ed.vandoorn at ou.nl
Wed Jun 14 09:08:18 UTC 2023


Part of my research is detecting design patterns in Java sources.
Therefore, it is necessary to parse Java sources.

I used the grammar specified in https://docs.oracle.com/javase/specs/jls/se20/html/index.html.
First, I adapted the grammar for the bottom-up parser generator CUP (http://www2.cs.tum.edu/projects/cup/).
This resulted in more than 100 shift/reduce and reduce/reduce conflicts.
Reduce/reduce conflicts in particular suggest mistakes in the grammar.

Second, I made the grammar useful for the top-down parser generator Grammatica (https://grammatica.percederberg.net).
This resulted in many errors because the grammar, as written, is not a valid LL grammar.

An example:
The production rules for ClassOrInterfaceType and ClassType are:

ClassOrInterfaceType:
    ClassType
    InterfaceType

ClassType:
    {Annotation} TypeIdentifier [TypeArguments]
    PackageName . {Annotation} TypeIdentifier [TypeArguments]
    ClassOrInterfaceType . {Annotation} TypeIdentifier [TypeArguments]

This results in an infinite loop in a top-down parser: ClassOrInterfaceType produces
ClassType, and ClassType in turn produces ClassOrInterfaceType as its leftmost symbol
(indirect left recursion).
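
To make the cycle concrete, here is a minimal sketch in Python that searches for left recursion in the two rules quoted above. The encoding of the rules as (symbol, optional) pairs and the function names are my own assumptions for illustration, not part of the JLS or of any parser generator. Only the two quoted rules are encoded, so symbols such as InterfaceType are treated as undefined.

```python
# Hypothetical encoding of the two JLS rules quoted above.
# Each alternative is a list of (symbol, optional) pairs; optional=True
# stands for a {...} or [...] item that may derive nothing.
GRAMMAR = {
    "ClassOrInterfaceType": [
        [("ClassType", False)],
        [("InterfaceType", False)],
    ],
    "ClassType": [
        [("Annotation", True), ("TypeIdentifier", False), ("TypeArguments", True)],
        [("PackageName", False), (".", False), ("Annotation", True),
         ("TypeIdentifier", False), ("TypeArguments", True)],
        [("ClassOrInterfaceType", False), (".", False), ("Annotation", True),
         ("TypeIdentifier", False), ("TypeArguments", True)],
    ],
}


def leftmost_symbols(alternative):
    """Symbols that can appear first in an alternative, skipping optional items."""
    result = []
    for symbol, optional in alternative:
        result.append(symbol)
        if not optional:
            break
    return result


def left_recursive_cycle(grammar, start):
    """Return a path start -> ... -> start through leftmost positions, or None."""
    def visit(symbol, path):
        for alternative in grammar.get(symbol, []):
            for first in leftmost_symbols(alternative):
                if first == start:
                    return path + [first]
                if first in grammar and first not in path:
                    found = visit(first, path + [first])
                    if found:
                        return found
        return None
    return visit(start, [start])


print(left_recursive_cycle(GRAMMAR, "ClassOrInterfaceType"))
# prints ['ClassOrInterfaceType', 'ClassType', 'ClassOrInterfaceType']
```

A recursive-descent parser generated from these rules would call the ClassOrInterfaceType routine, then the ClassType routine, then the ClassOrInterfaceType routine again, without consuming a token in between.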

Some production rules have more than one alternative that starts with @.
This is not valid LL(k) and cannot be solved by looking ahead more tokens, because a
sequence of annotations can be arbitrarily long.
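
The lookahead problem can be illustrated with a small Python sketch. The two alternatives used here (annotations followed by "class", and annotations followed by "interface") are hypothetical stand-ins, not rules copied from the JLS, and each annotation is simplified to the two tokens "@" and "Ann".

```python
def expand(annotation_count, tail):
    """Token sequence for a hypothetical alternative: annotations, then a tail token."""
    return ["@", "Ann"] * annotation_count + [tail]


def distinguishable(k, annotation_count):
    """True if the first k tokens tell the two hypothetical alternatives apart."""
    first = expand(annotation_count, "class")
    second = expand(annotation_count, "interface")
    return first[:k] != second[:k]


# With no annotations, one token of lookahead is enough.
print(distinguishable(1, 0))

# For every fixed k there is an input (here with k annotations) whose first
# k tokens are identical in both alternatives, so no fixed k is sufficient.
print(any(distinguishable(k, k) for k in range(1, 50)))
```

Left-factoring the shared annotation prefix into a single rule is the usual way to work around this, at the cost of a grammar that no longer matches the specification rule for rule.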

Is a valid bottom-up grammar, e.g. LALR(1), or a top-down grammar, e.g. LL(k), available?

ANTLR provides an LL(*) grammar for Java 9, which is of course out of date.

With regards,

Ed van Doorn