Question regarding JavaLetter and JavaLetterOrDigit in the JLS
Alex Buckley
alex.buckley at oracle.com
Mon Dec 20 20:36:00 UTC 2021
The isJavaIdentifier* methods are already specified in terms of code
points and general categories. Having the JLS defer to those methods
simplifies long-term maintenance of the JLS with no loss of information.
Alex
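For illustration, the general-category conditions that the `java.lang.Character` documentation gives for `isJavaIdentifierStart(int)` and `isJavaIdentifierPart(int)` can be restated as ordinary predicates and checked against the methods on whatever JDK is at hand. This is a minimal sketch. It assumes the category lists in Ken's grammar fragments quoted below, which follow that documentation. The class and method names are hypothetical. The program prints mismatch counts, but no particular counts are claimed here, because the result depends on the JDK and the Unicode version it bundles.

```java
// Hypothetical helper: restates the general-category conditions for Java
// identifiers as plain predicates, then compares them with the JDK methods
// that JLS 3.8 defers to.
public class IdentifierCategories {

    // "Java letter": letters (Lu, Ll, Lt, Lm, Lo), letter numbers (Nl),
    // currency symbols (Sc) and connector punctuation (Pc).
    public static boolean isJavaLetter(int cp) {
        switch (Character.getType(cp)) {
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
            case Character.LETTER_NUMBER:
            case Character.CURRENCY_SYMBOL:
            case Character.CONNECTOR_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    // "Java letter-or-digit": every Java letter, plus Nd, Mn, Mc, Cf and the
    // ignorable control ranges U+0000-U+0008, U+000E-U+001B, U+007F-U+009F.
    public static boolean isJavaLetterOrDigit(int cp) {
        if (isJavaLetter(cp)) {
            return true;
        }
        if ((cp >= 0x0000 && cp <= 0x0008)
                || (cp >= 0x000E && cp <= 0x001B)
                || (cp >= 0x007F && cp <= 0x009F)) {
            return true;
        }
        switch (Character.getType(cp)) {
            case Character.DECIMAL_DIGIT_NUMBER:
            case Character.NON_SPACING_MARK:
            case Character.COMBINING_SPACING_MARK:
            case Character.FORMAT:
                return true;
            default:
                return false;
        }
    }

    // Compare both predicates with the JDK methods over every code point.
    public static void main(String[] args) {
        int startMismatches = 0;
        int partMismatches = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (isJavaLetter(cp) != Character.isJavaIdentifierStart(cp)) {
                startMismatches++;
            }
            if (isJavaLetterOrDigit(cp) != Character.isJavaIdentifierPart(cp)) {
                partMismatches++;
            }
        }
        System.out.println("isJavaIdentifierStart mismatches: " + startMismatches);
        System.out.println("isJavaIdentifierPart mismatches: " + partMismatches);
    }
}
```

Because the category tables come from the running JDK, this comparison can only speak for that JDK. That is consistent with Alex's point that the methods already carry the category-based specification.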
On 5/1/2021 5:16 AM, Ken Domino wrote:
> Folks,
>
> I have been updating the Antlr grammars for Java in
> https://github.com/antlr/grammars-v4/tree/master/java. In one of the
> grammars, semantic predicates are used to evaluate
> “Character.isJavaIdentifierStart(int)” and
> “Character.isJavaIdentifierPart(int)” for the input, as per Section 3.8
> of the Java Language Spec:
>
> “A "Java letter" is a character for which the method
> Character.isJavaIdentifierStart(int) returns true.
> A "Java letter-or-digit" is a character for which the method
> Character.isJavaIdentifierPart(int) returns true.”
>
> However, for a fixed version of the Spec, I can evaluate the two
> functions for all code points and then define rules for JavaLetter
> and JavaLetterOrDigit that are not operational definitions, e.g.:
>
> fragment JavaLetter : [\p{Lu}] | [\p{Ll}] | [\p{Lt}] | [\p{Lm}] |
> [\p{Lo}] | [\p{Nl}] | [\p{Pc}] | [\p{Sc}] ;
> fragment JavaLetterOrDigit : [\p{Lu}] | [\p{Ll}] | [\p{Lt}] | [\p{Lm}] |
> [\p{Lo}] | [\p{Mn}] | [\p{Mc}] | [\p{Nd}] | [\p{Nl}] | [\p{Cf}] |
> [\p{Pc}] | [\p{Sc}] | [\u{00000000}-\u{00000008}] |
> [\u{0000000E}-\u{0000001B}] | [\u{0000007F}-\u{0000009F}] ;
>
> Why does the Spec use an operational definition for JavaLetter and
> JavaLetterOrDigit rather than pick a version of the Unicode Standard and
> define the symbols as sets of code points and general categories?
>
> Ken
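Ken's alternative is to evaluate the two methods for every code point under one fixed Java version and freeze the result. That could be sketched as a small generator along the following lines. This is a hypothetical tool, not something taken from the antlr/grammars-v4 repository. It assumes ANTLR 4.7 or later, which accepts `\u{...}` code-point escapes inside lexer character sets. The generated fragments are only valid for the Unicode version of the JDK that runs the program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical generator: snapshots Character.isJavaIdentifierStart/Part for
// the running JDK (and therefore its Unicode version) as code-point ranges,
// then prints them as ANTLR 4 lexer fragments.
public class JavaIdentifierRanges {

    // Collapse all code points matching the predicate into inclusive [lo, hi] ranges.
    public static List<int[]> ranges(IntPredicate p) {
        List<int[]> out = new ArrayList<>();
        int start = -1;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (p.test(cp)) {
                if (start < 0) {
                    start = cp;
                }
            } else if (start >= 0) {
                out.add(new int[] {start, cp - 1});
                start = -1;
            }
        }
        if (start >= 0) {
            out.add(new int[] {start, Character.MAX_CODE_POINT});
        }
        return out;
    }

    // Render the ranges as one ANTLR character set of code-point escapes
    // (syntax assumed from ANTLR 4.7+).
    public static String toAntlrSet(List<int[]> rs) {
        StringBuilder sb = new StringBuilder("[");
        for (int[] r : rs) {
            sb.append(String.format("\\u{%X}", r[0]));
            if (r[1] != r[0]) {
                sb.append(String.format("-\\u{%X}", r[1]));
            }
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println("// Generated for Java " + Runtime.version());
        System.out.println("fragment JavaLetter : "
                + toAntlrSet(ranges(Character::isJavaIdentifierStart)) + " ;");
        System.out.println("fragment JavaLetterOrDigit : "
                + toAntlrSet(ranges(Character::isJavaIdentifierPart)) + " ;");
    }
}
```

Emitting explicit ranges instead of `\p{...}` categories makes the grammar independent of the Unicode tables bundled with the ANTLR tool. The trade-off is that the fragments must be regenerated whenever the JDK's Unicode version changes. Deferring to the `Character` methods spares the JLS that kind of churn.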
More information about the jls-jvms-spec-comments mailing list