Question regarding JavaLetter and JavaLetterOrDigit in the JLS
Ken Domino
ken.domino at gmail.com
Sat May 1 12:16:25 UTC 2021
Folks,
I have been updating the ANTLR grammars for Java in
https://github.com/antlr/grammars-v4/tree/master/java. In one of the
grammars, semantic predicates are used to evaluate
“Character.isJavaIdentifierStart(int)” and
“Character.isJavaIdentifierPart(int)” on the input (a sketch of one
such rule follows the quotation), as per Section 3.8 of the Java
Language Specification:
“A "Java letter" is a character for which the method
Character.isJavaIdentifierStart(int) returns true.
A "Java letter-or-digit" is a character for which the method
Character.isJavaIdentifierPart(int) returns true.”
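In the grammar, that operational definition is implemented with lexer
rules carrying semantic predicates, along these lines (a rough sketch
for the Java target; the exact rule in the repository may differ):

fragment JavaLetter
    : [a-zA-Z$_]                        // ASCII-range Java letters
    | ~[\u0000-\u007F\uD800-\uDBFF]     // non-ASCII, non-surrogate char
      {Character.isJavaIdentifierStart(_input.LA(-1))}?
    | [\uD800-\uDBFF] [\uDC00-\uDFFF]   // surrogate pair, U+10000..U+10FFFF
      {Character.isJavaIdentifierStart(
          Character.toCodePoint((char) _input.LA(-2), (char) _input.LA(-1)))}?
    ;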
However, for a fixed version of the Spec, I can evaluate the two
methods over all code points and then define rules for “Java letter”
and “Java letter-or-digit” that are not operational definitions, e.g.:
fragment JavaLetter
    : [\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}\p{Pc}\p{Sc}]
    ;

fragment JavaLetterOrDigit
    : [\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Mn}\p{Mc}\p{Nd}\p{Nl}\p{Cf}\p{Pc}\p{Sc}]
    | [\u0000-\u0008]
    | [\u000E-\u001B]
    | [\u007F-\u009F]
    ;
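Deriving those sets is mechanical: run both methods over every code
point on a fixed JDK and bucket the accepted code points by Unicode
general category. A minimal sketch (class name illustrative):

import java.util.Map;
import java.util.TreeMap;

public class DeriveJavaIdentifierSets {
    public static void main(String[] args) {
        // Tally accepted code points per general category
        // (Character.getType), for isJavaIdentifierStart/Part.
        Map<Integer, Integer> start = new TreeMap<>();
        Map<Integer, Integer> part = new TreeMap<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            int category = Character.getType(cp);
            if (Character.isJavaIdentifierStart(cp)) start.merge(category, 1, Integer::sum);
            if (Character.isJavaIdentifierPart(cp))  part.merge(category, 1, Integer::sum);
        }
        System.out.println("isJavaIdentifierStart by category: " + start);
        System.out.println("isJavaIdentifierPart by category: " + part);
    }
}

The counts are what the fragments above encode, modulo the Unicode
version bundled with the JDK.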
Why does the Spec use an operational definition for JavaLetter and
JavaLetterOrDigit, rather than picking a Unicode standard and defining
the symbols as sets of code points and general categories?
Ken