Question regarding JavaLetter and JavaLetterOrDigit in the JLS

Ken Domino ken.domino at gmail.com
Sat May 1 12:16:25 UTC 2021


Folks,

I have been updating the ANTLR grammars for Java in
https://github.com/antlr/grammars-v4/tree/master/java. In one of the
grammars, semantic predicates are used to evaluate
Character.isJavaIdentifierStart(int) and
Character.isJavaIdentifierPart(int) on the input, as per Section 3.8
of the Java Language Specification:

“A "Java letter" is a character for which the method 
Character.isJavaIdentifierStart(int) returns true.
A "Java letter-or-digit" is a character for which the method 
Character.isJavaIdentifierPart(int) returns true.”

However, for a fixed version of the Spec, I can evaluate the two
methods over all code points (see the sketch below the rules) and then
define JavaLetter and JavaLetterOrDigit by rules that are not
operational definitions, e.g.:

fragment JavaLetter
    : [\p{Lu}] | [\p{Ll}] | [\p{Lt}] | [\p{Lm}] | [\p{Lo}]
    | [\p{Nl}] | [\p{Pc}] | [\p{Sc}]
    ;

fragment JavaLetterOrDigit
    : [\p{Lu}] | [\p{Ll}] | [\p{Lt}] | [\p{Lm}] | [\p{Lo}]
    | [\p{Mn}] | [\p{Mc}] | [\p{Nd}] | [\p{Nl}] | [\p{Cf}]
    | [\p{Pc}] | [\p{Sc}]
    | [\u{00000000}-\u{00000008}]
    | [\u{0000000E}-\u{0000001B}]
    | [\u{0000007F}-\u{0000009F}]
    ;
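
These classes can be derived mechanically. Here is a minimal sketch of
the enumeration as a plain Java main class (the class name and output
format are mine, for illustration only):

public class DumpJavaIdentifierSets {
    public static void main(String[] args) {
        // Evaluate both predicates for every Unicode code point on this JDK.
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            boolean start = Character.isJavaIdentifierStart(cp);
            boolean part = Character.isJavaIdentifierPart(cp);
            if (start || part) {
                System.out.printf("U+%06X start=%b part=%b%n", cp, start, part);
            }
        }
    }
}

Grouping the accepted code points by Character.getType(cp) then yields
the general-category classes used above, plus the leftover
control-character ranges.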

Why does the Spec use an operational definition for JavaLetter and
JavaLetterOrDigit rather than picking a specific Unicode standard and
defining the symbols as a set of code points and general categories?

Ken



