Question regarding JavaLetter and JavaLetterOrDigit in the JLS

Mon Dec 20 20:36:00 UTC 2021

The isJavaIdentifier* methods are already specified in terms of code 
points and general categories. Having the JLS defer to those methods 
simplifies long-term maintenance of the JLS with no loss of information.
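
As a rough sketch (not part of the JLS or the JDK), the category list given
in the Javadoc for those two methods can be compared against the methods
themselves on whatever JDK is at hand. The class and method names below are
hypothetical, and the mismatch counts depend on the Unicode version that the
running JDK supports:

```java
class JavaLetterSketch {
    // Categories named in the Javadoc for Character.isJavaIdentifierStart(int):
    // letters (Lu, Ll, Lt, Lm, Lo), letter numbers (Nl), currency symbols (Sc)
    // and connector punctuation (Pc).
    static boolean isStartByCategory(int cp) {
        switch (Character.getType(cp)) {
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
            case Character.LETTER_NUMBER:
            case Character.CURRENCY_SYMBOL:
            case Character.CONNECTOR_PUNCTUATION:
                return true;
            default:
                return false;
        }
    }

    // Adds the extra categories for Character.isJavaIdentifierPart(int):
    // marks (Mn, Mc), decimal digits (Nd), format characters (Cf) and the
    // "identifier ignorable" control ranges.
    static boolean isPartByCategory(int cp) {
        if (isStartByCategory(cp)) {
            return true;
        }
        switch (Character.getType(cp)) {
            case Character.NON_SPACING_MARK:
            case Character.COMBINING_SPACING_MARK:
            case Character.DECIMAL_DIGIT_NUMBER:
            case Character.FORMAT:
                return true;
            default:
                return (cp >= 0x00 && cp <= 0x08)
                    || (cp >= 0x0E && cp <= 0x1B)
                    || (cp >= 0x7F && cp <= 0x9F);
        }
    }

    public static void main(String[] args) {
        int startMismatches = 0;
        int partMismatches = 0;
        // Walk every code point and count disagreements with the JDK methods.
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (isStartByCategory(cp) != Character.isJavaIdentifierStart(cp)) {
                startMismatches++;
            }
            if (isPartByCategory(cp) != Character.isJavaIdentifierPart(cp)) {
                partMismatches++;
            }
        }
        System.out.println("start mismatches: " + startMismatches);
        System.out.println("part mismatches:  " + partMismatches);
    }
}
```

Because Character.getType(int) itself tracks the JDK's Unicode data, a
definition written this way moves with the platform, whereas a hard-coded
list of code points would have to be regenerated for each Unicode update.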

Alex

On 5/1/2021 5:16 AM, Ken Domino wrote:
> Folks,
> 
> I have been updating the ANTLR grammars for Java in 
> https://github.com/antlr/grammars-v4/tree/master/java. In one of the 
> grammars, semantic predicates are used to evaluate 
> “Character.isJavaIdentifierStart(int)” and 
> “Character.isJavaIdentifierPart(int)” for the input, as per Section 3.8 
> of the Java Language Spec:
> 
> “A "Java letter" is a character for which the method 
> Character.isJavaIdentifierStart(int) returns true.
> A "Java letter-or-digit" is a character for which the method 
> Character.isJavaIdentifierPart(int) returns true.”
> 
> However, for a fixed version of the Spec, I can evaluate the two 
> functions for all code points and then define rules for JavaLetter 
> and JavaLetterOrDigit that are not operational definitions, e.g.:
> 
> fragment JavaLetter : [\p{Lu}] | [\p{Ll}] | [\p{Lt}] | [\p{Lm}] | 
> [\p{Lo}] | [\p{Nl}] | [\p{Pc}] | [\p{Sc}] ;
> fragment JavaLetterOrDigit : [\p{Lu}] | [\p{Ll}] | [\p{Lt}] | [\p{Lm}] | 
> [\p{Lo}] | [\p{Mn}] | [\p{Mc}] | [\p{Nd}] | [\p{Nl}] | [\p{Cf}] | 
> [\p{Pc}] | [\p{Sc}] | [\u{00000000}-\u{00000008}] | 
> [\u{0000000E}-\u{0000001B}] | [\u{0000007F}-\u{0000009F}] ;
> 
> Why does the Spec use an operational definition for JavaLetter and 
> JavaLetterOrDigit rather than picking a Unicode standard and defining 
> the symbols as a set of code points and general categories?
> 
> Ken