Identifier Ignorable characters in keywords and literals

Alex Buckley alex.buckley at oracle.com
Tue Sep 22 17:41:37 UTC 2020


// Adding Dan explicitly

On 9/21/2020 10:39 PM, Pravin Jain wrote:
> The following code compiles and executes successfully.
> 
> public cl\u0001ass Identifier\u0002Ignorable {
>      public sta\u0003tic vo\u0004id ma\u0005in(String[] args) {
>          System.out.println("Hello world");
>      }
> }
> 
> The JLS mentions that Identifier-Ignorable characters are allowed in
> an Identifier, but it does not mention their use in a keyword or
> literal. From the specification, one does not gather that these
> characters will be ignored when used inside a keyword or a literal.
> Is this a compiler error, or has the JLS failed to clarify this
> point?

It would be legitimate for JLS 3.3 to acknowledge that some `\uxxxx` 
Unicode escapes represent UTF-16 code units which denote "ignorable" 
code points; such UTF-16 code units are _not_ included in the sequence 
of Unicode input characters resulting from this translation step.
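
For reference, a minimal sketch of how to confirm that the code points 
used in the quoted program (U+0001 through U+0005) are classified as 
"ignorable" by the standard library. It relies only on 
`Character.isIdentifierIgnorable(int)`; the class name `IgnorableCheck` 
and the helper `allIgnorable` are hypothetical names chosen for 
illustration, and the sketch says nothing about how javac itself is 
implemented.

```java
// Hypothetical helper class, for illustration only.
final class IgnorableCheck {

    // Returns true if every code point in [from, to] is identifier-ignorable
    // according to java.lang.Character.
    static boolean allIgnorable(int from, int to) {
        for (int cp = from; cp <= to; cp++) {
            if (!Character.isIdentifierIgnorable(cp)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The five code points embedded in the keywords of the quoted program.
        System.out.println(allIgnorable(0x0001, 0x0005)); // prints "true"
        // An ordinary letter is not ignorable.
        System.out.println(Character.isIdentifierIgnorable('a')); // prints "false"
    }
}
```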

Dan, is it possible to make this small clarification in the JLS ch.3 
update for contextual keywords?

The text in 3.8 -- "Two identifiers are the same only if, after ignoring 
characters that are ignorable, the identifiers have the same Unicode 
character for each letter or digit." -- would be slightly redundant in 
calling out ignorable characters, but it should not be changed because 
it states a clear, easy-to-understand rule for Java programmers looking 
to go beyond ASCII in their identifiers.
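
As an illustration of that rule, here is a rough sketch of comparing 
two identifier spellings "after ignoring characters that are 
ignorable". The class `IdentifierCompare` and its methods 
`stripIgnorable` and `sameIdentifier` are hypothetical helpers, not 
anything defined by the JLS or javac; the only library call assumed is 
`Character.isIdentifierIgnorable(int)`.

```java
// Hypothetical helper class, for illustration only.
final class IdentifierCompare {

    // Drops every identifier-ignorable code point from the given spelling.
    static String stripIgnorable(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> !Character.isIdentifierIgnorable(cp))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    // Two spellings are treated as the same identifier if they are equal
    // once ignorable code points have been removed from both.
    static boolean sameIdentifier(String a, String b) {
        return stripIgnorable(a).equals(stripIgnorable(b));
    }

    public static void main(String[] args) {
        String plain = "main";
        // Same spelling with U+0005 embedded, as in the quoted program.
        String withIgnorable = "ma" + (char) 0x0005 + "in";

        System.out.println(sameIdentifier(plain, withIgnorable)); // prints "true"
        System.out.println(plain.equals(withIgnorable));          // prints "false"
    }
}
```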

Alex


More information about the compiler-dev mailing list