Identifier Ignorable characters in keywords and literals
Alex Buckley
alex.buckley at oracle.com
Tue Sep 22 17:41:37 UTC 2020
// Adding Dan explicitly
On 9/21/2020 10:39 PM, Pravin Jain wrote:
> The following code compiles and executes successfully.
>
> public cl\u0001ass Identifier\u0002Ignorable {
> public sta\u0003tic vo\u0004id ma\u0005in(String[] args) {
> System.out.println("Hello world");
> }
> }
>
> The JLS mentions that Identifier-Ignorable characters are allowed in
> an Identifier, but says nothing about using them in a keyword or a
> literal. From the specification, one would not gather that these
> characters are ignored when used inside a keyword or a literal. Is
> this an error in the compiler, or has the JLS failed to clarify this
> point?
It would be legitimate for JLS 3.3 to acknowledge that some `\uxxxx`
Unicode escapes represent UTF-16 code units which denote "ignorable"
code points; such UTF-16 code units are _not_ included in the sequence
of Unicode input characters that results from translating Unicode escapes.
Dan, is it possible to make this small clarification in the JLS ch.3
update for contextual keywords?
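As a rough illustration of the suggested 3.3 wording, here is a hypothetical Java sketch (the class `EscapeTranslator` and its `translate` method are invented for this example; this is not how javac is implemented). It translates Unicode escapes in raw source text, honoring the "one or more `u`" form and the even-backslash eligibility rule, and omits any resulting UTF-16 code unit for which `Character.isIdentifierIgnorable` returns true. It is a simplification, not a full lexer.

```java
public class EscapeTranslator {
    // Hypothetical sketch of the suggested JLS 3.3 behavior: translate Unicode
    // escapes, omitting any resulting UTF-16 code unit that
    // Character.isIdentifierIgnorable reports as ignorable.
    static String translate(String raw) {
        StringBuilder out = new StringBuilder();
        int backslashRun = 0; // contiguous raw backslashes immediately before index i
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            boolean eligible = c == '\\' && backslashRun % 2 == 0
                    && i + 1 < raw.length() && raw.charAt(i + 1) == 'u';
            if (!eligible) {
                out.append(c);
                backslashRun = (c == '\\') ? backslashRun + 1 : 0;
                i++;
                continue;
            }
            int j = i + 1;
            while (j < raw.length() && raw.charAt(j) == 'u') {
                j++; // an escape may contain one or more 'u' characters
            }
            if (j + 4 > raw.length()) {
                throw new IllegalArgumentException("truncated Unicode escape at " + i);
            }
            int unit = 0;
            for (int k = j; k < j + 4; k++) {
                char h = raw.charAt(k);
                int d = (h < 128) ? Character.digit(h, 16) : -1; // ASCII hex digits only
                if (d < 0) {
                    throw new IllegalArgumentException("bad hex digit at " + k);
                }
                unit = unit * 16 + d;
            }
            if (!Character.isIdentifierIgnorable((char) unit)) {
                out.append((char) unit);
            }
            backslashRun = 0; // a backslash produced by an escape is not a raw backslash
            i = j + 4;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // These literals double the backslash, so the Java compiler leaves the
        // escapes alone and translate() sees them as raw text.
        System.out.println(translate("public cl\\u0001ass Identifier\\u0002Ignorable {}"));
        System.out.println(translate("sta\\uuu0003tic"));
        // Not eligible: the second backslash is preceded by one (odd) backslash.
        System.out.println(translate("\\\\u0001"));
    }
}
```

Run as-is, it should print `public class IdentifierIgnorable {}`, then `static`, and then the last input unchanged, because its backslash-`u` pair does not begin an escape.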
The text in 3.8 -- "Two identifiers are the same only if, after ignoring
characters that are ignorable, the identifiers have the same Unicode
character for each letter or digit." -- would be slightly redundant in
calling out ignorable characters, but it should not be changed because
it states a clear, easy-to-understand rule for Java programmers looking
to go beyond ASCII in their identifiers.
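To make the 3.8 rule concrete, here is a hypothetical helper (`SameIdentifier` and `sameIdentifier` are illustrative names, not JDK or javac API). It drops every code point for which `Character.isIdentifierIgnorable` returns true and then compares what remains character by character. It performs no normalization, so look-alike letters such as LATIN CAPITAL LETTER A and GREEK CAPITAL LETTER ALPHA remain distinct.

```java
public class SameIdentifier {
    // Hypothetical helper illustrating the JLS 3.8 rule: ignore ignorable
    // characters, then require the same Unicode character at each position.
    static boolean sameIdentifier(String a, String b) {
        return strip(a).equals(strip(b));
    }

    // Removes every code point that Character.isIdentifierIgnorable reports as ignorable.
    static String strip(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints()
         .filter(cp -> !Character.isIdentifierIgnorable(cp))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+0002 is ignorable, so these name the same identifier: prints true
        System.out.println(sameIdentifier("Identifier\u0002Ignorable", "IdentifierIgnorable"));
        // Latin A (U+0041) vs Greek Alpha (U+0391) are different characters: prints false
        System.out.println(sameIdentifier("A", "\u0391"));
    }
}
```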
Alex