Clarification regarding identifier ignorable characters for keywords
Pravin
pravin at zensoftech.co.in
Thu Dec 26 17:29:48 UTC 2024
Hello sir/madam,
In Section 3.9 Keywords
it states that "51 character sequences, formed from ASCII characters, are reserved for use as keywords and cannot be used as
identifiers. Another 17 character sequences, also formed from ASCII characters, may be interpreted as keywords or as other
tokens, depending on the context in which they appear."
This fails to mention that these character sequences are formed after ignoring the ignorable characters.
e.g.
public is equivalent to pu\u00adblic, where \u00ad is the soft hyphen (so the text is rendered as public and looks the same),
i.e. an ignorable character for identifiers. Section 3.8 describes this for identifiers with the statement
"Two identifiers are the same only if, after ignoring characters that are
ignorable, the identifiers have the same Unicode character for each letter
or digit. An ignorable character is a character for which the method
Character.isIdentifierIgnorable(int) returns true."
This is true for all keywords also.
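To make the equivalence concrete, here is a minimal sketch of the comparison described above. It only models the rule quoted from section 3.8 and is not taken from the JLS or from javac; the class IgnorableDemo and the helper stripIgnorable are hypothetical names chosen for illustration.

```java
public class IgnorableDemo {
    // Hypothetical helper: drops every code point for which
    // Character.isIdentifierIgnorable(int) returns true.
    static String stripIgnorable(String s) {
        return s.codePoints()
                .filter(cp -> !Character.isIdentifierIgnorable(cp))
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // The string literal contains a real U+00AD (soft hyphen) after "pu".
        String disguised = "pu\u00adblic";

        // As plain strings the two differ ...
        System.out.println(disguised.equals("public"));                 // false
        // ... but they match once ignorable characters are dropped.
        System.out.println(stripIgnorable(disguised).equals("public")); // true
    }
}
```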
Basically, every identifier-ignorable character is a valid identifier part, but none is a valid identifier start.
IntStream.rangeClosed(0, 0x10ffff).filter(Character::isIdentifierIgnorable).allMatch(Character::isJavaIdentifierPart)
returns true
IntStream.rangeClosed(0, 0x10ffff).filter(Character::isIdentifierIgnorable).anyMatch(Character::isJavaIdentifierStart)
returns false
This allows someone to embed these characters without changing the equivalence for identifiers.
Interestingly the same is also true for keywords.
There is one exception: the contextual keyword non-sealed, which is made up of three tokens. In this case an
identifier-ignorable character can be embedded anywhere except at the beginning of the third token, sealed,
i.e. non-\u00adsealed is not a valid keyword
but non\u00ad-sealed is a valid keyword.
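One way to see why the position matters is sketched below. It assumes, as described above, that the pieces on either side of the hyphen are scanned like identifiers. This is only a model of that explanation, not javac's actual scanner; the class NonSealedDemo and the helper looksLikeIdentifier are hypothetical names.

```java
public class NonSealedDemo {
    // Hypothetical check: the first code point must be a valid identifier
    // start and every following code point a valid identifier part.
    static boolean looksLikeIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        if (!Character.isJavaIdentifierStart(s.codePointAt(0))) {
            return false;
        }
        return s.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    public static void main(String[] args) {
        // Soft hyphen at the end of the first piece: still identifier-shaped.
        System.out.println(looksLikeIdentifier("non\u00ad"));    // true
        // Soft hyphen at the start of the third piece: not a valid start.
        System.out.println(looksLikeIdentifier("\u00adsealed")); // false
    }
}
```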
I request a clarification in the specification that a keyword is matched against the listed ASCII sequence after ignoring the ignorable characters.
Thanks and regards,
Pravin