From alex.buckley at oracle.com Tue Sep 22 17:41:37 2020
From: alex.buckley at oracle.com (Alex Buckley)
Date: Tue, 22 Sep 2020 10:41:37 -0700
Subject: Identifier Ignorable characters in keywords and literals
In-Reply-To:
References:
Message-ID: <087b829b-7568-6845-7d5c-64bf4d5fd064@oracle.com>

// Adding Dan explicitly

On 9/21/2020 10:39 PM, Pravin Jain wrote:
> The following code compiles and executes successfully.
>
> public cl\u0001ass Identifier\u0002Ignorable {
>     public sta\u0003tic vo\u0004id ma\u0005in(String[] args) {
>         System.out.println("Hello world");
>     }
> }
>
> The JLS mentions about the use of Identifier-Ignorable characters
> being allowed in an Identifier, but using those in a keyword, or
> literal has not been mentioned. From the specification, one does not
> gather that these characters will be ignored when used inside a
> keyword or a literal. Is this error of compiler or the JLS has missed
> to clarify this point?

It would be legitimate for JLS 3.3 to acknowledge that some `\uxxxx`
Unicode escapes represent UTF-16 code units which denote "ignorable"
code points; such UTF-16 code units are _not_ included in the sequence
of Unicode input characters resulting from this translation step.

Dan, is it possible to make this small clarification in the JLS ch.3
update for contextual keywords?

The text in 3.8 -- "Two identifiers are the same only if, after ignoring
characters that are ignorable, the identifiers have the same Unicode
character for each letter or digit." -- would be slightly redundant in
calling out ignorable characters, but it should not be changed because
it states a clear, easy-to-understand rule for Java programmers looking
to go beyond ASCII in their identifiers.
Alex

From alex.buckley at oracle.com Wed Sep 23 00:10:11 2020
From: alex.buckley at oracle.com (Alex Buckley)
Date: Tue, 22 Sep 2020 17:10:11 -0700
Subject: Identifier Ignorable characters in keywords and literals
In-Reply-To:
References: <087b829b-7568-6845-7d5c-64bf4d5fd064@oracle.com>
Message-ID:

An ignorable Unicode escape such as `\u0001` is a legitimate character
in a character literal, string literal, or text block, so javac accepts
and translates it there.

In contrast, it seems that javac accepts _and discards_ an ignorable
Unicode escape:

1. in the body of a comment;

2. as a Java-letter-or-digit in an identifier (i.e., not as the first
character of an identifier, but as any subsequent character);

3. in a position to the right of a non-ignorable character within a
keyword (thus allowing for appearance at the end of a keyword, and for
consecutive ignorable escapes: `class\u0001\u0001`);

4. in a position to the right of a non-ignorable character within a
boolean literal or null literal.

1 and 2 are to spec. 3 and 4 are new to the spec.

There seems to be a connection between 2 and 3+4: javac is expecting
keywords to follow the same Java-letter-followed-by-Java-letters-or-digits
format as identifiers.

Alex

On 9/22/2020 4:07 PM, Pravin Jain wrote:
> Thanks for the clarifications.
> But let me point out that the Identifier Ignorable characters are
> ignored not only in keywords but also in the three literals "true",
> "false" and "null"
>
> Thanks and Regards,
> Pravin
>
> On Tue, Sep 22, 2020 at 11:11 PM Alex Buckley wrote:
>>
>> // Adding Dan explicitly
>>
>> On 9/21/2020 10:39 PM, Pravin Jain wrote:
>>> The following code compiles and executes successfully.
>>>
>>> public cl\u0001ass Identifier\u0002Ignorable {
>>>     public sta\u0003tic vo\u0004id ma\u0005in(String[] args) {
>>>         System.out.println("Hello world");
>>>     }
>>> }
>>>
>>> The JLS mentions about the use of Identifier-Ignorable characters
>>> being allowed in an Identifier, but using those in a keyword, or
>>> literal has not been mentioned. From the specification, one does not
>>> gather that these characters will be ignored when used inside a
>>> keyword or a literal. Is this error of compiler or the JLS has missed
>>> to clarify this point?
>>
>> It would be legitimate for JLS 3.3 to acknowledge that some `\uxxxx`
>> Unicode escapes represent UTF-16 code units which denote "ignorable"
>> code points; such UTF-16 code units are _not_ included in the sequence
>> of Unicode input characters resulting from this translation step.
>>
>> Dan, is it possible to make this small clarification in the JLS ch.3
>> update for contextual keywords?
>>
>> The text in 3.8 -- "Two identifiers are the same only if, after ignoring
>> characters that are ignorable, the identifiers have the same Unicode
>> character for each letter or digit." -- would be slightly redundant in
>> calling out ignorable characters, but it should not be changed because
>> it states a clear, easy-to-understand rule for Java programmers looking
>> to go beyond ASCII in their identifiers.
>>
>> Alex
>
>

From alex.buckley at oracle.com Wed Sep 30 16:50:22 2020
From: alex.buckley at oracle.com (Alex Buckley)
Date: Wed, 30 Sep 2020 09:50:22 -0700
Subject: related to compile error in preview feature 'record'
In-Reply-To:
References:
Message-ID: <5ac0af92-a3f3-7a3a-e6aa-0c29dbb025cb@oracle.com>

In the compact constructor below, `low` refers to an implicit parameter
of the constructor, whereas `this.low` refers to a field corresponding
to the record component `low`.
IIRC, the field is not definitely assigned before the end of the
constructor body, so referring to `this.low` anywhere in the constructor
body is an error by the traditional rule at the beginning of JLS ch.16.

There is no interprocedural analysis in ch.16, so m1's independent
reference to a field via `this.low` is allowed on the basis that the
field was definitely assigned by the end of every constructor body.

In an ordinary class, a constructor body that calls m1() but forgets to
explicitly initialize the field would cause a compile-time error. There
is no corresponding compile-time error here because the compact
constructor body implicitly initializes the field.

Alex

On 9/30/2020 8:45 AM, Pravin Jain wrote:
> Dear sir,
> In the following code, I have commented the error.
> This error seems to be unnecessary, could have been a warning instead.
> In the constructor of record access to instance variable before
> explicit initialization, giving error, whereas same access is
> available by invoking a method.
>
> public record TestRecord(int low, int high) {
>     public TestRecord {
>         System.out.println(low);
>         // System.out.println(this.low); // error, variable not initialized
>         m1(); // but this works, why?
>     }
>     public void m1() {
>         System.out.println(this.low);
>     }
>     public static void main(String[] args) {
>         TestRecord r1 = new TestRecord(7,12);
>     }
> }
>
>
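
The compact-constructor behaviour discussed in the reply above can be observed with a small sketch. It is an illustration only, not part of the original thread: it assumes Java 16 or later (where records are a final feature), and the names `CompactConstructorDemo`, `Range`, `peek`, `seen` and `observe` are hypothetical. It records what is visible at three points: the implicit parameter `low` inside the compact constructor, the field `this.low` read through a method called from that constructor, and the accessor after construction has finished.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactConstructorDemo {

    // Hypothetical record, modelled on TestRecord from the thread.
    record Range(int low, int high) {

        // Values observed during construction; kept only for illustration.
        static final List<Integer> seen = new ArrayList<>();

        Range {
            // `low` is the implicit constructor parameter, so it already
            // holds the argument value.
            seen.add(low);
            // Writing this.low here would be rejected by the compiler;
            // the same read through a method is accepted, because
            // definite assignment is not analysed across method calls.
            peek();
            // The fields are assigned implicitly after this body ends.
        }

        void peek() {
            // Runs before the implicit field assignment, so the field
            // still holds its default value.
            seen.add(this.low);
        }
    }

    // Constructs a Range and returns the three observed values in order.
    static List<Integer> observe(int low, int high) {
        Range.seen.clear();
        Range r = new Range(low, high);
        seen(r);
        return List.copyOf(Range.seen);
    }

    // Records the accessor value once construction has completed.
    private static void seen(Range r) {
        Range.seen.add(r.low());
    }

    public static void main(String[] args) {
        System.out.println(observe(7, 12));
    }
}
```

Running `main` prints `[7, 0, 7]`: the parameter is 7, the field read via `peek()` is still the default 0, and the accessor returns 7 once the compact constructor has completed.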