RFR: 8245153 Unicode encoded double-quoted empty string does not compile

Jim Laskey james.laskey at oracle.com
Thu May 28 12:04:33 UTC 2020


Test should probably also include

   String s0 = "";

:-)

> On May 28, 2020, at 9:02 AM, Jim Laskey <james.laskey at oracle.com> wrote:
> 
> I've since rewritten this code (targeting 16) to not use reset at all, for this very reason. Your solution may work, but the safer solution is:
> 
>       case 2: // Starting an empty string literal.
>            tk = Tokens.TokenKind.STRINGLITERAL;
>            return;
> 
> 
> Your test should include:
> 
>    String s1 = \u0022\u0022;
>    String s2 = "\u0022;
>    String s3 = \u0022";
>    String s4 = \u0022\\u0022\u0022;
> 
> Cheers,
> 
> -- Jim
> 
> 
>> On May 28, 2020, at 5:38 AM, Adam Sotona <adam.sotona at oracle.com> wrote:
>> 
>> Hi,
>> Please help me review the fix for compilation of a Unicode-encoded double-quoted empty string.
>> I found the root cause in com.sun.tools.javac.parser.JavaTokenizer::scanString(int pos). It tries to un-read the quotes by calling com.sun.tools.javac.parser.UnicodeReader::reset(int pos), but that approach only happens to work when each source character corresponds to exactly one String character (standard quotes in this case).
>> If the quotes are written in Unicode notation, \u0022\u0022, the reset call moves the reader.bp cursor to the original pos-1 position and reads one character.
>> Because the initial pos parameter points AFTER the last parsed character, the position of the first backslash of \u0022\u0022 is already lost, and the next character parsed is the digit 2 instead of the Unicode-encoded quote.
>> The fix repositions the reader to the right place, whether the quotes are standard or Unicode-encoded.
>> A new test is also added for this case.
>> 
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8245153
>> webrev: http://cr.openjdk.java.net/~asotona/8245153/
>> 
>> All Tier 1, 2 and 3 tests are passing.
>> 
>> Thanks for the review,
>> Adam
> 
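
The position mismatch described in the quoted message can be reproduced with a small stand-alone model. This is only a sketch, not javac's code: the Reader class, its fields, and the naiveUnread/trackedUnread helpers are hypothetical names, and the escape decoding is simplified (a single 'u', four hex digits, no backslash-parity rule).

```java
// Simplified model of a reader that decodes Unicode escapes while scanning.
// All names here are illustrative assumptions, not javac internals.
final class UnicodeResetSketch {

    static final class Reader {
        final String src; // raw source text, escapes not yet decoded
        int bp;           // index of the next unread source character
        int start;        // source index where the last decoded character began
        char ch;          // last decoded character

        Reader(String src) {
            this.src = src;
        }

        // Decode one logical character; a 6-char escape yields a single char.
        boolean next() {
            if (bp >= src.length()) {
                return false;
            }
            start = bp;
            if (src.startsWith("\\u", bp) && bp + 6 <= src.length()) {
                // Assumes the four characters after the 'u' are hex digits.
                ch = (char) Integer.parseInt(src.substring(bp + 2, bp + 6), 16);
                bp += 6;
            } else {
                ch = src.charAt(bp++);
            }
            return true;
        }

        // Reposition to a raw source index and decode one character from there.
        void reset(int pos) {
            bp = pos;
            next();
        }
    }

    // Un-read by stepping back one SOURCE character; this is only right
    // when the decoded character occupied exactly one source character.
    static char naiveUnread(String src) {
        Reader r = new Reader(src);
        r.next(); // opening quote
        r.next(); // second quote
        r.reset(r.bp - 1);
        return r.ch;
    }

    // Un-read by returning to the recorded start of the decoded character.
    static char trackedUnread(String src) {
        Reader r = new Reader(src);
        r.next();
        r.next();
        r.reset(r.start);
        return r.ch;
    }

    public static void main(String[] args) {
        String plain = "\"\";";             // two ordinary quotes, then ';'
        String escaped = "\\u0022\\u0022;"; // two escaped quotes, then ';'
        System.out.println(naiveUnread(plain));     // prints "
        System.out.println(naiveUnread(escaped));   // prints 2
        System.out.println(trackedUnread(plain));   // prints "
        System.out.println(trackedUnread(escaped)); // prints "
    }
}
```

With ordinary quotes both strategies land on the second quote. With escaped quotes the naive step-back lands on the final digit of the escape, which is the symptom reported in the thread; recording where each decoded character started avoids depending on a one-to-one mapping.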


