RFR: 8245153 Unicode encoded double-quoted empty string does not compile

Jim Laskey james.laskey at oracle.com
Thu May 28 12:02:06 UTC 2020


I've since rewritten this code (targeting 16) to not use reset at all, for this very reason. Your solution may work, but the safer solution is to

        case 2: // Starting an empty string literal.
            tk = Tokens.TokenKind.STRINGLITERAL;
            return;


Your test should include:

    String s1 = \u0022\u0022;
    String s2 = "\u0022;
    String s3 = \u0022";
    String s4 = \u0022\\u0022\u0022;
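
To make it easier to see what each of the four declarations should become once Unicode escapes are translated, here is a minimal sketch of that translation step. It is a simplified model written for illustration, not javac's code: the class name UnicodeEscapeSketch and the method translate are hypothetical, and malformed escapes are not diagnosed the way a real compiler would.

```java
public class UnicodeEscapeSketch {

    // Simplified model of Unicode escape translation: a backslash starts an
    // escape only when it is preceded by an even number of contiguous
    // backslashes and is followed by one or more 'u' and four hex digits.
    static String translate(String src) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int backslashes = 0; // contiguous raw backslashes just before index i
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == '\\' && backslashes % 2 == 0
                    && i + 1 < src.length() && src.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++; // any number of 'u' characters is allowed
                }
                if (j + 4 <= src.length()) {
                    out.append((char) Integer.parseInt(src.substring(j, j + 4), 16));
                    i = j + 4;
                    backslashes = 0;
                    continue;
                }
            }
            backslashes = (c == '\\') ? backslashes + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The backslashes are doubled here so that this file itself
        // contains no Unicode escapes.
        String[] lines = {
            "String s1 = \\u0022\\u0022;",
            "String s2 = \"\\u0022;",
            "String s3 = \\u0022\";",
            "String s4 = \\u0022\\\\u0022\\u0022;"
        };
        for (String line : lines) {
            System.out.println(line + "   ->   " + translate(line));
        }
    }
}
```

Under this model s1, s2 and s3 all translate to an empty string literal, while s4 translates to a literal holding an escaped backslash followed by the plain characters u0022, because a backslash preceded by an odd number of backslashes cannot start a Unicode escape.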
    
Cheers,

-- Jim


> On May 28, 2020, at 5:38 AM, Adam Sotona <adam.sotona at oracle.com> wrote:
> 
> Hi,
> please help me review the fix for compilation of a Unicode-encoded double-quoted empty string.
> I found that the root cause is in com.sun.tools.javac.parser.JavaTokenizer::scanString(int pos). It tries to un-read the Unicode quotes by calling com.sun.tools.javac.parser.UnicodeReader::reset(int pos); however, that approach works only by luck, when one source character corresponds to one String character (standard quotes in this case).
> If the quotes are written in Unicode notation \u0022\u0022, the reset call moves the reader.bp cursor to the original pos-1 position and reads one character.
> Because the initial pos parameter points AFTER the last parsed character, the position of the first backslash of \u0022\u0022 is already lost, and the next character parsed is the digit 2 instead of a Unicode quote.
> The fix simply repositions the reader to the right place, whether the quotes are standard or Unicode-encoded.
> A new test is also added for this case.
> 
> JBS: https://bugs.openjdk.java.net/browse/JDK-8245153
> webrev: http://cr.openjdk.java.net/~asotona/8245153/
> 
> All Tier 1, 2 and 3 tests are passing.
> 
> Thanks for the review,
> Adam
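
The failure mode described in the quoted message can be reproduced with a toy reader. This is only a sketch under simplifying assumptions: ResetSketch, next, reset and the two helper methods are hypothetical names, and the cursor bookkeeping is deliberately simpler than in javac's UnicodeReader. It shows that stepping back one raw position re-reads a quote only when the quote occupies a single source character.

```java
public class ResetSketch {
    private final String src;
    int bp = 0; // raw index of the next unread source character

    ResetSketch(String src) {
        this.src = src;
    }

    // Reads one logical character: a six-character Unicode escape is
    // decoded into a single char, anything else is returned unchanged.
    char next() {
        if (src.startsWith("\\u", bp) && bp + 6 <= src.length()) {
            char c = (char) Integer.parseInt(src.substring(bp + 2, bp + 6), 16);
            bp += 6;
            return c;
        }
        return src.charAt(bp++);
    }

    void reset(int pos) {
        bp = pos;
    }

    // Read one logical char, back up by one raw position, read again.
    static char rereadAfterBackingUpOne(String source) {
        ResetSketch r = new ResetSketch(source);
        r.next();
        r.reset(r.bp - 1);
        return r.next();
    }

    // Read one logical char, return to its remembered start, read again.
    static char rereadFromRememberedStart(String source) {
        ResetSketch r = new ResetSketch(source);
        int start = r.bp;
        r.next();
        r.reset(start);
        return r.next();
    }

    public static void main(String[] args) {
        String plain = "\"\";";             // two standard quotes
        String escaped = "\\u0022\\u0022;"; // two Unicode-escaped quotes
        System.out.println(rereadAfterBackingUpOne(plain));     // prints "
        System.out.println(rereadAfterBackingUpOne(escaped));   // prints 2
        System.out.println(rereadFromRememberedStart(escaped)); // prints "
    }
}
```

Remembering where the logical character started, instead of subtracting one from the current position, gives the same result for both spellings of the quote.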


