RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end

Roger Riggs Roger.Riggs at oracle.com
Wed Jul 8 16:45:26 UTC 2020


Hi Rafaello,

Since you have expanded the scope of the fix, you will need to update 
the bug to
include the other changes you are making.  Or create a new issue for the 
additional work.

Personal preference in style has very little place when maintaining 
existing code.
The style should match the existing code.

There is little correspondence between the existing code and the new 
code that
it make is more difficult to determine that they are equivalent.

The messages in the exceptions should not be changed without a 
compelling reason.
1014: It would be more informative to include the rogue character in the 
message than to drop it.
As is, it gives some indication what's wrong about the end.

Grautituous changes such as changing the condition in 979 are unnecessary.

The change from using "eof" to "eos" is unconventional for IO streams 
and unnecessary.

The note about allowing limit to overflow to MIN_VALUE is informative 
but it would
be better to avoid it and keep a future maintainer from falling into the 
trap.

Regards, Roger

On 7/6/20 4:48 PM, Raffaello Giulietti wrote:
> Hi Roger,
>
> the structure of the original design is preserved, the details change 
> with the intent of having code that is easier to reason about.
>
> The main loop has many exit points in the original code, as well as in 
> the new code. For this reason, I prefer not to express it as a while 
> or as a counted for loop. That's why it has the for (;;) form, which 
> promptly warns that more exit points at unusual positions should be 
> expected.
>
> The "Illegal base64 character" IOException in the original code is 
> wrong, probably due to oversight. That's why the new code has two 
> vars, namely i and v, to convey more meaningful diagnostic.
>
> The new code keeps track of only one running variable pos instead of 
> the original two vars off and len, witĥout using more live variables 
> for this purpose. Less mutating vars help keeping the code simpler. 
> The method has only one mutating var.
>
> With the use of writing and reading bit positions fields wpos and 
> rpos, checking when a byte in field bits is ready means testing rpos - 
> 8 >= wpos.
>
> I can add more comments, e.g., in the form of Dijkstra/Hoare-style 
> loop invariants, if this helps improving confidence. Or I can add 
> constant-guarded assertions to the proposed code instead of just 
> comments. Or both.
>
>
>
> There's nothing I can find in the coding convention [1] about block 
> comments that states that continuation lines must start with the 
> asterisk *, although the examples do so. While I find these asterisks 
> quite distracting, it would be a trivial matter to add them in the 
> next review iteration.
>
> The end-of-line comments either comply with the rules in [1] or, in 
> the description of the fields, have been inadvertently "inherited" as 
> multiple lines from the original code. I will of course correct them 
> there in the next review iteration.
>
>
> Greetings
> Raffaello
>
> ----
>
> [1] 
> https://www.oracle.com/java/technologies/javase/codeconventions-comments.html
>
>
> On 2020-07-06 20:17, Roger Riggs wrote:
>> Hi Raffaello,
>>
>> I'm still not sure it needed this much of a re-write to fix the bug;
>> It will take some time to look at the changes.
>>
>> Regardless, OpenJDK conventions call for following the style of the 
>> existing code.
>> Your new comments follow neither the existing convention to use 
>> "//..." comments
>> nor the other prevalent comment form using /*... */ which use 
>> consistent indentation
>> and "* " on continuation lines.
>>
>> Regards, Roger
>>
>>
>> On 7/3/20 11:48 AM, Raffaello Giulietti wrote:
>>> Hello,
>>>
>>> after Roger's notes, which escaped my attention before, I've 
>>> withdrawn all the changes but for DecInputStream, *except* that I 
>>> couldn't resist to simplify the maths in encodedOutLength(), while 
>>> still using xxxExact() arithmetic.
>>>
>>> Sorry for the confusion
>>> Raffaello
>>>
>>>
>>>
>>>> Hi Raffaello,
>>>>
>>>> There is way more code changed here than is needed to fix the bug.
>>>> General enhancement should be separated from bug fixes.
>>>> It makes it easier to review to see the bug was fixed
>>>> and easier to separately review other code to see that there are no 
>>>> unexpected changes.
>>>>
>>>> If some of the changes are motivated by expected performance 
>>>> improvements,
>>>> there should be JMH tests comparing the before and after.
>>>> The change to use byte arrays seems useful, but even using char[]
>>>> there is little danger of cache thrashing.
>>>>
>>>> If using the code using xxxExact was correct, don't change it.
>>>> Those methods are intrinsified and perform as well or better than 
>>>> using long.
>>>> Usually, it is better to leave code alone and not risk breaking it.
>>>>
>>>> Special care needs taken when changing a method that is intrinsified.
>>>> The optimized version may use fields of the object and stop
>>>> working if they are changed.
>>>>
>>>> In the test, the range of buffer sizes tests seems to waste a lot
>>>> of cycles on sizes greater than the encoded size of the input.
>>>>
>>>> Regards, Roger
>>>
>>>
>>> ---------------------
>>>
>>> # HG changeset patch
>>> # User lello
>>> # Date 1593790152 -7200
>>> #      Fri Jul 03 17:29:12 2020 +0200
>>> # Node ID 73370832de1d2bf3b450930f5cb2467e10528c69
>>> # Parent  a7c0307232406c7b0c1a4b8c2de111077848203d
>>> 8222187: java.util.Base64.Decoder stream adds unexpected null bytes 
>>> at the end
>>> Reviewed-by: TBD
>>> Contributed-by: Raffaello Giulietti <raffaello.giulietti at gmail.com>
>>>
>>> diff --git a/src/java.base/share/classes/java/util/Base64.java 
>>> b/src/java.base/share/classes/java/util/Base64.java
>>> --- a/src/java.base/share/classes/java/util/Base64.java
>>> +++ b/src/java.base/share/classes/java/util/Base64.java
>>> @@ -255,14 +255,11 @@
>>>           *
>>>           */
>>>          private final int encodedOutLength(int srclen, boolean 
>>> throwOOME) {
>>> -            int len = 0;
>>> +            int len;
>>>              try {
>>> -                if (doPadding) {
>>> -                    len = Math.multiplyExact(4, 
>>> (Math.addExact(srclen, 2) / 3));
>>> -                } else {
>>> -                    int n = srclen % 3;
>>> -                    len = Math.addExact(Math.multiplyExact(4, 
>>> (srclen / 3)), (n == 0 ? 0 : n + 1));
>>> -                }
>>> +                len = doPadding
>>> +                        ? Math.multiplyExact(4, 
>>> (Math.addExact(srclen, 2) / 3))
>>> +                        : Math.addExact((Math.addExact(srclen, 2) / 
>>> 3), srclen);
>>>                  if (linemax > 0) { // line separators
>>>                      len = Math.addExact(len, (len - 1) / linemax * 
>>> newline.length);
>>>                  }
>>> @@ -961,14 +958,15 @@
>>>
>>>          private final InputStream is;
>>>          private final boolean isMIME;
>>> -        private final int[] base64;      // base64 -> byte mapping
>>> -        private int bits = 0;            // 24-bit buffer for decoding
>>> -        private int nextin = 18;         // next available "off" in 
>>> "bits" for input;
>>> -                                         // -> 18, 12, 6, 0
>>> -        private int nextout = -8;        // next available "off" in 
>>> "bits" for output;
>>> -                                         // -> 8, 0, -8 (no byte 
>>> for output)
>>> -        private boolean eof = false;
>>> -        private boolean closed = false;
>>> +        private final int[] base64;     // mapping from alphabet to 
>>> values
>>> +        private int bits;               // 24 bit buffer for decoding
>>> +        private int wpos;               // writing bit pos inside bits
>>> +        // one of 24 (left, msb), 18, 12, 6, 0
>>> +        private int rpos;               // reading bit pos inside bits
>>> +        // one of 24 (left, msb), 16, 8, 0
>>> +        private boolean eos;
>>> +        private boolean closed;
>>> +        private byte[] onebuf = new byte[1];
>>>
>>>          DecInputStream(InputStream is, int[] base64, boolean isMIME) {
>>>              this.is = is;
>>> @@ -976,114 +974,158 @@
>>>              this.isMIME = isMIME;
>>>          }
>>>
>>> -        private byte[] sbBuf = new byte[1];
>>> -
>>>          @Override
>>>          public int read() throws IOException {
>>> -            return read(sbBuf, 0, 1) == -1 ? -1 : sbBuf[0] & 0xff;
>>> +            return read(onebuf, 0, 1) >= 0 ? onebuf[0] & 0xff : -1;
>>> +        }
>>> +
>>> +        private int leftovers(byte[] b, int off, int pos, int limit) {
>>> +            eos = true;
>>> +            /*
>>> +            We use a loop here, as this code is executed only once.
>>> +            Unrolling the loop would probably not contribute much 
>>> here.
>>> +             */
>>> +            while (rpos - 8 >= wpos && pos != limit) {
>>> +                b[pos++] = (byte) (bits >> (rpos -= 8));
>>> +            }
>>> +            return pos - off != 0 || rpos - 8 >= wpos ? pos - off : 
>>> -1;
>>>          }
>>>
>>> -        private int eof(byte[] b, int off, int len, int oldOff)
>>> -            throws IOException
>>> -        {
>>> -            eof = true;
>>> -            if (nextin != 18) {
>>> -                if (nextin == 12)
>>> -                    throw new IOException("Base64 stream has one 
>>> un-decoded dangling byte.");
>>> -                // treat ending xx/xxx without padding character 
>>> legal.
>>> -                // same logic as v == '=' below
>>> -                b[off++] = (byte)(bits >> (16));
>>> -                if (nextin == 0) {           // only one padding byte
>>> -                    if (len == 1) {          // no enough output space
>>> -                        bits >>= 8;          // shift to lowest byte
>>> -                        nextout = 0;
>>> -                    } else {
>>> -                        b[off++] = (byte) (bits >>  8);
>>> -                    }
>>> -                }
>>> +        private int eos(byte[] b, int off, int pos, int limit) 
>>> throws IOException {
>>> +            /*
>>> +            wpos == 18: x     dangling single x, invalid unit
>>> +            accept ending xx or xxx without padding characters
>>> +             */
>>> +            if (wpos == 18) {
>>> +                throw new IOException("Base64 stream has one 
>>> un-decoded dangling byte");
>>>              }
>>> -            return off == oldOff ? -1 : off - oldOff;
>>> +            rpos = 24;
>>> +            return leftovers(b, off, pos, limit);
>>>          }
>>>
>>> -        private int padding(byte[] b, int off, int len, int oldOff)
>>> -            throws IOException
>>> -        {
>>> -            // =     shiftto==18 unnecessary padding
>>> -            // x=    shiftto==12 dangling x, invalid unit
>>> -            // xx=   shiftto==6 && missing last '='
>>> -            // xx=y  or last is not '='
>>> -            if (nextin == 18 || nextin == 12 ||
>>> -                nextin == 6 && is.read() != '=') {
>>> -                throw new IOException("Illegal base64 ending 
>>> sequence:" + nextin);
>>> +        private int padding(byte[] b, int off, int pos, int limit) 
>>> throws IOException {
>>> +            /*
>>> +            wpos == 24: =    (unnecessary padding)
>>> +            wpos == 18: x=   (dangling single x, invalid unit)
>>> +            wpos == 12 and missing last '=': xx=  (invalid padding)
>>> +            wpos == 12 and last is not '=': xx=x (invalid padding)
>>> +             */
>>> +            if (wpos >= 18 || wpos == 12 && is.read() != '=') {
>>> +                throw new IOException("Illegal base64 ending 
>>> sequence");
>>>              }
>>> -            b[off++] = (byte)(bits >> (16));
>>> -            if (nextin == 0) {           // only one padding byte
>>> -                if (len == 1) {          // no enough output space
>>> -                    bits >>= 8;          // shift to lowest byte
>>> -                    nextout = 0;
>>> -                } else {
>>> -                    b[off++] = (byte) (bits >>  8);
>>> -                }
>>> -            }
>>> -            eof = true;
>>> -            return off - oldOff;
>>> +            rpos = 24;
>>> +            return leftovers(b, off, pos, limit);
>>>          }
>>>
>>>          @Override
>>>          public int read(byte[] b, int off, int len) throws 
>>> IOException {
>>> -            if (closed)
>>> +            if (closed) {
>>>                  throw new IOException("Stream is closed");
>>> -            if (eof && nextout < 0)    // eof and no leftover
>>> -                return -1;
>>> -            if (off < 0 || len < 0 || len > b.length - off)
>>> -                throw new IndexOutOfBoundsException();
>>> -            int oldOff = off;
>>> -            while (nextout >= 0) {       // leftover output byte(s) 
>>> in bits buf
>>> -                if (len == 0)
>>> -                    return off - oldOff;
>>> -                b[off++] = (byte)(bits >> nextout);
>>> -                len--;
>>> -                nextout -= 8;
>>> +            }
>>> +            Objects.checkFromIndexSize(off, len, b.length);
>>> +            if (len == 0) {
>>> +                return 0;
>>>              }
>>> -            bits = 0;
>>> -            while (len > 0) {
>>> -                int v = is.read();
>>> -                if (v == -1) {
>>> -                    return eof(b, off, len, oldOff);
>>> +
>>> +            /*
>>> +            Rather than keeping 2 running vars (e.g., off and len), 
>>> we only
>>> +            keep one (pos), while definitely fixing the boundaries 
>>> of the range
>>> +            [off, limit).
>>> +
>>> +            Note that limit can overflow to Integer.MIN_VALUE. 
>>> However, as long
>>> +            as comparisons with pos are done as coded, there's no 
>>> harm.
>>> +
>>> +            In addition, limit - off (= len) is used from here on, 
>>> so the
>>> +            location for len can be reallocated to other vars 
>>> (e.g., limit).
>>> +             */
>>> +            int pos = off;
>>> +            int limit = off + len;
>>> +            if (eos) {
>>> +                return leftovers(b, off, pos, limit);
>>> +            }
>>> +
>>> +            /*
>>> +            leftovers from previous invocation; here, wpos = 0.
>>> +            There can be at most 2 leftover bytes (rpos <= 16).
>>> +            Further, the buffer b has at least one free place.
>>> +
>>> +            The logic could be coded as a loop, (as in method 
>>> leftovers())
>>> +            but the explicit "unrolling" makes it possible to 
>>> generate better
>>> +            byte extraction code.
>>> +             */
>>> +            if (rpos == 16) {
>>> +                b[pos++] = (byte) (bits >> 8);
>>> +                rpos = 8;
>>> +                if (pos == limit) {
>>> +                    return limit - off;
>>>                  }
>>> -                if ((v = base64[v]) < 0) {
>>> -                    if (v == -2) {       // padding byte(s)
>>> -                        return padding(b, off, len, oldOff);
>>> -                    }
>>> -                    if (v == -1) {
>>> -                        if (!isMIME)
>>> -                            throw new IOException("Illegal base64 
>>> character " +
>>> -                                Integer.toString(v, 16));
>>> -                        continue;        // skip if for rfc2045
>>> -                    }
>>> -                    // neve be here
>>> -                }
>>> -                bits |= (v << nextin);
>>> -                if (nextin == 0) {
>>> -                    nextin = 18;         // clear for next in
>>> -                    b[off++] = (byte)(bits >> 16);
>>> -                    if (len == 1) {
>>> -                        nextout = 8;    // 2 bytes left in bits
>>> -                        break;
>>> -                    }
>>> -                    b[off++] = (byte)(bits >> 8);
>>> -                    if (len == 2) {
>>> -                        nextout = 0;    // 1 byte left in bits
>>> -                        break;
>>> -                    }
>>> -                    b[off++] = (byte)bits;
>>> -                    len -= 3;
>>> -                    bits = 0;
>>> -                } else {
>>> -                    nextin -= 6;
>>> +            }
>>> +            if (rpos == 8) {
>>> +                b[pos++] = (byte) bits;
>>> +                rpos = 0;
>>> +                if (pos == limit) {
>>> +                    return limit - off;
>>>                  }
>>>              }
>>> -            return off - oldOff;
>>> + 8222187
>>> +            bits = 0;
>>> +            wpos = 24;
>>> +            for (;;) {
>>> +                // Here, pos != limit
>>> +                int i = is.read();
>>> +                if (i < 0) {
>>> +                    return eos(b, off, pos, limit);
>>> +                }
>>> +                int v = base64[i];
>>> +                if (v < 0) {
>>> +                    /*
>>> +                    i not in alphabet, thus
>>> +                        v = -2: i is '=', the padding
>>> +                        v = -1: i is something else, e.g., CR or LF
>>> +                     */
>>> +                    if (v == -1) {
>>> +                        if (isMIME) {
>>> +                            continue;
>>> +                        }
>>> +                        throw new IOException("Illegal base64 byte 
>>> 0x" +
>>> +                                Integer.toHexString(i));
>>> +                    }
>>> +                    return padding(b, off, pos, limit);
>>> +                }
>>> +                bits |= (v << (wpos -= 6));
>>> +                if (wpos != 0) {
>>> +                    continue;
>>> +                }
>>> +                if (limit - pos >= 3) {
>>> +                    // frequently taken fast path, no need to track 
>>> rpos
>>> +                    b[pos++] = (byte) (bits >> 16);
>>> +                    b[pos++] = (byte) (bits >> 8);
>>> +                    b[pos++] = (byte) bits;
>>> +                    bits = 0;
>>> +                    wpos = 24;
>>> +                    if (pos == limit) {
>>> +                        return limit - off;
>>> +                    }
>>> +                    continue;
>>> +                }
>>> +                /*
>>> +                Here, buffer b has either 1 or 2 free places, that is,
>>> +                limit - pos = 1 or limit - pos = 2.
>>> +
>>> +                As above, this could be coded as a loop. But since the
>>> +                shift lengths are explicit multiples of 8, better 
>>> code can be
>>> +                probably generated.
>>> +                 */
>>> +                b[pos++] = (byte) (bits >> 16);
>>> +                if (pos == limit) {
>>> +                    rpos = 16;
>>> +                    return limit - off;
>>> +                }
>>> +                b[pos++] = (byte) (bits >> 8);
>>> +                // Here, pos = limit, no need for an if.
>>> +                rpos = 8;
>>> +                return limit - off;
>>> +            }
>>>          }
>>>
>>>          @Override
>>> diff --git a/test/jdk/java/util/Base64/TestBase64.java 
>>> b/test/jdk/java/util/Base64/TestBase64.java
>>> --- a/test/jdk/java/util/Base64/TestBase64.java
>>> +++ b/test/jdk/java/util/Base64/TestBase64.java
>>> @@ -144,6 +144,10 @@
>>>          testDecoderKeepsAbstinence(Base64.getDecoder());
>>>          testDecoderKeepsAbstinence(Base64.getUrlDecoder());
>>>          testDecoderKeepsAbstinence(Base64.getMimeDecoder());
>>> +
>>> +        // tests patch addressing JDK-8222187
>>> +        // https://bugs.openjdk.java.net/browse/JDK-8222187
>>> +        testJDK_8222187();
>>>      }
>>>
>>>      private static void test(Base64.Encoder enc, Base64.Decoder dec,
>>> @@ -607,4 +611,26 @@
>>>              }
>>>          }
>>>      }
>>> +
>>> +    private static void testJDK_8222187() throws Throwable {
>>> +        byte[] orig = "12345678".getBytes("US-ASCII");
>>> +        byte[] encoded = Base64.getEncoder().encode(orig);
>>> +        // decode using different buffer sizes
>>> +        for (int bufferSize = 1; bufferSize <= encoded.length + 1; 
>>> bufferSize++) {
>>> +            try (
>>> +                    InputStream in = Base64.getDecoder().wrap(
>>> +                            new ByteArrayInputStream(encoded));
>>> +                    ByteArrayOutputStream baos = new 
>>> ByteArrayOutputStream();
>>> +            ) {
>>> +                byte[] buffer = new byte[bufferSize];
>>> +                int read;
>>> +                while ((read = in.read(buffer, 0, bufferSize)) >= 0) {
>>> +                    baos.write(buffer, 0, read);
>>> +                }
>>> +                // compare result, output info if lengths do not match
>>> +                byte[] decoded = baos.toByteArray();
>>> +                checkEqual(decoded, orig, "Base64 stream decoding 
>>> failed!");
>>> +            }
>>> +        }
>>> +    }
>>>  }
>>



More information about the core-libs-dev mailing list