RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end
Raffaello Giulietti
raffaello.giulietti at gmail.com
Thu Jul 23 14:44:53 UTC 2020
Thanks for pushing changeset 865b5ca81009.
Raffaello
On 2020-07-21 19:18, Lance Andersen wrote:
> Hi Roger,
>
> I will plan to push tomorrow morning pending any last minute reviews
>
> Best
> Lance
>
>> On Jul 21, 2020, at 9:57 AM, Roger Riggs <Roger.Riggs at oracle.com
>> <mailto:Roger.Riggs at oracle.com>> wrote:
>>
>> Hi Rafaello, Lance,
>>
>> That looks good to me.
>>
>> Thanks, Roger
>>
>> On 7/19/20 2:31 PM, Lance Andersen wrote:
>>> Hi Raffaello,
>>>
>>> I think the changes look reasonable.
>>>
>>> I have created a webrev,
>>> http://cr.openjdk.java.net/~lancea/8222187/webrev.00/, for others to
>>> review as it is a bit easier than the patch.
>>>
>>> I have also run the JCK tests in this area as well as mach5 tiers1-3
>>> (which I believe Roger has also)
>>>
>>> Best
>>> Lance
>>>
>>>> On Jul 15, 2020, at 5:44 PM, Raffaello Giulietti
>>>> <raffaello.giulietti at gmail.com
>>>> <mailto:raffaello.giulietti at gmail.com>> wrote:
>>>>
>>>> Hi Roger,
>>>>
>>>> On 2020-07-15 22:21, Roger Riggs wrote:
>>>>> Hi Raffaello,
>>>>> Base64.java:
>>>>> 2: Please update 2nd copyright year to 2020
>>>>
>>>> I was unsure what to do here, thanks for guidance.
>>>> I also removed the seemingly useless import at former L33.
>>>>
>>>>
>>>>
>>>>> leftovers():
>>>>> 996: "&" the shortcutting && is more typical in a condition.
>>>>> 997: the -= is buried in an expression and a reader might miss it.
>>>>
>>>> Corrected, as well as the analogous -= usage for wpos now at L1106-7
>>>>
>>>>
>>>>> 1053: "can be reallocated to other vars": not a useful comment,
>>>>> reflecting on only one possible implementation that is out of the
>>>>> control of the developer.
>>>>> I'd almost rather see 'len' than 'limit - off'; it might be
>>>>> informative to the reader if 'limit' was declared final. (1056:)
>>>>
>>>> Done, as well as declaring i and v final in the loop.
>>>>
>>>>
>>>>
>>>>> TestBase54.java:
>>>>> 2: Update 2nd copyright year to 2020
>>>>> 27: Please add the bug number to the @bug line.
>>>>> Style-wise, I would remove the blank lines at the beginning of
>>>>> blocks. (1095, 1115)
>>>>
>>>> Done.
>>>>
>>>>
>>>>> Thanks, Roger
>>>>> On 7/14/20 11:47 AM, Raffaello Giulietti wrote:
>>>>>> Hi Roger,
>>>>>>
>>>>>> here's the latest version of the patch.
>>>>>>
>>>>>> I did:
>>>>>> * Withdraw the simplification in encodedOutLength(), as it is
>>>>>> indeed out of scope for this bug fix.
>>>>>> * Restore field names in DecInputStream except for nextin (now
>>>>>> wpos) and nextout (now rpos) because they have slightly different
>>>>>> semantics. That's just for clarity. I would have nothing against
>>>>>> reusing the old names for the proposed purpose.
>>>>>> * Switch back to the original no-arg read() implementation.
>>>>>> * Adjust comment continuation lines.
>>>>>> * Preserve the proposed logic of read(byte[], int, int) and the
>>>>>> supporting methods.
>>>>>>
>>>>>> Others needed changes are:
>>>>>> * Correct the copyright years: that's something better left to Oracle.
>>>>>> * Remove the apparently useless import at L33. I could build and
>>>>>> run without it.
>>>>>>
>>>>>>
>>>>>> Greetings
>>>>>> Raffaello
>>>>>>
>>>>
>>>>
>>>> ----
>>>>
>>>> # HG changeset patch
>>>> # User lello
>>>> # Date 1594848775 -7200
>>>> # Wed Jul 15 23:32:55 2020 +0200
>>>> # Node ID f7407f35e83ab508f0e6deab4f776bb1a1c85e68
>>>> # Parent a5649ebf94193770ca763391dd63807059d333b4
>>>> 8222187: java.util.Base64.Decoder stream adds unexpected null bytes
>>>> at the end
>>>> Reviewed-by: TBD
>>>> Contributed-by: Raffaello Giulietti <raffaello.giulietti at gmail.com
>>>> <mailto:raffaello.giulietti at gmail.com>>
>>>>
>>>> diff --git a/src/java.base/share/classes/java/util/Base64.java
>>>> b/src/java.base/share/classes/java/util/Base64.java
>>>> --- a/src/java.base/share/classes/java/util/Base64.java
>>>> +++ b/src/java.base/share/classes/java/util/Base64.java
>>>> @@ -1,5 +1,5 @@
>>>> /*
>>>> - * Copyright (c) 2012, 2019, Oracle and/or its affiliates. All
>>>> rights reserved.
>>>> + * Copyright (c) 2012, 2020, Oracle and/or its affiliates. All
>>>> rights reserved.
>>>> * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>>> *
>>>> * This code is free software; you can redistribute it and/or modify it
>>>> @@ -30,7 +30,6 @@
>>>> import java.io.IOException;
>>>> import java.io.OutputStream;
>>>> import java.nio.ByteBuffer;
>>>> -import java.nio.charset.StandardCharsets;
>>>>
>>>> import sun.nio.cs.ISO_8859_1;
>>>>
>>>> @@ -961,12 +960,15 @@
>>>>
>>>> private final InputStream is;
>>>> private final boolean isMIME;
>>>> - private final int[] base64; // base64 -> byte mapping
>>>> - private int bits = 0; // 24-bit buffer for decoding
>>>> - private int nextin = 18; // next available "off" in
>>>> "bits" for input;
>>>> - // -> 18, 12, 6, 0
>>>> - private int nextout = -8; // next available "off" in
>>>> "bits" for output;
>>>> - // -> 8, 0, -8 (no byte
>>>> for output)
>>>> + private final int[] base64; // base64 -> byte mapping
>>>> + private int bits = 0; // 24-bit buffer for decoding
>>>> +
>>>> + /* writing bit pos inside bits; one of 24 (left, msb), 18,
>>>> 12, 6, 0 */
>>>> + private int wpos = 0;
>>>> +
>>>> + /* reading bit pos inside bits: one of 24 (left, msb), 16,
>>>> 8, 0 */
>>>> + private int rpos = 0;
>>>> +
>>>> private boolean eof = false;
>>>> private boolean closed = false;
>>>>
>>>> @@ -983,107 +985,153 @@
>>>> return read(sbBuf, 0, 1) == -1 ? -1 : sbBuf[0] & 0xff;
>>>> }
>>>>
>>>> - private int eof(byte[] b, int off, int len, int oldOff)
>>>> - throws IOException
>>>> - {
>>>> + private int leftovers(byte[] b, int off, int pos, int limit) {
>>>> eof = true;
>>>> - if (nextin != 18) {
>>>> - if (nextin == 12)
>>>> - throw new IOException("Base64 stream has one
>>>> un-decoded dangling byte.");
>>>> - // treat ending xx/xxx without padding character legal.
>>>> - // same logic as v == '=' below
>>>> - b[off++] = (byte)(bits >> (16));
>>>> - if (nextin == 0) { // only one padding byte
>>>> - if (len == 1) { // no enough output space
>>>> - bits >>= 8; // shift to lowest byte
>>>> - nextout = 0;
>>>> - } else {
>>>> - b[off++] = (byte) (bits >> 8);
>>>> - }
>>>> - }
>>>> +
>>>> + /*
>>>> + * We use a loop here, as this method is executed only
>>>> a few times.
>>>> + * Unrolling the loop would probably not contribute
>>>> much here.
>>>> + */
>>>> + while (rpos - 8 >= wpos && pos != limit) {
>>>> + rpos -= 8;
>>>> + b[pos++] = (byte) (bits >> rpos);
>>>> }
>>>> - return off == oldOff ? -1 : off - oldOff;
>>>> + return pos - off != 0 || rpos - 8 >= wpos ? pos - off : -1;
>>>> }
>>>>
>>>> - private int padding(byte[] b, int off, int len, int oldOff)
>>>> - throws IOException
>>>> - {
>>>> - // = shiftto==18 unnecessary padding
>>>> - // x= shiftto==12 dangling x, invalid unit
>>>> - // xx= shiftto==6 && missing last '='
>>>> - // xx=y or last is not '='
>>>> - if (nextin == 18 || nextin == 12 ||
>>>> - nextin == 6 && is.read() != '=') {
>>>> - throw new IOException("Illegal base64 ending
>>>> sequence:" + nextin);
>>>> + private int eof(byte[] b, int off, int pos, int limit)
>>>> throws IOException {
>>>> + /*
>>>> + * pos != limit
>>>> + *
>>>> + * wpos == 18: x dangling single x, invalid unit
>>>> + * accept ending xx or xxx without padding characters
>>>> + */
>>>> + if (wpos == 18) {
>>>> + throw new IOException("Base64 stream has one
>>>> un-decoded dangling byte.");
>>>> }
>>>> - b[off++] = (byte)(bits >> (16));
>>>> - if (nextin == 0) { // only one padding byte
>>>> - if (len == 1) { // no enough output space
>>>> - bits >>= 8; // shift to lowest byte
>>>> - nextout = 0;
>>>> - } else {
>>>> - b[off++] = (byte) (bits >> 8);
>>>> - }
>>>> + rpos = 24;
>>>> + return leftovers(b, off, pos, limit);
>>>> + }
>>>> +
>>>> + private int padding(byte[] b, int off, int pos, int limit)
>>>> throws IOException {
>>>> + /*
>>>> + * pos != limit
>>>> + *
>>>> + * wpos == 24: = (unnecessary padding)
>>>> + * wpos == 18: x= (dangling single x, invalid unit)
>>>> + * wpos == 12 and missing last '=': xx= (invalid padding)
>>>> + * wpos == 12 and last is not '=': xx=x (invalid padding)
>>>> + */
>>>> + if (wpos >= 18 || wpos == 12 && is.read() != '=') {
>>>> + throw new IOException("Illegal base64 ending
>>>> sequence:" + wpos);
>>>> }
>>>> - eof = true;
>>>> - return off - oldOff;
>>>> + rpos = 24;
>>>> + return leftovers(b, off, pos, limit);
>>>> }
>>>>
>>>> @Override
>>>> public int read(byte[] b, int off, int len) throws IOException {
>>>> - if (closed)
>>>> + if (closed) {
>>>> throw new IOException("Stream is closed");
>>>> - if (eof && nextout < 0) // eof and no leftover
>>>> - return -1;
>>>> - if (off < 0 || len < 0 || len > b.length - off)
>>>> - throw new IndexOutOfBoundsException();
>>>> - int oldOff = off;
>>>> - while (nextout >= 0) { // leftover output byte(s)
>>>> in bits buf
>>>> - if (len == 0)
>>>> - return off - oldOff;
>>>> - b[off++] = (byte)(bits >> nextout);
>>>> - len--;
>>>> - nextout -= 8;
>>>> + }
>>>> + Objects.checkFromIndexSize(off, len, b.length);
>>>> + if (len == 0) {
>>>> + return 0;
>>>> }
>>>> - bits = 0;
>>>> - while (len > 0) {
>>>> - int v = is.read();
>>>> - if (v == -1) {
>>>> - return eof(b, off, len, oldOff);
>>>> +
>>>> + /*
>>>> + * Rather than keeping 2 running vars (e.g., off and len),
>>>> + * we only keep one (pos), while definitely fixing the
>>>> boundaries
>>>> + * of the range [off, limit).
>>>> + * More specifically, each use of pos as an index in b
>>>> meets
>>>> + * pos - off >= 0 & limit - pos > 0
>>>> + *
>>>> + * Note that limit can overflow to Integer.MIN_VALUE.
>>>> However,
>>>> + * as long as comparisons with pos are as coded,
>>>> there's no harm.
>>>> + */
>>>> + int pos = off;
>>>> + final int limit = off + len;
>>>> + if (eof) {
>>>> + return leftovers(b, off, pos, limit);
>>>> + }
>>>> +
>>>> + /*
>>>> + * Leftovers from previous invocation; here, wpos = 0.
>>>> + * There can be at most 2 leftover bytes (rpos <= 16).
>>>> + * Further, b has at least one free place.
>>>> + *
>>>> + * The logic could be coded as a loop, (as in method
>>>> leftovers())
>>>> + * but the explicit "unrolling" makes it possible to
>>>> generate
>>>> + * better byte extraction code.
>>>> + */
>>>> + if (rpos == 16) {
>>>> + b[pos++] = (byte) (bits >> 8);
>>>> + rpos = 8;
>>>> + if (pos == limit) {
>>>> + return len;
>>>> }
>>>> - if ((v = base64[v]) < 0) {
>>>> - if (v == -2) { // padding byte(s)
>>>> - return padding(b, off, len, oldOff);
>>>> - }
>>>> - if (v == -1) {
>>>> - if (!isMIME)
>>>> - throw new IOException("Illegal base64
>>>> character " +
>>>> - Integer.toString(v, 16));
>>>> - continue; // skip if for rfc2045
>>>> - }
>>>> - // neve be here
>>>> - }
>>>> - bits |= (v << nextin);
>>>> - if (nextin == 0) {
>>>> - nextin = 18; // clear for next in
>>>> - b[off++] = (byte)(bits >> 16);
>>>> - if (len == 1) {
>>>> - nextout = 8; // 2 bytes left in bits
>>>> - break;
>>>> - }
>>>> - b[off++] = (byte)(bits >> 8);
>>>> - if (len == 2) {
>>>> - nextout = 0; // 1 byte left in bits
>>>> - break;
>>>> - }
>>>> - b[off++] = (byte)bits;
>>>> - len -= 3;
>>>> - bits = 0;
>>>> - } else {
>>>> - nextin -= 6;
>>>> + }
>>>> + if (rpos == 8) {
>>>> + b[pos++] = (byte) bits;
>>>> + rpos = 0;
>>>> + if (pos == limit) {
>>>> + return len;
>>>> }
>>>> }
>>>> - return off - oldOff;
>>>> +
>>>> + bits = 0;
>>>> + wpos = 24;
>>>> + for (;;) {
>>>> + /* pos != limit & rpos == 0 */
>>>> + final int i = is.read();
>>>> + if (i < 0) {
>>>> + return eof(b, off, pos, limit);
>>>> + }
>>>> + final int v = base64[i];
>>>> + if (v < 0) {
>>>> + /*
>>>> + * i not in alphabet, thus
>>>> + * v == -2: i is '=', the padding
>>>> + * v == -1: i is something else, typically
>>>> CR or LF
>>>> + */
>>>> + if (v == -1) {
>>>> + if (isMIME) {
>>>> + continue;
>>>> + }
>>>> + throw new IOException("Illegal base64
>>>> character 0x" +
>>>> + Integer.toHexString(i));
>>>> + }
>>>> + return padding(b, off, pos, limit);
>>>> + }
>>>> + wpos -= 6;
>>>> + bits |= v << wpos;
>>>> + if (wpos != 0) {
>>>> + continue;
>>>> + }
>>>> + if (limit - pos >= 3) {
>>>> + /* frequently taken fast path, no need to track
>>>> rpos */
>>>> + b[pos++] = (byte) (bits >> 16);
>>>> + b[pos++] = (byte) (bits >> 8);
>>>> + b[pos++] = (byte) bits;
>>>> + bits = 0;
>>>> + wpos = 24;
>>>> + if (pos == limit) {
>>>> + return len;
>>>> + }
>>>> + continue;
>>>> + }
>>>> +
>>>> + /* b has either 1 or 2 free places */
>>>> + b[pos++] = (byte) (bits >> 16);
>>>> + if (pos == limit) {
>>>> + rpos = 16;
>>>> + return len;
>>>> + }
>>>> + b[pos++] = (byte) (bits >> 8);
>>>> + /* pos == limit, no need for an if */
>>>> + rpos = 8;
>>>> + return len;
>>>> + }
>>>> }
>>>>
>>>> @Override
>>>> diff --git a/test/jdk/java/util/Base64/TestBase64.java
>>>> b/test/jdk/java/util/Base64/TestBase64.java
>>>> --- a/test/jdk/java/util/Base64/TestBase64.java
>>>> +++ b/test/jdk/java/util/Base64/TestBase64.java
>>>> @@ -1,5 +1,5 @@
>>>> /*
>>>> - * Copyright (c) 2012, 2017, Oracle and/or its affiliates. All
>>>> rights reserved.
>>>> + * Copyright (c) 2012, 2020, Oracle and/or its affiliates. All
>>>> rights reserved.
>>>> * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>>> *
>>>> * This code is free software; you can redistribute it and/or modify it
>>>> @@ -24,7 +24,7 @@
>>>> /**
>>>> * @test
>>>> * @bug 4235519 8004212 8005394 8007298 8006295 8006315 8006530
>>>> 8007379 8008925
>>>> - * 8014217 8025003 8026330 8028397 8129544 8165243 8176379
>>>> + * 8014217 8025003 8026330 8028397 8129544 8165243 8176379 8222187
>>>> * @summary tests java.util.Base64
>>>> * @library /test/lib
>>>> * @build jdk.test.lib.RandomFactory
>>>> @@ -144,6 +144,10 @@
>>>> testDecoderKeepsAbstinence(Base64.getDecoder());
>>>> testDecoderKeepsAbstinence(Base64.getUrlDecoder());
>>>> testDecoderKeepsAbstinence(Base64.getMimeDecoder());
>>>> +
>>>> + // tests patch addressing JDK-8222187
>>>> + // https://bugs.openjdk.java.net/browse/JDK-8222187
>>>> + testJDK_8222187();
>>>> }
>>>>
>>>> private static void test(Base64.Encoder enc, Base64.Decoder dec,
>>>> @@ -607,4 +611,26 @@
>>>> }
>>>> }
>>>> }
>>>> +
>>>> + private static void testJDK_8222187() throws Throwable {
>>>> + byte[] orig = "12345678".getBytes("US-ASCII");
>>>> + byte[] encoded = Base64.getEncoder().encode(orig);
>>>> + // decode using different buffer sizes, up to a longer one
>>>> than needed
>>>> + for (int bufferSize = 1; bufferSize <= encoded.length + 1;
>>>> bufferSize++) {
>>>> + try (
>>>> + InputStream in = Base64.getDecoder().wrap(
>>>> + new ByteArrayInputStream(encoded));
>>>> + ByteArrayOutputStream baos = new
>>>> ByteArrayOutputStream();
>>>> + ) {
>>>> + byte[] buffer = new byte[bufferSize];
>>>> + int read;
>>>> + while ((read = in.read(buffer, 0, bufferSize)) >= 0) {
>>>> + baos.write(buffer, 0, read);
>>>> + }
>>>> + // compare result, output info if lengths do not match
>>>> + byte[] decoded = baos.toByteArray();
>>>> + checkEqual(decoded, orig, "Base64 stream decoding
>>>> failed!");
>>>> + }
>>>> + }
>>>> + }
>>>> }
>>>> <JDK-8222187.patch>
>>>
>>> <oracle_sig_logo.gif>
>>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>>> <http://oracle.com/us/design/oracle-email-sig-198324.gif><http://oracle.com/us/design/oracle-email-sig-198324.gif>
>>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance
>>> Andersen| Principal Member of Technical Staff | +1.781.442.2037
>>> Oracle Java Engineering
>>> 1 Network Drive
>>> Burlington, MA 01803
>>> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
>>>
>>>
>>>
>>
>
>
> Best
> Lance
> ------------------
>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
>
>
>
>
More information about the core-libs-dev
mailing list