RFR: JDK-8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end
Lance Andersen
LANCE.ANDERSEN at ORACLE.COM
Tue Jul 21 17:18:57 UTC 2020
Hi Roger,
I will plan to push tomorrow morning pending any last minute reviews
Best
Lance
> On Jul 21, 2020, at 9:57 AM, Roger Riggs <Roger.Riggs at oracle.com> wrote:
>
> Hi Rafaello, Lance,
>
> That looks good to me.
>
> Thanks, Roger
>
> On 7/19/20 2:31 PM, Lance Andersen wrote:
>> Hi Raffaello,
>>
>> I think the changes look reasonable.
>>
>> I have created a webrev, http://cr.openjdk.java.net/~lancea/8222187/webrev.00/ <http://cr.openjdk.java.net/~lancea/8222187/webrev.00/>, for others to review as it is a bit easier than the patch.
>>
>> I have also run the JCK tests in this area as well as mach5 tiers1-3 (which I believe Roger has also)
>>
>> Best
>> Lance
>>
>>> On Jul 15, 2020, at 5:44 PM, Raffaello Giulietti <raffaello.giulietti at gmail.com <mailto:raffaello.giulietti at gmail.com>> wrote:
>>>
>>> Hi Roger,
>>>
>>> On 2020-07-15 22:21, Roger Riggs wrote:
>>>> Hi Raffaello,
>>>> Base64.java:
>>>> 2: Please update 2nd copyright year to 2020
>>>
>>> I was unsure what to do here, thanks for guidance.
>>> I also removed the seemingly useless import at former L33.
>>>
>>>
>>>
>>>> leftovers():
>>>> 996: "&" the shortcutting && is more typical in a condition.
>>>> 997: the -= is buried in an expression and a reader might miss it.
>>>
>>> Corrected, as well as the analogous -= usage for wpos now at L1106-7
>>>
>>>
>>>> 1053: "can be reallocated to other vars": not a useful comment, reflecting on only one possible implementation that is out of the control of the developer.
>>>> I'd almost rather see 'len' than 'limit - off'; it might be informative to the reader if 'limit' was declared final. (1056:)
>>>
>>> Done, as well as declaring i and v final in the loop.
>>>
>>>
>>>
>>>> TestBase54.java:
>>>> 2: Update 2nd copyright year to 2020
>>>> 27: Please add the bug number to the @bug line.
>>>> Style-wise, I would remove the blank lines at the beginning of blocks. (1095, 1115)
>>>
>>> Done.
>>>
>>>
>>>> Thanks, Roger
>>>> On 7/14/20 11:47 AM, Raffaello Giulietti wrote:
>>>>> Hi Roger,
>>>>>
>>>>> here's the latest version of the patch.
>>>>>
>>>>> I did:
>>>>> * Withdraw the simplification in encodedOutLength(), as it is indeed out of scope for this bug fix.
>>>>> * Restore field names in DecInputStream except for nextin (now wpos) and nextout (now rpos) because they have slightly different semantics. That's just for clarity. I would have nothing against reusing the old names for the proposed purpose.
>>>>> * Switch back to the original no-arg read() implementation.
>>>>> * Adjust comment continuation lines.
>>>>> * Preserve the proposed logic of read(byte[], int, int) and the supporting methods.
>>>>>
>>>>> Others needed changes are:
>>>>> * Correct the copyright years: that's something better left to Oracle.
>>>>> * Remove the apparently useless import at L33. I could build and run without it.
>>>>>
>>>>>
>>>>> Greetings
>>>>> Raffaello
>>>>>
>>>
>>>
>>> ----
>>>
>>> # HG changeset patch
>>> # User lello
>>> # Date 1594848775 -7200
>>> # Wed Jul 15 23:32:55 2020 +0200
>>> # Node ID f7407f35e83ab508f0e6deab4f776bb1a1c85e68
>>> # Parent a5649ebf94193770ca763391dd63807059d333b4
>>> 8222187: java.util.Base64.Decoder stream adds unexpected null bytes at the end
>>> Reviewed-by: TBD
>>> Contributed-by: Raffaello Giulietti <raffaello.giulietti at gmail.com <mailto:raffaello.giulietti at gmail.com>>
>>>
>>> diff --git a/src/java.base/share/classes/java/util/Base64.java b/src/java.base/share/classes/java/util/Base64.java
>>> --- a/src/java.base/share/classes/java/util/Base64.java
>>> +++ b/src/java.base/share/classes/java/util/Base64.java
>>> @@ -1,5 +1,5 @@
>>> /*
>>> - * Copyright (c) 2012, 2019, Oracle and/or its affiliates. All rights reserved.
>>> + * Copyright (c) 2012, 2020, Oracle and/or its affiliates. All rights reserved.
>>> * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>> *
>>> * This code is free software; you can redistribute it and/or modify it
>>> @@ -30,7 +30,6 @@
>>> import java.io.IOException;
>>> import java.io.OutputStream;
>>> import java.nio.ByteBuffer;
>>> -import java.nio.charset.StandardCharsets;
>>>
>>> import sun.nio.cs.ISO_8859_1;
>>>
>>> @@ -961,12 +960,15 @@
>>>
>>> private final InputStream is;
>>> private final boolean isMIME;
>>> - private final int[] base64; // base64 -> byte mapping
>>> - private int bits = 0; // 24-bit buffer for decoding
>>> - private int nextin = 18; // next available "off" in "bits" for input;
>>> - // -> 18, 12, 6, 0
>>> - private int nextout = -8; // next available "off" in "bits" for output;
>>> - // -> 8, 0, -8 (no byte for output)
>>> + private final int[] base64; // base64 -> byte mapping
>>> + private int bits = 0; // 24-bit buffer for decoding
>>> +
>>> + /* writing bit pos inside bits; one of 24 (left, msb), 18, 12, 6, 0 */
>>> + private int wpos = 0;
>>> +
>>> + /* reading bit pos inside bits: one of 24 (left, msb), 16, 8, 0 */
>>> + private int rpos = 0;
>>> +
>>> private boolean eof = false;
>>> private boolean closed = false;
>>>
>>> @@ -983,107 +985,153 @@
>>> return read(sbBuf, 0, 1) == -1 ? -1 : sbBuf[0] & 0xff;
>>> }
>>>
>>> - private int eof(byte[] b, int off, int len, int oldOff)
>>> - throws IOException
>>> - {
>>> + private int leftovers(byte[] b, int off, int pos, int limit) {
>>> eof = true;
>>> - if (nextin != 18) {
>>> - if (nextin == 12)
>>> - throw new IOException("Base64 stream has one un-decoded dangling byte.");
>>> - // treat ending xx/xxx without padding character legal.
>>> - // same logic as v == '=' below
>>> - b[off++] = (byte)(bits >> (16));
>>> - if (nextin == 0) { // only one padding byte
>>> - if (len == 1) { // no enough output space
>>> - bits >>= 8; // shift to lowest byte
>>> - nextout = 0;
>>> - } else {
>>> - b[off++] = (byte) (bits >> 8);
>>> - }
>>> - }
>>> +
>>> + /*
>>> + * We use a loop here, as this method is executed only a few times.
>>> + * Unrolling the loop would probably not contribute much here.
>>> + */
>>> + while (rpos - 8 >= wpos && pos != limit) {
>>> + rpos -= 8;
>>> + b[pos++] = (byte) (bits >> rpos);
>>> }
>>> - return off == oldOff ? -1 : off - oldOff;
>>> + return pos - off != 0 || rpos - 8 >= wpos ? pos - off : -1;
>>> }
>>>
>>> - private int padding(byte[] b, int off, int len, int oldOff)
>>> - throws IOException
>>> - {
>>> - // = shiftto==18 unnecessary padding
>>> - // x= shiftto==12 dangling x, invalid unit
>>> - // xx= shiftto==6 && missing last '='
>>> - // xx=y or last is not '='
>>> - if (nextin == 18 || nextin == 12 ||
>>> - nextin == 6 && is.read() != '=') {
>>> - throw new IOException("Illegal base64 ending sequence:" + nextin);
>>> + private int eof(byte[] b, int off, int pos, int limit) throws IOException {
>>> + /*
>>> + * pos != limit
>>> + *
>>> + * wpos == 18: x dangling single x, invalid unit
>>> + * accept ending xx or xxx without padding characters
>>> + */
>>> + if (wpos == 18) {
>>> + throw new IOException("Base64 stream has one un-decoded dangling byte.");
>>> }
>>> - b[off++] = (byte)(bits >> (16));
>>> - if (nextin == 0) { // only one padding byte
>>> - if (len == 1) { // no enough output space
>>> - bits >>= 8; // shift to lowest byte
>>> - nextout = 0;
>>> - } else {
>>> - b[off++] = (byte) (bits >> 8);
>>> - }
>>> + rpos = 24;
>>> + return leftovers(b, off, pos, limit);
>>> + }
>>> +
>>> + private int padding(byte[] b, int off, int pos, int limit) throws IOException {
>>> + /*
>>> + * pos != limit
>>> + *
>>> + * wpos == 24: = (unnecessary padding)
>>> + * wpos == 18: x= (dangling single x, invalid unit)
>>> + * wpos == 12 and missing last '=': xx= (invalid padding)
>>> + * wpos == 12 and last is not '=': xx=x (invalid padding)
>>> + */
>>> + if (wpos >= 18 || wpos == 12 && is.read() != '=') {
>>> + throw new IOException("Illegal base64 ending sequence:" + wpos);
>>> }
>>> - eof = true;
>>> - return off - oldOff;
>>> + rpos = 24;
>>> + return leftovers(b, off, pos, limit);
>>> }
>>>
>>> @Override
>>> public int read(byte[] b, int off, int len) throws IOException {
>>> - if (closed)
>>> + if (closed) {
>>> throw new IOException("Stream is closed");
>>> - if (eof && nextout < 0) // eof and no leftover
>>> - return -1;
>>> - if (off < 0 || len < 0 || len > b.length - off)
>>> - throw new IndexOutOfBoundsException();
>>> - int oldOff = off;
>>> - while (nextout >= 0) { // leftover output byte(s) in bits buf
>>> - if (len == 0)
>>> - return off - oldOff;
>>> - b[off++] = (byte)(bits >> nextout);
>>> - len--;
>>> - nextout -= 8;
>>> + }
>>> + Objects.checkFromIndexSize(off, len, b.length);
>>> + if (len == 0) {
>>> + return 0;
>>> }
>>> - bits = 0;
>>> - while (len > 0) {
>>> - int v = is.read();
>>> - if (v == -1) {
>>> - return eof(b, off, len, oldOff);
>>> +
>>> + /*
>>> + * Rather than keeping 2 running vars (e.g., off and len),
>>> + * we only keep one (pos), while definitely fixing the boundaries
>>> + * of the range [off, limit).
>>> + * More specifically, each use of pos as an index in b meets
>>> + * pos - off >= 0 & limit - pos > 0
>>> + *
>>> + * Note that limit can overflow to Integer.MIN_VALUE. However,
>>> + * as long as comparisons with pos are as coded, there's no harm.
>>> + */
>>> + int pos = off;
>>> + final int limit = off + len;
>>> + if (eof) {
>>> + return leftovers(b, off, pos, limit);
>>> + }
>>> +
>>> + /*
>>> + * Leftovers from previous invocation; here, wpos = 0.
>>> + * There can be at most 2 leftover bytes (rpos <= 16).
>>> + * Further, b has at least one free place.
>>> + *
>>> + * The logic could be coded as a loop, (as in method leftovers())
>>> + * but the explicit "unrolling" makes it possible to generate
>>> + * better byte extraction code.
>>> + */
>>> + if (rpos == 16) {
>>> + b[pos++] = (byte) (bits >> 8);
>>> + rpos = 8;
>>> + if (pos == limit) {
>>> + return len;
>>> }
>>> - if ((v = base64[v]) < 0) {
>>> - if (v == -2) { // padding byte(s)
>>> - return padding(b, off, len, oldOff);
>>> - }
>>> - if (v == -1) {
>>> - if (!isMIME)
>>> - throw new IOException("Illegal base64 character " +
>>> - Integer.toString(v, 16));
>>> - continue; // skip if for rfc2045
>>> - }
>>> - // neve be here
>>> - }
>>> - bits |= (v << nextin);
>>> - if (nextin == 0) {
>>> - nextin = 18; // clear for next in
>>> - b[off++] = (byte)(bits >> 16);
>>> - if (len == 1) {
>>> - nextout = 8; // 2 bytes left in bits
>>> - break;
>>> - }
>>> - b[off++] = (byte)(bits >> 8);
>>> - if (len == 2) {
>>> - nextout = 0; // 1 byte left in bits
>>> - break;
>>> - }
>>> - b[off++] = (byte)bits;
>>> - len -= 3;
>>> - bits = 0;
>>> - } else {
>>> - nextin -= 6;
>>> + }
>>> + if (rpos == 8) {
>>> + b[pos++] = (byte) bits;
>>> + rpos = 0;
>>> + if (pos == limit) {
>>> + return len;
>>> }
>>> }
>>> - return off - oldOff;
>>> +
>>> + bits = 0;
>>> + wpos = 24;
>>> + for (;;) {
>>> + /* pos != limit & rpos == 0 */
>>> + final int i = is.read();
>>> + if (i < 0) {
>>> + return eof(b, off, pos, limit);
>>> + }
>>> + final int v = base64[i];
>>> + if (v < 0) {
>>> + /*
>>> + * i not in alphabet, thus
>>> + * v == -2: i is '=', the padding
>>> + * v == -1: i is something else, typically CR or LF
>>> + */
>>> + if (v == -1) {
>>> + if (isMIME) {
>>> + continue;
>>> + }
>>> + throw new IOException("Illegal base64 character 0x" +
>>> + Integer.toHexString(i));
>>> + }
>>> + return padding(b, off, pos, limit);
>>> + }
>>> + wpos -= 6;
>>> + bits |= v << wpos;
>>> + if (wpos != 0) {
>>> + continue;
>>> + }
>>> + if (limit - pos >= 3) {
>>> + /* frequently taken fast path, no need to track rpos */
>>> + b[pos++] = (byte) (bits >> 16);
>>> + b[pos++] = (byte) (bits >> 8);
>>> + b[pos++] = (byte) bits;
>>> + bits = 0;
>>> + wpos = 24;
>>> + if (pos == limit) {
>>> + return len;
>>> + }
>>> + continue;
>>> + }
>>> +
>>> + /* b has either 1 or 2 free places */
>>> + b[pos++] = (byte) (bits >> 16);
>>> + if (pos == limit) {
>>> + rpos = 16;
>>> + return len;
>>> + }
>>> + b[pos++] = (byte) (bits >> 8);
>>> + /* pos == limit, no need for an if */
>>> + rpos = 8;
>>> + return len;
>>> + }
>>> }
>>>
>>> @Override
>>> diff --git a/test/jdk/java/util/Base64/TestBase64.java b/test/jdk/java/util/Base64/TestBase64.java
>>> --- a/test/jdk/java/util/Base64/TestBase64.java
>>> +++ b/test/jdk/java/util/Base64/TestBase64.java
>>> @@ -1,5 +1,5 @@
>>> /*
>>> - * Copyright (c) 2012, 2017, Oracle and/or its affiliates. All rights reserved.
>>> + * Copyright (c) 2012, 2020, Oracle and/or its affiliates. All rights reserved.
>>> * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
>>> *
>>> * This code is free software; you can redistribute it and/or modify it
>>> @@ -24,7 +24,7 @@
>>> /**
>>> * @test
>>> * @bug 4235519 8004212 8005394 8007298 8006295 8006315 8006530 8007379 8008925
>>> - * 8014217 8025003 8026330 8028397 8129544 8165243 8176379
>>> + * 8014217 8025003 8026330 8028397 8129544 8165243 8176379 8222187
>>> * @summary tests java.util.Base64
>>> * @library /test/lib
>>> * @build jdk.test.lib.RandomFactory
>>> @@ -144,6 +144,10 @@
>>> testDecoderKeepsAbstinence(Base64.getDecoder());
>>> testDecoderKeepsAbstinence(Base64.getUrlDecoder());
>>> testDecoderKeepsAbstinence(Base64.getMimeDecoder());
>>> +
>>> + // tests patch addressing JDK-8222187
>>> + // https://bugs.openjdk.java.net/browse/JDK-8222187 <https://bugs.openjdk.java.net/browse/JDK-8222187>
>>> + testJDK_8222187();
>>> }
>>>
>>> private static void test(Base64.Encoder enc, Base64.Decoder dec,
>>> @@ -607,4 +611,26 @@
>>> }
>>> }
>>> }
>>> +
>>> + private static void testJDK_8222187() throws Throwable {
>>> + byte[] orig = "12345678".getBytes("US-ASCII");
>>> + byte[] encoded = Base64.getEncoder().encode(orig);
>>> + // decode using different buffer sizes, up to a longer one than needed
>>> + for (int bufferSize = 1; bufferSize <= encoded.length + 1; bufferSize++) {
>>> + try (
>>> + InputStream in = Base64.getDecoder().wrap(
>>> + new ByteArrayInputStream(encoded));
>>> + ByteArrayOutputStream baos = new ByteArrayOutputStream();
>>> + ) {
>>> + byte[] buffer = new byte[bufferSize];
>>> + int read;
>>> + while ((read = in.read(buffer, 0, bufferSize)) >= 0) {
>>> + baos.write(buffer, 0, read);
>>> + }
>>> + // compare result, output info if lengths do not match
>>> + byte[] decoded = baos.toByteArray();
>>> + checkEqual(decoded, orig, "Base64 stream decoding failed!");
>>> + }
>>> + }
>>> + }
>>> }
>>> <JDK-8222187.patch>
>>
>> <oracle_sig_logo.gif> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif> <http://oracle.com/us/design/oracle-email-sig-198324.gif>
>> <http://oracle.com/us/design/oracle-email-sig-198324.gif>Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
>> Oracle Java Engineering
>> 1 Network Drive
>> Burlington, MA 01803
>> Lance.Andersen at oracle.com <mailto:Lance.Andersen at oracle.com>
>>
>>
>>
>
Best
Lance
------------------
Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
Lance.Andersen at oracle.com
More information about the core-libs-dev
mailing list