RFR 8025003: Base64 should be less strict with padding

Wed Nov 13 22:19:20 UTC 2013

Xueming Shen wrote on 11/13/13 12:28:
> On 11/13/2013 11:37 AM, Bill Shannon wrote:
>> Xueming Shen wrote on 11/13/13 11:11:
>>> On 11/13/2013 10:41 AM, Bill Shannon wrote:
>>>>
>>>>> The other thought is the charset API, where a charset decoder can be configured
>>>>> to ignore, replace or report malformed or unmappable input. Having support
>>>>> for all these actions is important for charset encoding/decoding but seems way
>>>>> too much for Base64, where I think the API should be simple for the majority of
>>>>> usages.
>>>> We started this with a request for a strict/lenient option.  That may still be
>>>> simpler than figuring out how to do strict decoding and report the error in a
>>>> way that users of the API can ignore the error and provide as much data as
>>>> possible.
>>>>
>>>>> In any case, it's not clear what we can do this late in the schedule. It might
>>>>> be prudent to just fix the MIME decoder to throw IAE consistently and re-visit
>>>>> the API support for a lenient decoder in JDK 9.
>>>> When we started this conversation there was plenty of time to fix this.  :-(
>>> The issue here is that we disagree on the specification of what lenient should
>>> be and what the API should look like.
>>>
>>> Here is the proposed change to undo the "lenient padding handling for mime"
>>> change we did earlier, to leave the option open for a complete "lenient base64"
>>> in a future release, when we have a consensus.
>> What other implementors of base64 MIME decoding software have you consulted,
>> or do you intend to consult in the future?
> 
> Yes, the plan is to see what other implementations do.
> 
> So far
> 
> (1) Google's Guava [1] just throws the exception
> 
>     com.google.common.io.BaseEncoding.base64().decode("QUJDA");
> 
> ==> java.lang.IllegalArgumentException:
> com.google.common.io.BaseEncoding$DecodingException: Invalid input length 5
> 
> I don't think any of the configuration options provided can make this exception
> go away.
> 
> (2) Apache's commons-codec [2] silently drops the dangling 6 bits
> 
>     new String(org.apache.commons.codec.binary.Base64.decodeBase64("QUJDA"))
> 
> ==> ABC
> 
> Its source code [3] at line 465 marks this case as a "TODO":
>     ...
>     case 1: // 6 bits - ignore entirely
>         // TODO not currently tested; perhaps it is impossible?
>         break;
>     ...
> 
> -Sherman
> 
> [1]
> http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/BaseEncoding.html
> 
> [2]
> http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html
> 
> [3]
> http://svn.apache.org/viewvc/commons/proper/codec/trunk/src/main/java/org/apache/commons/codec/binary/Base64.java?revision=1447577&view=markup

Base64 decoders that deal with simple, short strings aren't really a challenge.
Such strings are almost always encoded correctly.
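
For reference, the "QUJDA" test input is five characters long, and a
length of 4n+1 is the one case that can never be valid: each character
carries 6 bits, so the fifth character leaves 6 bits that cannot fill a
byte.  A minimal sketch of that arithmetic (the class and method names
are made up for illustration; they are not part of any library):

```java
final class Base64TailSketch {

    /** Number of complete bytes that n base64 characters (6 bits each) can produce. */
    static int wholeBytes(int n) {
        return (n * 6) / 8;
    }

    /** Number of bits left over after packing n base64 characters into whole bytes. */
    static int leftoverBits(int n) {
        return (n * 6) % 8;
    }

    public static void main(String[] args) {
        // Show the byte count and leftover bits for a few unpadded input lengths.
        for (int n = 4; n <= 8; n++) {
            System.out.printf("%d chars -> %d bytes, %d leftover bits%n",
                    n, wholeBytes(n), leftoverBits(n));
        }
    }
}
```

Leftovers of 4 or 2 bits are the normal "==" and "=" padding cases; a
leftover of 6 bits is the dangling character that one library rejects
and the other silently drops.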

The challenge is decoding base64 encoded content in a MIME part, e.g., in
an email message.  There's lots of really bad email software out there,
often spamming software, that doesn't always follow the rules of the spec.
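
To make "provide as much data as possible" concrete, here is a rough
sketch of one possible lenient policy layered on top of a strict
decoder.  It is only an illustration, not a proposal for the API: the
class name, the method name, and the policy itself (skip anything
outside the alphabet, stop at the first '=', drop a dangling final
character) are all assumptions of mine.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class LenientMimeBase64Sketch {

    /** Hypothetical lenient decode: salvage what can be decoded instead of throwing. */
    static byte[] decodeLenient(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '=') {
                break; // assumed policy: treat the first padding character as end of data
            }
            boolean inAlphabet = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '+' || c == '/';
            if (inAlphabet) {
                sb.append(c); // line breaks and other junk are skipped
            }
        }
        // A final unit of a single character carries only 6 bits; drop it.
        if (sb.length() % 4 == 1) {
            sb.setLength(sb.length() - 1);
        }
        // Restore the padding so the strict decoder accepts the input.
        while (sb.length() % 4 != 0) {
            sb.append('=');
        }
        return Base64.getDecoder().decode(sb.toString());
    }

    public static void main(String[] args) {
        byte[] out = decodeLenient("QUJDA");
        System.out.println(new String(out, StandardCharsets.ISO_8859_1));
    }
}
```

Whether stopping at the first '=' is the right call for concatenated or
truncated parts is exactly the kind of question that needs real-world
data from mail software.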

Look at what other libraries that parse email messages do.  Look at what
email applications do.  If you send incorrectly encoded data to Gmail,
what does the Gmail web UI do?  What does Thunderbird or Outlook do?