RFR 8025003: Base64 should be less strict with padding

Wed Nov 13 20:28:30 UTC 2013

On 11/13/2013 11:37 AM, Bill Shannon wrote:
> Xueming Shen wrote on 11/13/13 11:11:
>> On 11/13/2013 10:41 AM, Bill Shannon wrote:
>>>
>>>> The other thought is the charset API where a charset decoder can be configured
>>>> to ignore, replace or report the malformed or unmappable input. Having support
>>>> for all these actions is important for charset encoding/decoding but seems way
>>>> too much for Base64 where I think the API should be simple for the majority of
>>>> usages.
>>> We started this with a request for a strict/lenient option.  That may still be
>>> simpler than figuring out how to do strict decoding and report the error in a
>>> way that users of the API can ignore the error and provide as much data as
>>> possible.
>>>
>>>> In any case, it's not clear what we can do this late in the schedule. It might
>>>> be prudent to just fix the MIME decoder to throw IAE consistently and re-visit
>>>> the API support for a lenient decoder in JDK 9.
>>> When we started this conversation there was plenty of time to fix this.  :-(
>> The issue here is that we disagree on the specification of what lenient
>> should be and what the API should look like.
>>
>> Here is the proposed change to undo the "lenient padding handling for MIME"
>> change we did earlier, to leave the option open for a complete "lenient
>> base64" in a future release, when we have a consensus
> What other implementors of base64 MIME decoding software have you consulted,
> or do you intend to consult in the future?

Yes, the plan is to see what other implementations do.

So far:

(1) Google's Guava [1] just throws an exception:

     com.google.common.io.BaseEncoding.base64().decode("QUJDA");

==> java.lang.IllegalArgumentException: com.google.common.io.BaseEncoding$DecodingException: Invalid input length 5

I don't think any of the configuration options provided can make this exception go away.

(2) Apache's commons-codec [2] silently drops the dangling 6 bits:
     new String(org.apache.commons.codec.binary.Base64.decodeBase64("QUJDA"))

==> ABC

Its source code [3] at line 465 marks this case as a "TODO":
     ...
     case 1: // 6 bits - ignore entirely
         // TODO not currently tested; perhaps it is impossible?
         break;
     ...
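
To make the two behaviors concrete, here is a minimal sketch of both policies
for the same input. It is not code from either library: the class and method
names are hypothetical, and it assumes unpadded input with no whitespace. It
treats a single leftover character (6 bits, less than one byte) either as an
error or as something to drop, and delegates the well-formed part to
java.util.Base64.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration of "strict" vs. "drop the dangling 6 bits".
public class DanglingBitsDemo {

    /** Bits left over after the last complete 4-character group. */
    static int danglingBits(String s) {
        return (s.length() % 4) * 6;
    }

    /** Strict policy: a lone trailing character cannot encode a byte, so reject it. */
    static byte[] decodeStrict(String s) {
        if (s.length() % 4 == 1) {
            throw new IllegalArgumentException("Invalid input length " + s.length());
        }
        return Base64.getDecoder().decode(s);
    }

    /** Lenient policy: silently drop a lone trailing character, decode the rest. */
    static byte[] decodeDroppingDanglingChar(String s) {
        String usable = (s.length() % 4 == 1) ? s.substring(0, s.length() - 1) : s;
        return Base64.getDecoder().decode(usable);
    }

    public static void main(String[] args) {
        String input = "QUJDA";
        System.out.println(danglingBits(input)); // 6
        System.out.println(new String(decodeDroppingDanglingChar(input),
                                      StandardCharsets.US_ASCII)); // ABC
        try {
            decodeStrict(input);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

The open question in this thread is which of the two the MIME decoder should
do by default, and whether the caller should be able to choose.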

-Sherman

[1] http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/io/BaseEncoding.html
[2] http://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/binary/Base64.html
[3] http://svn.apache.org/viewvc/commons/proper/codec/trunk/src/main/java/org/apache/commons/codec/binary/Base64.java?revision=1447577&view=markup

> What experiments have you done with other base64 MIME decoding software or
> applications to determine how they handle these cases?
>
> I'm trying to determine how we're going to reach consensus in the future.
>
> My base64 MIME decoding software has evolved over time based on customer
> requirements.  I'm trying to give you the benefit of that experience so
> that you don't need to waste years getting to the same point I got to.
> I started in a similar place as you, believing that applications would
> want to know about improperly encoded data.  I learned that many do not,
> and that most end-user applications simply want to be as lenient as possible
> to provide the best data possible to the user.