RFR 8025003: Base64 should be less strict with padding

Fri Oct 25 22:19:33 UTC 2013

On 10/25/13 2:19 PM, Bill Shannon wrote:
> If I understand this correctly, this proposes to remove the "lenient"
> option we've been discussing and just make it always lenient.  Is that
> correct?

Yes. Only for the mime type though.
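
To make the "mime type" distinction concrete, here is a small sketch using the
java.util.Base64 factories. It only shows the leniency that is already
uncontroversial (the MIME decoder skips characters outside the base64
alphabet, the basic decoder rejects them); it does not exercise the padding
change under review, whose behavior depends on the build. The class and
method names (MimeVersusBasic, decodeMime, basicRejects) are made up for
illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class MimeVersusBasic {
    // Decode with the MIME decoder, which ignores non-alphabet characters
    // such as the CRLF line separators found in mail bodies.
    static String decodeMime(String encoded) {
        byte[] raw = Base64.getMimeDecoder().decode(encoded);
        return new String(raw, StandardCharsets.US_ASCII);
    }

    // True if the basic (strict) decoder refuses the input.
    static boolean basicRejects(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String wrapped = "QUJD\r\nREVG";   // "ABCDEF" split across two lines
        System.out.println(decodeMime(wrapped));
        System.out.println(basicRejects(wrapped));
    }
}
```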

>
> Unfortunately, from what you say below, it's still not lenient enough.
> I'd really like a version that never, ever, for any reason, throws an
> exception.  Yes, that means when you only get a final 6 bits of data
> you have to make an assumption about what was intended, probably padding
> it with zeros to 8 bits.

This is something I'm hesitant to do. I can be lenient about the padding at
the end because the padding character itself is not real "data": whether it
is present, missing, or incorrect/incomplete, it does not affect the
integrity of the data. But filling out the last 6 bits with zeros is really
a kind of guessing, NOT decoding.
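
A rough sketch of why this is a guess: one base64 character carries 6 bits,
so it pins down only the high 6 bits of a byte and leaves four candidates
for the low 2 bits. The names below (DanglingSextet, zeroFilled,
compatibleBytes) are hypothetical helpers, not part of java.util.Base64.

```java
import java.util.ArrayList;
import java.util.List;

final class DanglingSextet {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // The "pad with zeros" guess: use the 6 bits as the high bits of a byte
    // and assume the two missing low bits are 0.
    static int zeroFilled(char c) {
        int sextet = ALPHABET.indexOf(c);
        if (sextet < 0) {
            throw new IllegalArgumentException("not a base64 character: " + c);
        }
        return sextet << 2;
    }

    // Every byte value whose high 6 bits match the single character.
    static List<Integer> compatibleBytes(char c) {
        int high = zeroFilled(c);
        List<Integer> candidates = new ArrayList<>();
        for (int low = 0; low < 4; low++) {
            candidates.add(high | low);
        }
        return candidates;
    }

    public static void main(String[] args) {
        // 'R' is index 17 in the alphabet, i.e. bits 010001
        System.out.println(compatibleBytes('R'));   // prints [68, 69, 70, 71]
    }
}
```

For a trailing 'R' the candidates are 'D', 'E', 'F' and 'G'; zero filling
would silently pick 'D'.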

-Sherman

> Xueming Shen wrote on 10/23/13 15:42:
>> Hi,
>>
>> The current spec and implementation of the Base64 decoder [1] are standard/RFC
>> based: the decoder interprets the ending padding character(s) correctly
>> if present. The ending padding character(s) are not required (the decoder is
>> liberal), but if present, the spec and implementation require that they MUST
>> be encoded correctly; any incorrect padding combination in the final unit (as
>> listed below) is treated as incorrectly encoded base64 data and results in an
>> exception.
>>
>> Patterns of possible incorrectly encoded padding final base64 unit are:
>>
>>      xxxx =       unnecessary padding character at the end of encoded stream
>>      xxxx xx=     missing the last padding character
>>      xxxx xx=y    missing the last padding character; a non-padding char
>>                   appears instead
>>
>> The feedback we have received so far suggests that an "incorrectly encoded
>> padding unit" might be frequently observed in real-world usage, especially in
>> the MIME/email world, and that it might be desirable to just accept such an
>> incorrectly encoded ending unit and decode the rest successfully without
>> throwing an exception.
>>
>> It is also suggested it might be more appropriate to rename
>>      Base64.getEncoder(int lineLength, byte[] sept)
>> to be
>>      Base64.getMimeEncoder(int, byte[]).
>>
>> The proposed changes here are to
>>
>> (1) rename the factory method for the customizable "mime" encoder to
>>      Base64.getMimeEncoder(int, byte[]);
>>
>> (2) change the spec/implementation for the "mime" decoder to be lenient when
>>      handling the padding character in the final unit (the mime decoder itself
>>      is "lenient" already: its spec requires that any non-base64 character be
>>      ignored during decoding, and our existing decoder is liberal when there is
>>      no padding present at all)
>>
>> Here is the webrev
>>
>> http://cr.openjdk.java.net/~sherman/8025003/webrev/
>>
>> thanks!
>> -Sherman
>>
>> Btw, the updated mime decoder still throws an exception for patterns like
>> "xxxx x=..." or "xxxx x", in which the last unit has only one valid character,
>> i.e. 6 bits of data. That is not sufficient to decode into a valid 8-bit byte.
>>
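
For reference, the final-unit patterns listed in the quoted proposal can be
told apart with a small classifier. This is only a sketch under stated
assumptions: FinalUnitCheck, Kind and classify are hypothetical names, not
code from the webrev or from java.util.Base64, and the sketch does not
inspect anything that follows a complete final unit. Like the mime decoder,
it skips characters outside the base64 alphabet when counting data
characters.

```java
final class FinalUnitCheck {
    enum Kind {
        WELL_FORMED,                // complete unit, or unpadded xx / xxx
        UNNECESSARY_PADDING,        // "xxxx ="
        MISSING_LAST_PADDING,       // "xxxx xx="
        NON_PADDING_AFTER_PADDING,  // "xxxx xx=y"
        DANGLING_SEXTET             // "xxxx x" or "xxxx x=..."
    }

    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static Kind classify(String encoded) {
        int firstPad = encoded.indexOf('=');
        int end = firstPad < 0 ? encoded.length() : firstPad;

        // Count alphabet characters before the first '='.
        int data = 0;
        for (int i = 0; i < end; i++) {
            if (ALPHABET.indexOf(encoded.charAt(i)) >= 0) {
                data++;
            }
        }

        int rem = data % 4;                 // characters in the final partial unit
        if (rem == 1) return Kind.DANGLING_SEXTET;
        if (firstPad < 0) return Kind.WELL_FORMED;
        if (rem == 0) return Kind.UNNECESSARY_PADDING;
        if (rem == 3) return Kind.WELL_FORMED;   // "xxx=" needs a single '='

        // rem == 2: a second '=' must follow the first one directly.
        int next = firstPad + 1;
        if (next == encoded.length()) return Kind.MISSING_LAST_PADDING;
        return encoded.charAt(next) == '='
            ? Kind.WELL_FORMED
            : Kind.NON_PADDING_AFTER_PADDING;
    }

    public static void main(String[] args) {
        System.out.println(classify("QUJD ="));
        System.out.println(classify("QUJD RQ="));
        System.out.println(classify("QUJD RQ=x"));
        System.out.println(classify("QUJD R"));
    }
}
```

A lenient decoder in the sense proposed would accept the three padding
kinds and keep rejecting only DANGLING_SEXTET.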