RFR 8025003: Base64 should be less strict with padding
Xueming Shen
xueming.shen at oracle.com
Fri Oct 25 22:19:33 UTC 2013
On 10/25/13 2:19 PM, Bill Shannon wrote:
> If I understand this correctly, this proposes to remove the "lenient"
> option we've been discussing and just make it always lenient. Is that
> correct?
Yes, but only for the mime decoder.
>
> Unfortunately, from what you say below, it's still not lenient enough.
> I'd really like a version that never, ever, for any reason, throws an
> exception. Yes, that means when you only get a final 6 bits of data
> you have to make an assumption about what was intended, probably padding
> it with zeros to 8 bits.
This is something I'm hesitant to do. I can be lenient about the padding
at the end because the padding character itself is not the real "data":
whether it is present, missing, or incorrect/incomplete, it does not affect
the integrity of the data. But filling out the last 6 bits with zeros is
really guessing, NOT decoding.
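To make the "guessing" concrete, here is a small illustrative sketch in Java. The class and method names are hypothetical, not from the webrev. A lone final base64 character carries only 6 bits, so every byte whose top 6 bits match it is an equally valid reading. Zero-filling simply picks the smallest of those candidates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not JDK code: what a lone final base64
// character could stand for if a decoder were forced to emit a byte.
final class LoneCharGuess {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // All byte values whose top 6 bits equal the 6-bit value of c.
    static List<Integer> candidates(char c) {
        int sixBits = ALPHABET.indexOf(c);
        if (sixBits < 0) {
            throw new IllegalArgumentException("not a base64 character: " + c);
        }
        List<Integer> result = new ArrayList<>();
        for (int b = 0; b < 256; b++) {
            if ((b >>> 2) == sixBits) {
                result.add(b);
            }
        }
        return result;
    }

    // The zero-fill choice: keep the 6 real bits, invent two low zero bits.
    static int zeroFill(char c) {
        return candidates(c).get(0);
    }

    public static void main(String[] args) {
        for (int b : candidates('Q')) {
            System.out.printf("candidate 0x%02X%n", b);
        }
        System.out.printf("zero-fill picks 0x%02X%n", zeroFill('Q'));
    }
}
```

For 'Q' (value 16) the candidates are 0x40 through 0x43, i.e. '@', 'A', 'B' and 'C' in ASCII. Zero-fill returns 0x40, even though the sender may well have encoded 'A'.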
-Sherman
> Xueming Shen wrote on 10/23/13 15:42:
>> Hi,
>>
>> The current spec and implementation of the Base64 decoder [1] are based on
>> the standard/RFC: the decoder interprets/decodes the ending padding
>> character(s) correctly if present. The ending padding character(s) are not
>> required (liberal), but if present, the spec and implementation require that
>> they be encoded correctly; any incorrect padding combination in the final
>> unit (as listed below) is treated as incorrectly encoded base64 data and
>> results in an exception.
>>
>> Possible patterns of an incorrectly padded final base64 unit are:
>>
>> xxxx =       unnecessary padding character at the end of the encoded stream
>> xxxx xx=     missing the last padding character
>> xxxx xx=y    missing the last padding character, with a non-padding
>>              character in its place
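The three patterns above can be told apart by where the first '=' falls within the final 4-character unit. The Java sketch below is a hypothetical helper: FinalUnitCheck is not part of java.util.Base64 or the webrev. It assumes line separators and other non-alphabet characters have already been stripped from the input.

```java
// Hypothetical sketch: classify the padding of the final base64 unit
// against the patterns listed above. Input is assumed to contain only
// base64 alphabet characters and '='.
final class FinalUnitCheck {

    enum Padding {
        NO_PADDING,      // no '=' present; padding simply omitted
        CORRECT,         // "xx==" or "xxx="
        UNNECESSARY,     // "xxxx="  : '=' after a complete unit
        MISSING_LAST,    // "xx="    : second '=' missing
        DATA_AFTER_PAD,  // "xx=y"   : non-padding char where '=' belongs
        OTHER            // anything else, e.g. "x=" (only 6 bits of data)
    }

    static Padding classify(String s) {
        int pad = s.indexOf('=');
        if (pad < 0) {
            return Padding.NO_PADDING;
        }
        String tail = s.substring(pad);   // from the first '=' to the end
        switch (pad % 4) {                // data chars in the final unit
            case 0:
                return tail.equals("=") ? Padding.UNNECESSARY : Padding.OTHER;
            case 2:
                if (tail.equals("==")) return Padding.CORRECT;
                if (tail.equals("=")) return Padding.MISSING_LAST;
                if (tail.length() == 2) return Padding.DATA_AFTER_PAD;
                return Padding.OTHER;
            case 3:
                return tail.equals("=") ? Padding.CORRECT : Padding.OTHER;
            default:                      // pad % 4 == 1: a lone 6-bit char
                return Padding.OTHER;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"QUI=", "QUJD=", "QUJDQQ=", "QUJDQQ=A", "QUJDQ="};
        for (String in : inputs) {
            System.out.println(in + " -> " + classify(in));
        }
    }
}
```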
>>
>> The feedback we have received so far suggests that an "incorrectly encoded
>> padding unit" might be frequently observed in real-world use, especially in
>> the MIME/email world, so it may be desirable to simply accept such an
>> incorrectly encoded ending unit and decode the rest successfully without
>> throwing an exception.
>>
>> It has also been suggested that it might be more appropriate to rename
>> Base64.getEncoder(int lineLength, byte[] sept)
>> to be
>> Base64.getMimeEncoder(int, byte[]).
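For reference, the customizable MIME encoder exists in JDK 8 under this name, as java.util.Base64.getMimeEncoder(int lineLength, byte[] lineSeparator). The minimal usage sketch below uses a made-up demo class name. The line length is rounded down to a multiple of 4, and the separator goes between lines, not after the last one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class showing the JDK's customizable MIME encoder.
final class MimeEncoderDemo {
    static String encode(byte[] data) {
        // Output lines of at most 8 chars, separated by "\n".
        Base64.Encoder enc = Base64.getMimeEncoder(8, new byte[] {'\n'});
        return enc.encodeToString(data);
    }

    public static void main(String[] args) {
        byte[] data = "ABCABCABC".getBytes(StandardCharsets.US_ASCII);
        String encoded = encode(data);
        System.out.println(encoded);
        // The MIME decoder skips the "\n" separators when decoding.
        byte[] back = Base64.getMimeDecoder().decode(encoded);
        System.out.println(new String(back, StandardCharsets.US_ASCII));
    }
}
```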
>>
>> The proposed changes here are to
>>
>> (1) rename the factory method for the customizable "mime" encoder to
>> Base64.getMimeEncoder(int, byte[]);
>>
>> (2) change the spec/implementation of the "mime" decoder to be lenient when
>> handling the padding character(s) in the final unit (the mime decoder itself
>> is already "lenient": its spec requires that any non-base64 character be
>> ignored during decoding, and our existing decoder is already liberal when no
>> padding is present at all).
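One way to read proposal (2) is sketched below in Java. This is an illustration under stated assumptions, not the webrev's implementation. LenientMimeSketch is hypothetical. It treats the first '=' as the end of the data and ignores anything after it. It skips all other non-alphabet characters. It then hands the remaining tail to java.util.Base64.getDecoder(), which accepts an unpadded 2- or 3-character final unit. A final unit of a single character is still rejected.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of lenient padding handling (not the JDK code):
// non-alphabet chars are skipped, the first '=' ends the data, and
// anything after it is ignored.
final class LenientMimeSketch {

    static boolean isAlphabet(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9') || c == '+' || c == '/';
    }

    static byte[] decode(String encoded) {
        StringBuilder data = new StringBuilder();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '=') {
                break;              // padding marks the end; "xx=y" drops 'y'
            }
            if (isAlphabet(c)) {
                data.append(c);     // CR/LF and other chars are skipped
            }
        }
        if (data.length() % 4 == 1) {
            // A single leftover char is 6 bits: not enough for a whole byte.
            throw new IllegalArgumentException("final unit has only 6 bits");
        }
        // The basic decoder accepts a final 2- or 3-char unit without '='.
        return Base64.getDecoder().decode(data.toString());
    }

    public static void main(String[] args) {
        String[] inputs = {"QUJD=", "QUJDQQ=", "QUJDQQ=A", "QUJD\r\nRUY="};
        for (String in : inputs) {
            System.out.println(new String(decode(in), StandardCharsets.US_ASCII));
        }
    }
}
```

Stopping at the first '=' is the simplest lenient rule, but it silently discards any data that follows a stray '=' in the middle of a stream. A real decoder would have to decide whether that is acceptable.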
>>
>> Here is the webrev
>>
>> http://cr.openjdk.java.net/~sherman/8025003/webrev/
>>
>> thanks!
>> -Sherman
>>
>> Btw, the updated mime decoder still throws an exception for patterns like
>> "xxxx x=..." or "xxxx x", in which the last unit has only one valid
>> "byte"/6 bits of data, which is not sufficient to be decoded into a valid
>> 8-bit byte.
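Here is the arithmetic behind this limit. Each base64 character carries 6 bits, so a final unit of n characters (padding excluded) holds 6n bits, and only the whole bytes can be recovered. A short Java illustration follows; the class name is hypothetical.

```java
// Hypothetical illustration: how much data a final base64 unit of n
// characters (padding excluded) can carry.
final class FinalUnitBits {
    static int bits(int chars)         { return 6 * chars; }
    static int wholeBytes(int chars)   { return bits(chars) / 8; }
    static int leftoverBits(int chars) { return bits(chars) % 8; }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.printf("%d char(s): %2d bits -> %d byte(s), %d bit(s) left over%n",
                    n, bits(n), wholeBytes(n), leftoverBits(n));
        }
    }
}
```

Only the 1-character case yields no whole byte. In the 2- and 3-character cases, the leftover 4 or 2 bits are the encoder's zero fill and can be dropped without guessing.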
>>