RFR 8025003: Base64 should be less strict with padding

Wed Nov 13 05:24:08 UTC 2013

On 11/12/13 8:21 PM, Bill Shannon wrote:
> Xueming Shen wrote on 11/12/2013 04:25 PM:
>> On 11/12/2013 03:32 PM, Bill Shannon wrote:
>>> This still seems like an inconsistent, and inconvenient, approach to me.
>>>
>>> You've decided that some encoding errors (i.e., missing pad characters)
>>> can be ignored.  You're willing to assume that the missing characters aren't
>>> missing data but just missing padding.  But if you find a padding character
>>> where you don't expect it you won't assume that the missing data is zero.
>> "Missing pad characters" are, in theory, not an encoding error. As the RFC
>> suggested, the use of padding in base64 data is not required or used. The
>> padding characters mainly serve to indicate the "end of the data". This is
>> why the padding character(s) are not required (optional) by our decoder in
>> the first place. However, if the padding character(s) are present, they need
>> to be correctly encoded; otherwise, it's a malformed base64 stream.
> I think we're interpreting the spec differently.
I meant to say "The RFC says the use of padding in base64 data is not
required nor used, in some circumstances".
I interpret that to mean the padding is optional in some circumstances.
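
To make the cases concrete, here is a minimal sketch. It assumes the basic
decoder returned by java.util.Base64.getDecoder() accepts input with the
padding left off but rejects padding that is present and malformed. The class
and method names (PaddingDemo, accepts) are made up for illustration only.

```java
import java.util.Base64;

class PaddingDemo {
    // Returns true if the basic decoder accepts the input, false if it
    // rejects it with an IllegalArgumentException.
    static boolean accepts(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "QQ==",  // correctly padded final unit
            "QQ",    // padding omitted entirely
            "QQ=",   // padding present but one '=' short
            "Q"      // single dangling character
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (accepts(s) ? "accepted" : "rejected"));
        }
    }
}
```

Under that assumption the first two samples are accepted and the last two
are rejected.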

-Sherman
>
> If the padding characters are not needed, why define them at all?
> What advantage would there be in defining characters that convey no
> information?  Why not let the data just end wherever it ends, throwing
> away unused bits?
>
> The padding characters are required.  If they're missing, you have no
> idea if the encoder just left them out, or if the data was truncated
> or corrupted.
>
> I understand the desire to check that the data is encoded exactly the
> way the spec says it should be encoded, and to consider it an error
> otherwise.  This is the "strict" approach.  But that's not what you're
> doing.  You're deciding that you care about some kinds of errors but
> not all kinds of errors.  That's a judgment call that, as far as I can
> tell, is not based on real experience with encoded data.
>
>> To address your strong request for a more "lenient" MIME decoder, we have
>> updated the spec and implementation to be reasonably liberal about incorrect
>> padding at the end of the MIME base64 data, as shown below:
>>
>>       xxxx =       unnecessary padding character at the end of encoded stream
>>       xxxx xx=     missing the last padding character
>>       xxxx xx=y    missing the last padding character; a non-padding char instead
>>
>> The assumption is that this still follows the "spirit" of the padding
>> character (as suggested by the RFC), which is to indicate the end of the data
>> stream, so no more decoding is needed beyond the padding character. Yes, it
>> makes the MIME decoder somewhat "inconsistent" with our original design and
>> with the other types of decoders, but we thought it might provide the
>> "convenience" requested.
>>
>> But a single dangling byte at the end of the encoded data stream is obviously
>> an encoding error or a transport error. As I said, I don't think the decoder
>> should try to rescue it by guessing. The proposed change tries to provide a
>> simple mechanism with which the application can do some lifecycle/error
>> management to recover from the malformed data stream, if desired. This is
>> actually NOT what j.u.Base64 is designed for. The primary goal is to provide
>> a set of easy/simple utility methods for base64 encoding/decoding, not the
>> kind of complicated error recovery management that the
>> java.nio.charset.De/Encoder provides.
> There's really no error recovery possible, and certainly no program is
> going to attempt error recovery.  As I said, there are only two reasonable
> things to do:  1) throw up your hands, claim the data is corrupt, and tell
> the user there's nothing you can do, or 2) do your best job to give the user
> as much data as possible, and let the user decide if the data is in fact corrupt.
> I'd be happy for you to provide options to do both.  Doing something that's
> half way between the two just isn't useful.
>
>> The JavaDoc can definitely be improved to provide a detailed use case and
>> sample, if it helps. But if it's definitely a no-go, maybe we can leave this
>> for jdk9 for bigger surgery.
> Without support for error-free decoding, there's little motivation for me
> to ever convert JavaMail to use this new capability.
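
For reference, the two options described above (reject the data outright, or
return as much as can be decoded) could be sketched roughly as follows. This
is only an illustration built on the basic decoder from
java.util.Base64.getDecoder(). The class LenientDecodeSketch and the helper
decodeBestEffort are hypothetical and not part of j.u.Base64, and the policy
(stop at the first '=', skip characters outside the alphabet, drop a lone
trailing character) is an assumption about what "best effort" would mean.

```java
import java.util.Base64;

class LenientDecodeSketch {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Option 1: strict. Malformed input surfaces as IllegalArgumentException.
    static byte[] decodeStrict(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    // Option 2 (hypothetical): best effort. Never complains about padding.
    static byte[] decodeBestEffort(String encoded) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '=') {
                break;                 // treat the first pad as end of data
            }
            if (ALPHABET.indexOf(c) >= 0) {
                sb.append(c);          // skip line breaks and other noise
            }
        }
        if (sb.length() % 4 == 1) {
            // A lone trailing character carries only 6 bits, not a full
            // byte, so nothing can be recovered from it; drop it.
            sb.setLength(sb.length() - 1);
        }
        // The basic decoder does not require the trailing padding.
        return Base64.getDecoder().decode(sb.toString());
    }
}
```

An application that wants the strict behavior calls decodeStrict and reports
the exception; one that prefers to show the user whatever survived calls
decodeBestEffort and lets the user judge the result.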