RFR 8025003: Base64 should be less strict with padding

Wed Nov 13 04:21:28 UTC 2013

Xueming Shen wrote on 11/12/2013 04:25 PM:
> On 11/12/2013 03:32 PM, Bill Shannon wrote:
>> This still seems like an inconsistent, and inconvenient, approach to me.
>>
>> You've decided that some encoding errors (i.e., missing pad characters)
>> can be ignored.  You're willing to assume that the missing characters aren't
>> missing data but just missing padding.  But if you find a padding character
>> where you don't expect it you won't assume that the missing data is zero.
> 
> "Missing pad characters" are, in theory, not an encoding error. As the RFC
> notes, in some circumstances the use of padding in base64 data is not
> required or used. Padding mainly serves to indicate the end of the data.
> This is why our decoder does not require the padding character(s) in the
> first place. However, if padding character(s) are present, they must be
> correctly encoded; otherwise, it's a malformed base64 stream.
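
For illustration, a minimal sketch of the rule described above, assuming the basic decoder behavior documented for java.util.Base64 as released in JDK 8 (the patch under review in this thread may have differed). The class name PaddingDemo is hypothetical. Omitted padding is accepted, but padding that is present must be complete.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; assumes the released JDK 8+ basic (RFC 4648) decoder.
final class PaddingDemo {

    // Decodes with the basic decoder and returns the decoded text,
    // or "error" if the decoder rejects the input as malformed.
    static String tryDecode(String encoded) {
        try {
            byte[] bytes = Base64.getDecoder().decode(encoded);
            return new String(bytes, StandardCharsets.US_ASCII);
        } catch (IllegalArgumentException e) {
            return "error";
        }
    }
}
```

Here `tryDecode("YQ==")` and `tryDecode("YQ")` both return "a" (padding omitted is fine), while `tryDecode("YQ=")` returns "error" (padding present but incomplete).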

I think we're interpreting the spec differently.

If the padding characters are not needed, why define them at all?
What advantage would there be in defining characters that convey no
information?  Why not let the data just end wherever it ends, throwing
away unused bits?

The padding characters are required.  If they're missing, you have no
idea if the encoder just left them out, or if the data was truncated
or corrupted.
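
A minimal sketch of the truncation problem, again assuming the released JDK 8 basic decoder; the class name TruncationDemo is hypothetical. "YWJj" encodes "abc". Dropping its last character yields input the decoder cannot distinguish from a final unit whose padding was simply omitted, so it decodes without any error to a shorter result.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class; assumes the released JDK 8+ basic decoder,
// which accepts a final unit of 2 or 3 characters without padding.
final class TruncationDemo {

    static String decode(String encoded) {
        byte[] bytes = Base64.getDecoder().decode(encoded);
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}
```

`decode("YWJj")` returns "abc"; the truncated `decode("YWJ")` returns "ab" and throws nothing, so the caller never learns that a byte was lost.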

I understand the desire to check that the data is encoded exactly the
way the spec says it should be encoded, and to consider it an error
otherwise.  This is the "strict" approach.  But that's not what you're
doing.  You're deciding that you care about some kinds of errors but
not all kinds of errors.  That's a judgment call that, as far as I can
tell, is not based on real experience with encoded data.

> To address your strong request for a more "lenient" MIME decoder, we have
> updated the spec and implementation to be reasonably liberal about incorrect
> padding at the end of MIME base64 data, as shown below:
>
>      xxxx =       unnecessary padding character at the end of the encoded stream
>      xxxx xx=     missing the last padding character
>      xxxx xx=y    missing the last padding character; a non-padding character follows instead
> 
> On the assumption that this still follows the "spirit" of the padding
> character's purpose (as suggested by the RFC), which is to indicate the end
> of the data stream, no further decoding is done beyond the padding character.
> Yes, it makes the MIME decoder somewhat "inconsistent" with our original
> design and with the other types of decoders, but we thought it might provide
> the "convenience" requested.
> 
> But a single dangling byte at the end of the encoded data stream is obviously
> an encoding error or a transport error. As I said, I don't think the decoder
> should try to rescue it with a guess. The proposed change tries to provide a
> simple mechanism with which the application can do some lifecycle/error
> management to recover from a malformed data stream, if desired. This is
> actually NOT what j.u.Base64 is intended for. Its primary goal is to provide a
> set of easy/simple utility methods for base64 encoding/decoding, not the kind
> of complicated error recovery management that java.nio.charset.De/Encoder
> provides.
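
The "stop at the first padding character" rule proposed above can be approximated in application code. This is a sketch under stated assumptions: StopAtPadding and its methods are hypothetical names, not part of java.util.Base64. It relies only on the released JDK 8 MIME decoder skipping non-alphabet characters (such as CRLF) and accepting a final unit without padding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: treats the first '=' as the end of the data and
// ignores everything after it, then decodes the prefix with the MIME decoder.
final class StopAtPadding {

    static byte[] decodeUpToFirstPad(String encoded) {
        int pad = encoded.indexOf('=');
        String data = (pad < 0) ? encoded : encoded.substring(0, pad);
        // The MIME decoder skips line separators and other non-alphabet
        // characters and accepts an unpadded final unit. A dangling single
        // character still raises IllegalArgumentException.
        return Base64.getMimeDecoder().decode(data);
    }

    static String decodeToString(String encoded) {
        return new String(decodeUpToFirstPad(encoded), StandardCharsets.US_ASCII);
    }
}
```

With this helper, the three table rows "YWJj=", "YWJjYQ=" and "YWJjYQ=x" decode to "abc", "abca" and "abca". A dangling single character before the '=' ("YWJjY=") still throws IllegalArgumentException, in line with treating such input as malformed.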

There's really no error recovery possible, and certainly no program is
going to attempt error recovery.  As I said, there are only two reasonable
things to do:  1) throw up your hands, claim the data is corrupt, and tell
the user there's nothing you can do, or 2) do your best job to give the user
as much data as possible, and let the user decide if the data is in fact corrupt.
I'd be happy for you to provide options to do both.  Doing something that's
half way between the two just isn't useful.
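
The two options above could look roughly like this. Both methods are hypothetical (DecodePolicies, decodeStrict and decodeBestEffort are not java.util.Base64 API), handle only the standard alphabet, and assume the released JDK 8 basic decoder underneath. The best-effort variant keeps only base64 alphabet characters up to the first '=', drops a trailing single character whose 6 bits cannot form a byte, and decodes the rest, leaving it to the caller to judge whether the result is usable.

```java
import java.util.Base64;

// Hypothetical illustration of the two policies; names are not real API.
final class DecodePolicies {

    // Option 1: strict. Malformed input (e.g. incomplete padding or a
    // dangling single character) raises IllegalArgumentException.
    static byte[] decodeStrict(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    // Option 2: best effort. Never throws on malformed input; returns as
    // many bytes as can be recovered.
    static byte[] decodeBestEffort(String encoded) {
        StringBuilder data = new StringBuilder(encoded.length());
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '=') {
                break;                          // treat padding as end of data
            }
            boolean inAlphabet = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || c == '+' || c == '/';
            if (inAlphabet) {
                data.append(c);                 // skip CR, LF and other junk
            }
        }
        if (data.length() % 4 == 1) {
            data.setLength(data.length() - 1);  // 6 bits cannot make a byte
        }
        // Remaining length is 0, 2 or 3 mod 4, which the basic decoder
        // accepts without padding.
        return Base64.getDecoder().decode(data.toString());
    }
}
```

For example, `decodeStrict("YWJjY")` throws IllegalArgumentException, while `decodeBestEffort("YWJjY")` returns the bytes of "abc".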

> The JavaDoc can definitely be improved to provide a detailed use case or
> sample, if it helps. But if this is definitely a no-go, maybe we can leave
> this for jdk9 and bigger surgery.

Without support for error-free decoding, there's little motivation for me
to ever convert JavaMail to use this new capability.