RFR 8025003: Base64 should be less strict with padding

Wed Nov 13 16:28:20 UTC 2013

On 11/12/13 11:44 PM, Bill Shannon wrote:
> Xueming Shen wrote on 11/12/2013 09:24 PM:
>> On 11/12/13 8:21 PM, Bill Shannon wrote:
>>> Xueming Shen wrote on 11/12/2013 04:25 PM:
>>>> On 11/12/2013 03:32 PM, Bill Shannon wrote:
>>>>> This still seems like an inconsistent, and inconvenient, approach to me.
>>>>>
>>>>> You've decided that some encoding errors (i.e., missing pad characters)
>>>>> can be ignored.  You're willing to assume that the missing characters aren't
>>>>> missing data but just missing padding.  But if you find a padding character
>>>>> where you don't expect it you won't assume that the missing data is zero.
>>>> "missing pad characters" in theory is not an encoding errors. As the RFC
>>>> suggested, the
>>>> use of padding in base64 data is not required or used. They mainly serve the
>>>> purpose of
>>>> providing the indication of "end of the data". This is why the padding
>>>> character(s) is not
>>>> required (optional) by our decoder at first place. However, if the padding
>>>> character(s) is
>>>> present, they need to be correctly encoded, otherwise, it's a malformed base64
>>>> stream.
>>> I think we're interpreting the spec differently.
>> I meant to say "The RFC says the use of padding in base64 data is not required
>> nor used, in some circumstances".
>> I interpret it as the padding is optional in some circumstances.
> It's never optional.  There's two specific cases in which it's required
> and one specific case in which it is not present.

My apologies, it appears we are not talking about the same thing. What I'm
trying to say is that whether or not to USE the padding character "=" is
optional for base encoding "IN SOME CIRCUMSTANCES". Maybe it's clearer to
just cite the original wording here:

    In some circumstances, the use of padding ("=") in base encoded data
    is not required nor used.  In the general case, when assumptions on
    size of transported data cannot be made, padding is required to yield
    correct decoded data.

    Implementations MUST include appropriate pad characters at the end of
    encoded data unless the specification referring to this document
    explicitly states otherwise.

My interpretation is that some types/styles of Base64 implementation may
choose not to generate the padding characters at the end of the encoding
operation, though the RFC requires that an implementation which omits the
padding characters explicitly state this in its spec.
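
To make the padded/unpadded distinction concrete, here is a minimal sketch
against the java.util.Base64 API as released in JDK 8. The class and method
names (EncoderPaddingSketch, padded, unpadded) are made up for illustration;
only Base64.getEncoder(), withoutPadding() and encodeToString() are real API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncoderPaddingSketch {
    // Default encoder: appends "=" so the output length is a multiple of 4.
    static String padded(String text) {
        return Base64.getEncoder()
                     .encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    // Encoder explicitly configured to omit the trailing padding.
    static String unpadded(String text) {
        return Base64.getEncoder().withoutPadding()
                     .encodeToString(text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        // 1, 2 and 3 input bytes give the xx==, xxx= and xxxx final units.
        for (String s : new String[] {"a", "ab", "abc"}) {
            System.out.println(s + " -> " + padded(s) + " / " + unpadded(s));
        }
    }
}
```

Omitting the padding is an explicit opt-in on the encoder here, which is
roughly what the RFC wording asks for.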

When encoding, the existing implementation by default always adds the
padding characters at the end of the encoded stream, if needed (for xx==,
xxx=). The decoder tries to be "liberal"/lenient in what it accepts (on the
assumption that the encoded data may come from some encoder that does not
generate the padding characters), so it accepts data with padding and data
without padding. However, it requires that if padding characters are used,
they be CORRECTLY encoded. That was the original specification and
implementation. Upon your original request, I made the compromise of giving
the MIME type a more liberal spec/implementation for the "incorrect" padding
character combinations shown below.

Possible patterns of an incorrectly padded final base64 unit are:

     xxxx =       unnecessary padding character at the end of encoded stream
     xxxx xx=     missing the last padding character
     xxxx xx=y    missing the last padding character, instead having a non-padding char
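
For reference, the following sketch feeds the three patterns above, plus a
correctly padded and an unpadded input, to the basic (non-MIME) decoder of
java.util.Base64 as released in JDK 8. It is only an illustration of the
strict rule "padding is optional, but if present it must be correct". It does
not exercise the MIME decoder, whose handling of these patterns is exactly
what is under discussion here. PaddingPatternSketch and accepted() are
hypothetical names.

```java
import java.util.Base64;

public class PaddingPatternSketch {
    // True if the basic decoder accepts the input, false if it rejects it
    // with IllegalArgumentException.
    static boolean accepted(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "YWJjZA==",  // "abcd", correctly padded final unit
            "YWJjZA",    // same data, padding omitted entirely
            "YWJj=",     // xxxx =     unnecessary padding character
            "YWJjZA=",   // xxxx xx=   missing the last padding character
            "YWJjZA=x"   // xxxx xx=y  non-padding char after the pad
        };
        for (String in : inputs) {
            System.out.println(in + " accepted: " + accepted(in));
        }
    }
}
```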

Now it appears this compromise has become part of your complaint.

Our difference is that I believe the "padding character" is not part of the
original data, so we can be "liberal"/lenient about it. But "x===" (or
simply a dangling "x") is missing part of the original data for decoding,
and I'm concerned about being liberal in guessing what is missing.
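
The arithmetic behind that concern, as a small sketch: each base64 character
carries 6 bits, so a final unit of 2 or 3 characters still holds 1 or 2 whole
bytes, while a single character holds only 6 bits and no complete byte.
wholeBytes() and accepted() are illustrative helpers, not part of any API,
and the decoder behaviour shown is that of the basic decoder in
java.util.Base64 as released in JDK 8.

```java
import java.util.Base64;

public class DanglingCharSketch {
    // Whole bytes recoverable from a final unit of n base64 characters
    // (padding excluded). Each character contributes 6 bits.
    static int wholeBytes(int n) {
        return (n * 6) / 8;
    }

    // True if the basic decoder accepts the input, false if it rejects it
    // with IllegalArgumentException.
    static boolean accepted(String encoded) {
        try {
            Base64.getDecoder().decode(encoded);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " char(s) -> " + wholeBytes(n) + " byte(s)");
        }
        // A dangling single character, with and without padding.
        System.out.println("YWJjZ    accepted: " + accepted("YWJjZ"));
        System.out.println("YWJjZ=== accepted: " + accepted("YWJjZ==="));
    }
}
```

With only 6 of the 8 bits present, any decoded value for the dangling
character would be a guess about the other 2 bits, which is why it is treated
as missing data rather than missing padding.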

-Sherman