RFR 8025003: Base64 should be less strict with padding

Wed Nov 13 18:41:25 UTC 2013

Alan Bateman wrote on 11/13/13 08:51:
> On 13/11/2013 04:21, Bill Shannon wrote:
>> :
>> There's really no error recovery possible, and certainly no program is
>> going to attempt error recovery.  As I said, there are only two reasonable
>> things to do:  1) throw up your hands, claim the data is corrupt, and tell
>> the user there's nothing you can do, or 2) do your best job to give the user
>> as much data as possible, and let the user decide if the data is in fact corrupt.
>> I'd be happy for you to provide options to do both.  Doing something that's
>> halfway between the two just isn't useful.
>>
> A variation of the second might be for the decoder to stop when it encounters an
> error (illegal character, missing pad characters, insufficient bits in the last
> unit). That is, give the user the bytes that have been successfully decoded plus
> an indication that the remaining bytes have not been processed due to an error.
> This wouldn't be too hard to add with changes or variations of the decode
> methods that decode buffers (as the source buffer can report if it has
> remaining/unprocessed bytes). It's just a bit more work for users of the API
> that want to be able to handle corrupt or truncated input. From a quick look
> then it actually isn't too far from where Sherman was going, at least it
> wouldn't be if missing pad characters were treated as an error (as they were
> previously).

I think always taking a strict approach is fine as long as it's straightforward
for a user of the API to turn that into a lenient decoding.  But if that's the
approach you prefer, you should detect more errors than currently proposed.
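
To make "straightforward to turn into a lenient decoding" concrete, here is a
rough sketch of what a caller might have to write on top of a strict decoder.
LenientBase64 and decodeLeniently are hypothetical names, not part of
java.util.Base64, and the fallback policy (keep the leading run of Base64
alphabet characters) is only one assumption about what "lenient" could mean.

```java
import java.util.Base64;

class LenientBase64 {
    // True for characters of the basic Base64 alphabet, excluding the '=' pad.
    static boolean isBase64Char(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9') || c == '+' || c == '/';
    }

    // Hypothetical helper: try the strict decoder first; on failure,
    // decode as much of the leading valid data as possible.
    static byte[] decodeLeniently(String input) {
        try {
            return Base64.getDecoder().decode(input);
        } catch (IllegalArgumentException e) {
            // Find the leading run of Base64 alphabet characters.
            int end = 0;
            while (end < input.length() && isBase64Char(input.charAt(end))) {
                end++;
            }
            // A single dangling character carries only 6 bits, which is
            // not enough for a byte, so drop it.
            if (end % 4 == 1) {
                end--;
            }
            return Base64.getDecoder().decode(input.substring(0, end));
        }
    }
}
```

With this, the caller still sees the strict failure internally but can hand
the user whatever was recoverable, e.g. decodeLeniently("SGVsbG8*garbage").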

> The other thought is the charset API where a charset decoder can be configured
> to ignore, replace or report malformed or unmappable input. Having support
> for all these actions is important for charset encoding/decoding but seems way
> too much for Base64 where I think the API should be simple for the majority of
> usages.
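
For reference, the charset configuration mentioned above looks roughly like
the following. The java.nio.charset calls are the existing API; the
CharsetActions class and decodeWith method are hypothetical wrappers used only
to show the three actions side by side.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CharsetActions {
    // Hypothetical wrapper: decode UTF-8 bytes with the given action applied
    // to both malformed input and unmappable characters.
    static String decodeWith(CodingErrorAction action, byte[] bytes)
            throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(action)
            .onUnmappableCharacter(action);
        // With REPORT this throws; with IGNORE or REPLACE it keeps going.
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }
}
```

Passing CodingErrorAction.IGNORE, REPLACE or REPORT selects the behavior; a
Base64 equivalent would need a similar three-way switch.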

We started this with a request for a strict/lenient option.  That may still be
simpler than figuring out how to do strict decoding and report the error in a
way that users of the API can ignore the error and provide as much data as
possible.
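
As a sketch of the "strict decoding plus an error indication" alternative, the
stop-at-error idea quoted earlier might look something like this.
StopAtErrorDecoder and decodeUntilError are hypothetical, not proposed API,
and the sketch assumes missing pad characters are treated as an error, so a
trailing partial unit is left unprocessed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.Base64;

class StopAtErrorDecoder {
    // Hypothetical helper: decodes complete 4-byte units from src and stops
    // at the first unit the strict decoder rejects. On return, src is
    // positioned at the first unprocessed byte, so src.hasRemaining()
    // tells the caller whether decoding stopped early.
    static byte[] decodeUntilError(ByteBuffer src) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] unit = new byte[4];
        while (src.remaining() >= 4) {
            src.mark();
            src.get(unit);
            try {
                byte[] decoded = Base64.getDecoder().decode(unit);
                out.write(decoded, 0, decoded.length);
            } catch (IllegalArgumentException e) {
                src.reset();   // rewind to the start of the bad unit
                break;
            }
            if (unit[3] == '=') {
                break;         // padding ends the data; anything after is left over
            }
        }
        return out.toByteArray();
    }
}
```

The caller gets the bytes decoded so far and checks src.hasRemaining() to
decide whether to treat the data as corrupt or to use it anyway. That is more
work for the caller than a single strict/lenient switch.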

> In any case, it's not clear what we can do this late in the schedule. It might
> be prudent to just fix the MIME decoder to throw IAE consistently and re-visit
> the API support for a lenient decoder in JDK 9.

When we started this conversation there was plenty of time to fix this.  :-(