java.util.Base64 accepts non-canonical encodings

Tue Jun 23 15:41:01 UTC 2020

OK, I'll leave the checks out of the patch intended for general publication
but will add them separately just to see which of the current tests would fail.

I'll then later open a separate issue for discussing a possible 
extension of the API and a possible CSR.

Greetings
Raffaello

On 2020-06-23 17:14, Roger Riggs wrote:
> Hi,
> 
> This is a case where having some more interoperability testing could be
> informative, though there are likely many ad hoc Base64 encoders and it's
> not practical to test against them.
> 
> Introducing a new mode or option creates an undesirable fuzziness in the
> API.
> 
> It won't help existing uses without some deliberate attempt to move
> clients to use the new option (deprecation). It's not likely to be picked
> up by new clients because the difference in behavior is slight and may not
> be seen as important. Options also tend to live forever, increasing
> maintenance on both the library and the callers.
> 
> It's worth creating a separate issue and looking at it separately.
> 
> Thanks, Roger
> 
> 
> On 6/23/20 10:50 AM, Raffaello Giulietti wrote:
>> Hi Roger,
>>
>> I haven't implemented the strict check yet since, as you point out, it
>> could harm existing code in the wild, even if the OpenJDK tests would
>> all pass.
>>
>> That's why I'm wondering if it would make sense to extend the existing 
>> API to have the check as an additional option.
>>
>>
>> Greetings
>> Raffaello
>>
>>
>>> Hi Raffaello,
>>>
>>> I think the concern over accepting non-canonical encodings would be 
>>> compatibility.
>>> It would be rude to implement the strictness and have applications start
>>> failing.
>>> But it is likely an oversight since existing code checks for other 
>>> invalid encodings.
>>>
>>> Do any of the existing tests fail if the non-canonical encoding throws?
>>>
>>> Thanks, Roger
>>>
>>>
>>> On 6/23/20 9:00 AM, Raffaello Giulietti wrote:
>>>> Hi,
>>>>
>>>> RFC 4648, in section "3.5. Canonical Encoding", prescribes that pad 
>>>> bits must be set to zero.
>>>>
>>>> However, the current decoder implementation in java.util.Base64 
>>>> accepts non-canonical encodings as well. For example, all of the 
>>>> following four encodings
>>>> KCk=
>>>> KCl=
>>>> KCm=
>>>> KCn=
>>>> where only the first is canonical, decode to the sequence of two 
>>>> bytes 0x28 0x29. Padding positions could act as a (very low 
>>>> bandwidth) covert channel.
>>>>
>>>> Since I'm preparing a patch for [1] (see [2]), I'm asking if this is 
>>>> intentional behavior or if it is an oversight. Of course, checking 
>>>> for strictness would slightly impact performance.
>>>>
>>>> If checking for non-zero padding bits is desired, should the API be 
>>>> extended to allow for both the strict and the (current) lenient 
>>>> behaviors? Would the current API suffice?
>>>>
>>>>
>>>> Greetings
>>>> Raffaello
>>>>
>>>> ----
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8222187
>>>> [2] https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-June/067066.html
>>>>
>
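
The lenient behavior described in the original message can be reproduced with
a few lines of Java. The sketch below decodes the four encodings from the
thread and prints the resulting bytes. It also includes isCanonical, a
hypothetical helper that is not part of java.util.Base64 and is not the check
discussed for the patch: it simply re-encodes the decoded bytes and compares
the result with the input. It assumes the basic alphabet with padding, so
unpadded or MIME-wrapped input would be reported as non-canonical too.

```java
import java.util.Arrays;
import java.util.Base64;

public class NonCanonicalBase64Demo {

    // Hypothetical helper (an assumption for illustration, not a JDK API):
    // treats an encoding as canonical if re-encoding the decoded bytes
    // reproduces the input exactly. Input the decoder rejects is reported
    // as non-canonical as well.
    static boolean isCanonical(String encoded) {
        try {
            byte[] decoded = Base64.getDecoder().decode(encoded);
            return Base64.getEncoder().encodeToString(decoded).equals(encoded);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The four encodings from the thread; they differ only in the two
        // pad bits carried by the last alphabet character.
        String[] samples = {"KCk=", "KCl=", "KCm=", "KCn="};
        for (String s : samples) {
            byte[] bytes = Base64.getDecoder().decode(s);
            System.out.println(s + " -> " + Arrays.toString(bytes)
                    + " canonical=" + isCanonical(s));
        }
    }
}
```

With a lenient decoder, each line should show the same two bytes (0x28 0x29,
printed in decimal) while only KCk= is reported as canonical. A round-trip
comparison like this costs an extra encode per call, so it is only meant to
illustrate the issue, not to suggest how a strict mode should be implemented.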