RFR: JDK-8321599 Data loss in AVX3 Base64 decoding

Sat Dec 9 00:50:16 UTC 2023

On Sat, 9 Dec 2023 00:30:38 GMT, James Petty <duke at openjdk.org> wrote:

>> Fix for looking for padding characters within the encoded string. The code was not adding the start offset to the length, so it was looking at potentially freed or uninitialized memory.
>> 
>> Tested tier1 and with the test case supplied with the JBS issue.
>> 
>> The problem will only occur when all of the following are true:
>> 1. The source offset of the string to be decoded is != 0.
>> 2. The characters at the beginning of the string (minus the offset) plus the string length mod 64 are either "=" or "==".
>> 3. The string is >= 32 characters.
>> 4. The string is not MIME encoded.
>> 
>> If any of these conditions are not met, the decode works as expected. This was due to omitting the source offset of the string when checking for padding characters.
>
> I was one of the engineers investigating the issue and wrote the original form of the reproducer submitted on the ticket. The use case that was failing, and that I tried to mimic in the reproducer, was decoding base64 data in a column-oriented analytics engine. In that use case it isn’t exceptional for the backing buffer to be (much) larger than the subset being decoded on any invocation, to be read at nonzero starting offsets, and to contain padded base64 strings earlier in the source buffer. Thanks again for the quick fix!

> @pettyjamesm did you verify this fix with your case?

Unfortunately not. We don’t currently have any workflow that builds a test artifact with a JDK/JVM built from source, so it would be a big lift for us to get to that point.
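
For reference, a minimal sketch of the access pattern described above. It is not the reproducer attached to the JBS issue: the class name `OffsetDecodeCheck`, the helper names, and the payload sizes are hypothetical, and decoding through a `ByteBuffer` slice is only one assumed way to present a nonzero source offset to the decoder. It places a padded Base64 string ahead of a second string of at least 32 characters in one backing array, decodes the second string in place, and compares the result with the original bytes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Base64;

public class OffsetDecodeCheck {

    // Decode the Base64 text stored in buf[offset, offset + length) without
    // copying it out first, so the decoder sees a nonzero source offset.
    static byte[] decodeAt(byte[] buf, int offset, int length) {
        ByteBuffer src = ByteBuffer.wrap(buf, offset, length);
        ByteBuffer out = Base64.getDecoder().decode(src);
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    // Build a backing buffer holding a padded Base64 string followed by the
    // target string, then check that the target decodes to its original bytes.
    static boolean roundTrips(int prefixLen, int payloadLen) {
        byte[] prefixRaw = new byte[prefixLen];
        Arrays.fill(prefixRaw, (byte) 'p');
        byte[] payload = new byte[payloadLen];
        for (int i = 0; i < payloadLen; i++) {
            payload[i] = (byte) i;
        }

        byte[] prefix = Base64.getEncoder().encode(prefixRaw);
        byte[] target = Base64.getEncoder().encode(payload);

        byte[] backing = new byte[prefix.length + target.length];
        System.arraycopy(prefix, 0, backing, 0, prefix.length);
        System.arraycopy(target, 0, backing, prefix.length, target.length);

        byte[] decoded = decodeAt(backing, prefix.length, target.length);
        return Arrays.equals(payload, decoded);
    }

    public static void main(String[] args) {
        // Repeat the calls so the JIT has a chance to compile the decode path;
        // the iteration count is an arbitrary assumption, not a tuned value.
        for (int iter = 0; iter < 2_000; iter++) {
            // 24 raw bytes encode to 32 Base64 characters, the minimum length
            // named in the conditions above.
            for (int payloadLen = 24; payloadLen <= 200; payloadLen++) {
                // Prefixes of 1 and 2 raw bytes encode with "==" and "=" padding.
                for (int prefixLen = 1; prefixLen <= 2; prefixLen++) {
                    if (!roundTrips(prefixLen, payloadLen)) {
                        throw new AssertionError("mismatch: prefixLen=" + prefixLen
                                + " payloadLen=" + payloadLen);
                    }
                }
            }
        }
        System.out.println("all decodes matched");
    }
}
```

A check like this runs against any installed JDK, so it doesn’t need a from-source build, but whether it actually trips the bug on an affected build depends on the hardware supporting the AVX3 path and on the exact buffer contents, so a clean run is not proof that the fix works.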

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17039#issuecomment-1848016975