RFR: JDK-8321599 Data loss in AVX3 Base64 decoding
Scott Gibbons
sgibbons at openjdk.org
Fri Dec 8 23:14:11 UTC 2023
On Fri, 8 Dec 2023 22:51:29 GMT, Evgeny Astigeevich <eastigeevich at openjdk.org> wrote:
> @asgibbons, am I correct that the problem is that the padding '=' characters were not found and not processed? This happens because the source offset is not taken into account. A test would be:
>
> ```
> A, B: String
> Buf: ByteBuffer
> C := base64_encode(A) + base64_encode(B) # encode(B) should have '=' or '=='
> put C in Buf
> A' := base64_decode(Buf)
> B' := base64_decode(Buf)
> assert(A.equals(A'))
> assert(B.equals(B'))
> ```
No. The padding '=' character was found and terminated the decoding, which is expected. The issue is that the encoded input string is quite long in this case, and the test decodes a substring of the full string. The parameters passed to Decode are a pointer to the start of the (long) string and a (large) offset. I was looking for padding characters relative to the start of the long string instead of the start of the substring (start plus the starting offset). Example:
Encoded string:
. . . = = . . . a a a a a a a ... a a a a
^               ^
|               |
start           start + offset
I was asked to decode the bytes at ```(start + offset)```. When the algorithm gets to the last 31 bytes of ```a a a a ... a a a a```, it looks for padding at ```(start + remaining_length - 1)``` instead of ```(start + offset + remaining_length - 1)```. In this case it actually found a padding byte at ```(start + remaining_length - 1)``` and decided that the output length should be reduced by one character (or by two if two padding bytes were found). This is a very specific edge case, so good catch by the testers.
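
To make the index arithmetic concrete, here is a minimal Java sketch. It is a model of the check described above, not the intrinsic itself: the class and the helpers `paddingAt`, `paddingBuggy`, `paddingFixed`, `concat` and `decodeSlice` are hypothetical names, and the sizes (a 19-byte prefix, a 21-byte payload) are chosen only so that the prefix's `==` lands exactly at `start + remaining_length - 1`. The inputs are far shorter than the long strings involved in the real failure, so this should not be read as a reproducer for the AVX3 path.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Base64;

class PaddingOffsetSketch {
    // Hypothetical model of the tail check: count up to two '=' bytes ending at index `last`.
    static int paddingAt(byte[] src, int last) {
        int n = 0;
        if (last >= 0 && src[last] == '=') {
            n++;
            if (last >= 1 && src[last - 1] == '=') {
                n++;
            }
        }
        return n;
    }

    // Models the bug: the index is computed from the start of the array; `offset` is ignored.
    static int paddingBuggy(byte[] src, int offset, int remainingLength) {
        return paddingAt(src, remainingLength - 1);
    }

    // Models the fix: the index is computed from start + offset.
    static int paddingFixed(byte[] src, int offset, int remainingLength) {
        return paddingAt(src, offset + remainingLength - 1);
    }

    static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Decodes a slice of a larger array through the public API. A buffer created with
    // ByteBuffer.wrap(array, offset, length) starts reading at a non-zero array index.
    static byte[] decodeSlice(byte[] all, int offset, int length) {
        ByteBuffer decoded = Base64.getDecoder().decode(ByteBuffer.wrap(all, offset, length));
        byte[] out = new byte[decoded.remaining()];
        decoded.get(out);
        return out;
    }

    public static void main(String[] args) {
        // 19 bytes encode to 28 characters ending in "==".
        byte[] prefix = Base64.getEncoder().encode(new byte[19]);
        // 21 bytes encode to 28 characters with no padding.
        byte[] payload = new byte[21];
        Arrays.fill(payload, (byte) 'a');
        byte[] tail = Base64.getEncoder().encode(payload);
        byte[] all = concat(prefix, tail);

        System.out.println(paddingBuggy(all, prefix.length, tail.length)); // 2
        System.out.println(paddingFixed(all, prefix.length, tail.length)); // 0
        System.out.println(Arrays.equals(decodeSlice(all, prefix.length, tail.length), payload));
    }
}
```

In the model, the buggy index lands on the `==` that belongs to the prefix, so the output length would be shortened by two bytes even though the substring being decoded has no padding at all.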
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17039#issuecomment-1847958191