RFR: 8315970: Big-endian issues after JDK-8310929

Sun Sep 10 23:54:39 UTC 2023

On Sun, 10 Sep 2023 22:27:48 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> https://bugs.openjdk.org/browse/JDK-8310929
>> 
>> @TheRealMDoerr Feedback:
>> 
>> 
>> We're getting test failures on AIX:
>> compiler/intrinsics/Test8215792.java
>> compiler/intrinsics/string/TestStringIntrinsics.java
>> runtime/CompactStrings/TestMethodNames.java
>> runtime/StringIntrinsic/StringIndexOfChar.java
>> Is there a problem with Big Endian?
>
> src/java.base/share/classes/java/lang/StringUTF16.java line 1632:
> 
>> 1630:     private static int inflatePacked(int v) {
>> 1631:         int packed = (int) StringLatin1.PACKED_DIGITS[v];
>> 1632:         return ((packed & 0xFF) << HI_BYTE_SHIFT)
> 
> I'm not sure this is correct.
> 
> Compare `StringUTF16::putChar` where these constants are used to shift _right_ to extract the equivalent byte from a value:
> 
>         val[index++] = (byte)(c >> HI_BYTE_SHIFT);
>         val[index]   = (byte)(c >> LO_BYTE_SHIFT);
> 
> I.e., when inflating a `byte` `0xaa` to a `char` encoded into a `byte[]`, we end up with `0xaa00` on big-endian. Inflating a `short` literal `0xaabb`, which logically encodes two chars, will I think need to consider each byte in isolation, ending up with `0xaa00bb00` (in little-endian notation). Or maybe it's `0xbb00aa00`. Ugh...
> 
> Since `HI_BYTE_SHIFT` is 8 on big-endian and 0 on little-endian I guess this might just work:
> 
> ```return ((packed & 0xFF) << 16 + HI_BYTE_SHIFT) | ((packed & 0xFF00) << HI_BYTE_SHIFT)```
> 
> .. but we really need to re-examine, prototype and test this out thoroughly on a big-endian system. I second @RogerRiggs's notion that the best course of action right now is to back out #14699 and redo it with the big-endian issues resolved.

I'm also not sure if this PR is correct.
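
To make the byte layout easier to reason about without a big-endian machine, here is a minimal sketch that rebuilds the expected bytes from the two `putChar` stores quoted above and reads them back as a single `int`. The class `InflatePackedCheck`, the helper `referenceInt`, and passing the shift as a parameter are all hypothetical and not part of the JDK. It also assumes that the low byte of the packed value (`0xbb` here) is the first char, and it does not settle which byte order the real code uses when it stores the `int`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class InflatePackedCheck {
    // Mirrors the two stores quoted from StringUTF16::putChar. The shift is a
    // parameter (hypothetical) so both endiannesses can be simulated anywhere:
    // 8 models big-endian, 0 models little-endian.
    static void putChar(byte[] val, int charIndex, int c, int hiByteShift) {
        int loByteShift = 8 - hiByteShift;
        int index = charIndex << 1;
        val[index++] = (byte) (c >> hiByteShift);
        val[index]   = (byte) (c >> loByteShift);
    }

    // Writes two chars with putChar, then reads the four bytes back as one int
    // in the given byte order. This is the value a packed store would need.
    static int referenceInt(int first, int second, int hiByteShift, ByteOrder order) {
        byte[] val = new byte[4];
        putChar(val, 0, first, hiByteShift);
        putChar(val, 1, second, hiByteShift);
        return ByteBuffer.wrap(val).order(order).getInt();
    }

    public static void main(String[] args) {
        // Assumption: 0xbb is the first char and 0xaa the second.
        ByteOrder[] orders = {ByteOrder.BIG_ENDIAN, ByteOrder.LITTLE_ENDIAN};
        for (int hiByteShift : new int[] {8, 0}) {
            for (ByteOrder order : orders) {
                System.out.printf("HI_BYTE_SHIFT=%d, int read as %s: 0x%08x%n",
                        hiByteShift, order, referenceInt(0xbb, 0xaa, hiByteShift, order));
            }
        }
        // Prints:
        // HI_BYTE_SHIFT=8, int read as BIG_ENDIAN: 0x00bb00aa
        // HI_BYTE_SHIFT=8, int read as LITTLE_ENDIAN: 0xaa00bb00
        // HI_BYTE_SHIFT=0, int read as BIG_ENDIAN: 0xbb00aa00
        // HI_BYTE_SHIFT=0, int read as LITTLE_ENDIAN: 0x00aa00bb
    }
}
```

Under these assumptions, both `0xaa00bb00` and `0xbb00aa00` show up, depending on the shift and on the order in which the `int` is read, so any candidate formula for `inflatePacked` could be compared against `referenceInt` for all 100 digit pairs before trying it on real big-endian hardware.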

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15652#discussion_r1320889161