RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store [v2]

Wed May 15 14:12:17 UTC 2024

On Wed, 15 May 2024 07:53:33 GMT, Richard Reingruber <rrich at openjdk.org> wrote:

>> This pr adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store.
>> 
>> By example (from `TestMergeStores.java`):
>> 
>> 
>>     static Object[] test2a(byte[] a, int offset, long v) {
>>         if (IS_BIG_ENDIAN) {
>>             a[offset + 0] = (byte)(v >> 56);
>>             a[offset + 1] = (byte)(v >> 48);
>>             a[offset + 2] = (byte)(v >> 40);
>>             a[offset + 3] = (byte)(v >> 32);
>>             a[offset + 4] = (byte)(v >> 24);
>>             a[offset + 5] = (byte)(v >> 16);
>>             a[offset + 6] = (byte)(v >> 8);
>>             a[offset + 7] = (byte)(v >> 0);
>>         } else {
>>             a[offset + 0] = (byte)(v >> 0);
>>             a[offset + 1] = (byte)(v >> 8);
>>             a[offset + 2] = (byte)(v >> 16);
>>             a[offset + 3] = (byte)(v >> 24);
>>             a[offset + 4] = (byte)(v >> 32);
>>             a[offset + 5] = (byte)(v >> 40);
>>             a[offset + 6] = (byte)(v >> 48);
>>             a[offset + 7] = (byte)(v >> 56);
>>         }
>>         return new Object[]{ a };
>>     }
>> 
>> 
>> Depending on the endianess 8 bytes are stored into an array. The order of the stores is the same as the order of an 8-byte-store therefore 8 1-byte-stores can be replaced with just one 8-byte-store (if there aren't too many range checks).
>> 
>> Additionally I've fixed a few comments and a test bug.
>> 
>> The optimization seems to be a little bit more effective on big endian platforms.
>> 
>> Again by example:
>> 
>> 
>>     static Object[] test800a(byte[] a, int offset, long v) {
>>         if (IS_BIG_ENDIAN) {
>>             a[offset + 0] = (byte)(v >> 40); // Removed from candidate list
>>             a[offset + 1] = (byte)(v >> 32); // Removed from candidate list
>>             a[offset + 2] = (byte)(v >> 24); // Merged
>>             a[offset + 3] = (byte)(v >> 16); // Merged
>>             a[offset + 4] = (byte)(v >> 8);  // Merged
>>             a[offset + 5] = (byte)(v >> 0);  // Merged
>>         } else {
>>             a[offset + 0] = (byte)(v >> 0);  // Removed from candidate list
>>             a[offset + 1] = (byte)(v >> 8);  // Removed from candidate list
>>             a[offset + 2] = (byte)(v >> 16); // Not merged
>>             a[offset + 3] = (byte)(v >> 24); // Not merged
>>             a[offset + 4] = (byte)(v >> 32); // Not merge...
>
> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Improve comment

Thanks for looking at the pr.

> Just did a quick scan of the tests. I think it could be good to have both big/small endian tests run on both big/small endian machines, but only expect IR rules to pass if the test and platform are expected to optimize. This just makes sure that the logic is correct, and does not optimize the wrong cases, producing wrong results.

I've done that for `test2` and introduced `test2BE`. Is that want you mean?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2112655406