RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store [v4]

Emanuel Peter epeter at openjdk.org
Fri May 24 08:29:04 UTC 2024


On Thu, 16 May 2024 01:39:26 GMT, Richard Reingruber <rrich at openjdk.org> wrote:

>> This pr adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store.
>> 
>> By example (from `TestMergeStores.java`):
>> 
>> 
>>     static Object[] test2a(byte[] a, int offset, long v) {
>>         if (IS_BIG_ENDIAN) {
>>             a[offset + 0] = (byte)(v >> 56);
>>             a[offset + 1] = (byte)(v >> 48);
>>             a[offset + 2] = (byte)(v >> 40);
>>             a[offset + 3] = (byte)(v >> 32);
>>             a[offset + 4] = (byte)(v >> 24);
>>             a[offset + 5] = (byte)(v >> 16);
>>             a[offset + 6] = (byte)(v >> 8);
>>             a[offset + 7] = (byte)(v >> 0);
>>         } else {
>>             a[offset + 0] = (byte)(v >> 0);
>>             a[offset + 1] = (byte)(v >> 8);
>>             a[offset + 2] = (byte)(v >> 16);
>>             a[offset + 3] = (byte)(v >> 24);
>>             a[offset + 4] = (byte)(v >> 32);
>>             a[offset + 5] = (byte)(v >> 40);
>>             a[offset + 6] = (byte)(v >> 48);
>>             a[offset + 7] = (byte)(v >> 56);
>>         }
>>         return new Object[]{ a };
>>     }
>> 
>> 
>> Depending on the endianess 8 bytes are stored into an array. The order of the stores is the same as the order of an 8-byte-store therefore 8 1-byte-stores can be replaced with just one 8-byte-store (if there aren't too many range checks).
>> 
>> Additionally I've fixed a few comments and a test bug.
>> 
>> The optimization seems to be a little bit more effective on big endian platforms.
>> 
>> Again by example:
>> 
>> 
>>     static Object[] test800a(byte[] a, int offset, long v) {
>>         if (IS_BIG_ENDIAN) {
>>             a[offset + 0] = (byte)(v >> 40); // Removed from candidate list
>>             a[offset + 1] = (byte)(v >> 32); // Removed from candidate list
>>             a[offset + 2] = (byte)(v >> 24); // Merged
>>             a[offset + 3] = (byte)(v >> 16); // Merged
>>             a[offset + 4] = (byte)(v >> 8);  // Merged
>>             a[offset + 5] = (byte)(v >> 0);  // Merged
>>         } else {
>>             a[offset + 0] = (byte)(v >> 0);  // Removed from candidate list
>>             a[offset + 1] = (byte)(v >> 8);  // Removed from candidate list
>>             a[offset + 2] = (byte)(v >> 16); // Not merged
>>             a[offset + 3] = (byte)(v >> 24); // Not merged
>>             a[offset + 4] = (byte)(v >> 32); // Not merge...
>
> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Eliminate IS_BIG_ENDIAN and always execute both variants

I'm running testing again, but the code looks good now!

I just had another idea:
Could we use some sort of "byte reverse / shuffle" operation to do these use cases for both big/little-endian?


        storeBytes(bytes, offset, (byte)(value >> 8),
                                  (byte)(value >> 0));

        storeBytes(bytes, offset, (byte)(value >> 0),
                                  (byte)(value >> 8));


Not sure if that would be profitable or even available on all platforms. Could be a future RFE someone can work on after this. What do you think? It might make performance more predictable across platforms.

src/hotspot/share/opto/memnode.cpp line 3310:

> 3308:     Node* hi = first->in(MemNode::ValueIn);
> 3309:     Node* lo = _store->in(MemNode::ValueIn);
> 3310: #endif // VM_LITTLE_ENDIAN

A `swap` could be more concise. But I leave that up to you ;)

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/19218#pullrequestreview-2076196539
PR Review Comment: https://git.openjdk.org/jdk/pull/19218#discussion_r1613067944


More information about the hotspot-compiler-dev mailing list