RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store
Richard Reingruber
rrich at openjdk.org
Wed May 15 07:48:12 UTC 2024
On Mon, 13 May 2024 15:53:52 GMT, Richard Reingruber <rrich at openjdk.org> wrote:
> This pr adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) which allows enabling it also on big endian platforms (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization to replace consecutive stores to a primitive array with just one store.
>
> By example (from `TestMergeStores.java`):
>
>
> static Object[] test2a(byte[] a, int offset, long v) {
> if (IS_BIG_ENDIAN) {
> a[offset + 0] = (byte)(v >> 56);
> a[offset + 1] = (byte)(v >> 48);
> a[offset + 2] = (byte)(v >> 40);
> a[offset + 3] = (byte)(v >> 32);
> a[offset + 4] = (byte)(v >> 24);
> a[offset + 5] = (byte)(v >> 16);
> a[offset + 6] = (byte)(v >> 8);
> a[offset + 7] = (byte)(v >> 0);
> } else {
> a[offset + 0] = (byte)(v >> 0);
> a[offset + 1] = (byte)(v >> 8);
> a[offset + 2] = (byte)(v >> 16);
> a[offset + 3] = (byte)(v >> 24);
> a[offset + 4] = (byte)(v >> 32);
> a[offset + 5] = (byte)(v >> 40);
> a[offset + 6] = (byte)(v >> 48);
> a[offset + 7] = (byte)(v >> 56);
> }
> return new Object[]{ a };
> }
>
>
> Depending on the endianess 8 bytes are stored into an array. The order of the stores is the same as the order of an 8-byte-store therefore 8 1-byte-stores can be replaced with just one 8-byte-store (if there aren't too many range checks).
>
> Additionally I've fixed a few comments and a test bug.
>
> The optimization seems to be a little bit more effective on big endian platforms.
>
> Again by example:
>
>
> static Object[] test800a(byte[] a, int offset, long v) {
> if (IS_BIG_ENDIAN) {
> a[offset + 0] = (byte)(v >> 40); // Removed from candidate list
> a[offset + 1] = (byte)(v >> 32); // Removed from candidate list
> a[offset + 2] = (byte)(v >> 24); // Merged
> a[offset + 3] = (byte)(v >> 16); // Merged
> a[offset + 4] = (byte)(v >> 8); // Merged
> a[offset + 5] = (byte)(v >> 0); // Merged
> } else {
> a[offset + 0] = (byte)(v >> 0); // Removed from candidate list
> a[offset + 1] = (byte)(v >> 8); // Removed from candidate list
> a[offset + 2] = (byte)(v >> 16); // Not merged
> a[offset + 3] = (byte)(v >> 24); // Not merged
> a[offset + 4] = (byte)(v >> 32); // Not merged
> a[offset + 5] = (byte)(v >> 40); // Not merged
> }
> return new Object[]{ a };...
> It's not obvious to me why something like
>
> ```c++
> a[offset + 2] = (byte)(v >> 16); // Not merged
> a[offset + 3] = (byte)(v >> 24); // Not merged
> a[offset + 4] = (byte)(v >> 32); // Not merged
> a[offset + 5] = (byte)(v >> 40); // Not merged
> ```
>
> can't be merged.
The stores could be merged to the following pseudo code:
```c++
*(int*)&a[offset + 2] = (int)(v >> 16); // Merged
The current logic doesn't accept the right shift [here](https://github.com/openjdk/jdk/blob/c642f44bbe1e4cdbc23496a34ddaae30990ce7c0/src/hotspot/share/opto/memnode.cpp#L3302).
I think at that location we can always accept `merged_input_value` asserting that it is a right shift of `base_last` since the `is_adjacent_input_pair` checks succeeded before.
I haven't tried it though.
I'll clarify the synopsis of this pr and the comment in `TestMergeStores.java`.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/19218#issuecomment-2111808731
More information about the hotspot-compiler-dev
mailing list