RFR: 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store

Tue May 14 13:02:11 UTC 2024

This PR adds a few tweaks to [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) that allow enabling it on big endian platforms as well (e.g. AIX, S390). JDK-8318446 introduced a C2 optimization that replaces consecutive stores to a primitive array with a single, wider store.

For example (from `TestMergeStores.java`):

    static Object[] test2a(byte[] a, int offset, long v) {
        if (IS_BIG_ENDIAN) {
            a[offset + 0] = (byte)(v >> 56);
            a[offset + 1] = (byte)(v >> 48);
            a[offset + 2] = (byte)(v >> 40);
            a[offset + 3] = (byte)(v >> 32);
            a[offset + 4] = (byte)(v >> 24);
            a[offset + 5] = (byte)(v >> 16);
            a[offset + 6] = (byte)(v >> 8);
            a[offset + 7] = (byte)(v >> 0);
        } else {
            a[offset + 0] = (byte)(v >> 0);
            a[offset + 1] = (byte)(v >> 8);
            a[offset + 2] = (byte)(v >> 16);
            a[offset + 3] = (byte)(v >> 24);
            a[offset + 4] = (byte)(v >> 32);
            a[offset + 5] = (byte)(v >> 40);
            a[offset + 6] = (byte)(v >> 48);
            a[offset + 7] = (byte)(v >> 56);
        }
        return new Object[]{ a };
    }

Depending on the endianness, the 8 bytes are stored into the array in a different order. In each branch the order of the byte stores matches the byte order of an 8-byte store on that platform, so the eight 1-byte stores can be replaced with a single 8-byte store (if there aren't too many range checks).
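The equivalence can be checked outside of C2 with a small standalone sketch. It is not part of the patch: the class `MergedStoreSketch` and its helper methods are hypothetical names, and `ByteBuffer` with an explicit byte order merely stands in for the platform's native 8-byte store.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class MergedStoreSketch {
    // Eight 1-byte stores, most significant byte first (the IS_BIG_ENDIAN branch).
    static void storeBytesBE(byte[] a, int offset, long v) {
        for (int i = 0; i < 8; i++) {
            a[offset + i] = (byte)(v >> (56 - 8 * i));
        }
    }

    // Eight 1-byte stores, least significant byte first (the else branch).
    static void storeBytesLE(byte[] a, int offset, long v) {
        for (int i = 0; i < 8; i++) {
            a[offset + i] = (byte)(v >> (8 * i));
        }
    }

    // One 8-byte store with an explicit byte order.
    static void storeLong(byte[] a, int offset, long v, ByteOrder order) {
        ByteBuffer.wrap(a).order(order).putLong(offset, v);
    }

    public static void main(String[] args) {
        long v = 0x0102030405060708L;

        byte[] bytewise = new byte[16];
        byte[] merged = new byte[16];
        storeBytesBE(bytewise, 3, v);
        storeLong(merged, 3, v, ByteOrder.BIG_ENDIAN);
        System.out.println("big endian match:    " + Arrays.equals(bytewise, merged));

        bytewise = new byte[16];
        merged = new byte[16];
        storeBytesLE(bytewise, 3, v);
        storeLong(merged, 3, v, ByteOrder.LITTLE_ENDIAN);
        System.out.println("little endian match: " + Arrays.equals(bytewise, merged));
    }
}
```

Both comparisons succeed, which is why each branch of `test2a` is only mergeable on the platform whose byte order it matches.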

Additionally, I've fixed a few comments and a test bug.

The optimization seems to be a little bit more effective on big endian platforms.

Again, an example:

    static Object[] test800a(byte[] a, int offset, long v) {
        if (IS_BIG_ENDIAN) {
            a[offset + 0] = (byte)(v >> 40); // Removed from candidate list
            a[offset + 1] = (byte)(v >> 32); // Removed from candidate list
            a[offset + 2] = (byte)(v >> 24); // Merged
            a[offset + 3] = (byte)(v >> 16); // Merged
            a[offset + 4] = (byte)(v >> 8);  // Merged
            a[offset + 5] = (byte)(v >> 0);  // Merged
        } else {
            a[offset + 0] = (byte)(v >> 0);  // Removed from candidate list
            a[offset + 1] = (byte)(v >> 8);  // Removed from candidate list
            a[offset + 2] = (byte)(v >> 16); // Not merged
            a[offset + 3] = (byte)(v >> 24); // Not merged
            a[offset + 4] = (byte)(v >> 32); // Not merged
            a[offset + 5] = (byte)(v >> 40); // Not merged
        }
        return new Object[]{ a };
    }

The sequence of candidate stores begins at the lowest store (in Memory def-use order) and is trimmed to a power of 2, removing higher stores if necessary. In the example, 6 candidate stores are trimmed to 4. On little endian platforms this removes the stores of the least significant bytes, so the remaining stores cannot be merged because that would require a right shift. On big endian platforms the stores of the more significant bytes are removed, and the remaining stores can be merged.
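The arithmetic behind this difference can be sketched in plain Java. This is only an illustration and does not model C2's implementation; `TrimmedMergeSketch` and its methods are hypothetical names. It shows the 4-byte value a single store would have to write to reproduce the four remaining byte stores of `test800a`: on little endian that value is `v >> 16`, on big endian it is simply the low 32 bits of `v`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class TrimmedMergeSketch {
    // The four byte stores left at offsets 2..5 in the little endian branch.
    static byte[] remainingStoresLE(long v) {
        return new byte[] { (byte)(v >> 16), (byte)(v >> 24), (byte)(v >> 32), (byte)(v >> 40) };
    }

    // The four byte stores left at offsets 2..5 in the big endian branch.
    static byte[] remainingStoresBE(long v) {
        return new byte[] { (byte)(v >> 24), (byte)(v >> 16), (byte)(v >> 8), (byte)(v >> 0) };
    }

    // Value an equivalent 4-byte little endian store would write: needs a right shift.
    static int mergedValueLE(long v) {
        return (int)(v >> 16);
    }

    // Value an equivalent 4-byte big endian store would write: plain truncation.
    static int mergedValueBE(long v) {
        return (int)v;
    }

    // Bytes produced by one 4-byte store of x in the given byte order.
    static byte[] intStore(int x, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(0, x).array();
    }

    public static void main(String[] args) {
        long v = 0x0102030405060708L;
        System.out.println("LE needs v >> 16: " + Arrays.equals(
            remainingStoresLE(v), intStore(mergedValueLE(v), ByteOrder.LITTLE_ENDIAN)));
        System.out.println("BE needs (int)v:  " + Arrays.equals(
            remainingStoresBE(v), intStore(mergedValueBE(v), ByteOrder.BIG_ENDIAN)));
    }
}
```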

I added two new platform attributes, `little-endian` and `big-endian`, to the IR testing framework so that IR matching rules can be adapted to this difference.

Testing:

`TestMergeStores.java` on AIX and S390.

JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. JCK, SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP-specific tests.
Testing was done with fastdebug builds on the main platforms and also on Linux/PPC64le and AIX.

-------------

Commit messages:
 - Improve comment
 - Add bug id
 - Typo
 - 8331311: C2: Big Endian Port of 8318446: optimize stores into primitive arrays by combining values into larger store

Changes: https://git.openjdk.org/jdk/pull/19218/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=19218&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8331311
  Stats: 572 lines in 3 files changed: 378 ins; 3 del; 191 mod
  Patch: https://git.openjdk.org/jdk/pull/19218.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/19218/head:pull/19218

PR: https://git.openjdk.org/jdk/pull/19218