RFR: 8357531: The `SegmentBulkOperations::fill` method can be improved using overlaps [v2]

Thu May 22 08:14:35 UTC 2025

On Thu, 22 May 2025 08:11:09 GMT, Per Minborg <pminborg at openjdk.org> wrote:

>> This PR builds on a concept John Rose told me about some time ago. Instead of combining memory operations of various sizes, a single large and skewed memory operation can be made to clean up the tail of remaining bytes.
>> 
>> This has the effect of simplifying and shortening the code. The number of branches to evaluate is reduced.
>
> Per Minborg has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Correct typo in comment

src/java.base/share/classes/jdk/internal/foreign/SegmentBulkOperations.java line 110:

> 108:                     SCOPED_MEMORY_ACCESS.setMemory(dst.sessionImpl(), dst.unsafeGetBase(), dst.unsafeGetOffset(), len, value);
> 109:                 }
> 110:             }

Suggestion:

        final var sessionImpl = dst.sessionImpl();
        final var unsafeGetBase = dst.unsafeGetBase();
        final var unsafeGetOffset = dst.unsafeGetOffset();
        final var bigEndian = !Architecture.isLittleEndian();

        // Switch on the bit length of len, 64 - Long.numberOfLeadingZeros(len), i.e. floor(log2(len)) + 1 for len > 0
        switch (64 - Long.numberOfLeadingZeros(len)) {
            case 0 -> sessionImpl.checkValidState(); // Implicit state check
            case 1 -> SCOPED_MEMORY_ACCESS.putByte(sessionImpl, unsafeGetBase, unsafeGetOffset, value);
            case 2 -> {
                SCOPED_MEMORY_ACCESS.putShortUnaligned(sessionImpl, unsafeGetBase, unsafeGetOffset, (short) longValue, bigEndian);
                SCOPED_MEMORY_ACCESS.putShortUnaligned(sessionImpl, unsafeGetBase, unsafeGetOffset + len - Short.BYTES, (short) longValue, bigEndian);
            }
            case 3 -> {
                SCOPED_MEMORY_ACCESS.putIntUnaligned(sessionImpl, unsafeGetBase, unsafeGetOffset, (int) longValue, bigEndian);
                SCOPED_MEMORY_ACCESS.putIntUnaligned(sessionImpl, unsafeGetBase, unsafeGetOffset + len - Integer.BYTES, (int) longValue, bigEndian);
            }
            default -> {
                if (len < NATIVE_THRESHOLD_FILL) {
                    final int limit = (int) (len & (NATIVE_THRESHOLD_FILL - 8));
                    for (int offset = 0; offset < limit; offset += Long.BYTES) {
                        SCOPED_MEMORY_ACCESS.putLongUnaligned(sessionImpl, unsafeGetBase, unsafeGetOffset + offset, longValue, bigEndian);
                    }
                    SCOPED_MEMORY_ACCESS.putLongUnaligned(sessionImpl, unsafeGetBase, unsafeGetOffset + len - Long.BYTES, longValue, bigEndian);
                } else {
                    // Handle larger segments via native calls
                    SCOPED_MEMORY_ACCESS.setMemory(sessionImpl, unsafeGetBase, unsafeGetOffset, len, value);
                }
            }
        }

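The method's current bytecode size (CodeSize) is 370, which exceeds the 325-byte limit (C2's default `FreqInlineSize`), so C2 cannot inline it. Hoisting the calls repeated in each branch (`dst.sessionImpl()`, `dst.unsafeGetBase()`, `dst.unsafeGetOffset()`, and the endianness check) into local variables, as in the suggestion above, reduces the CodeSize to 298.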
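
For anyone following along without the JDK internals at hand, here is a minimal standalone sketch of the overlapping-store idea that the `case 2`, `case 3`, and `default` branches rely on. It is not the JDK code: the class name `OverlapFillSketch`, the `byte[]` destination, and the use of `MethodHandles.byteArrayViewVarHandle` in place of `SCOPED_MEMORY_ACCESS` are all assumptions made for illustration. The tail is handled by one extra store that ends exactly at `off + len` and may overlap bytes already written.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Objects;

// Hypothetical illustration only; not the SegmentBulkOperations implementation.
final class OverlapFillSketch {
    // Views over byte[]; plain get/set on these handles permit unaligned indices.
    private static final VarHandle SHORT =
            MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.nativeOrder());
    private static final VarHandle INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.nativeOrder());
    private static final VarHandle LONG =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.nativeOrder());

    static void fill(byte[] dst, int off, int len, byte value) {
        Objects.checkFromIndexSize(off, len, dst.length);
        // Replicate the byte into all eight lanes of a long.
        final long v = (value & 0xFFL) * 0x0101010101010101L;
        // Bit length of len: 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, 8 and up -> default.
        switch (64 - Long.numberOfLeadingZeros(len)) {
            case 0 -> { /* nothing to write */ }
            case 1 -> dst[off] = value;
            case 2 -> {
                // Two 2-byte stores cover 2..3 bytes; they overlap when len == 2.
                SHORT.set(dst, off, (short) v);
                SHORT.set(dst, off + len - Short.BYTES, (short) v);
            }
            case 3 -> {
                // Two 4-byte stores cover 4..7 bytes.
                INT.set(dst, off, (int) v);
                INT.set(dst, off + len - Integer.BYTES, (int) v);
            }
            default -> {
                // Whole 8-byte words first, then one skewed store for the tail.
                final int limit = len & ~(Long.BYTES - 1);
                for (int i = 0; i < limit; i += Long.BYTES) {
                    LONG.set(dst, off + i, v);
                }
                LONG.set(dst, off + len - Long.BYTES, v);
            }
        }
    }
}
```

Because every byte written has the same value, the overlap is harmless, and each branch needs at most one extra store instead of a cascade of int/short/byte tail writes.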

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/25383#discussion_r2101921745