RFR: 8365017: The SegmentBulkOperations::copy method can be improved using overlaps [v7]

Wed Aug 13 12:32:23 UTC 2025

On Wed, 13 Aug 2025 10:08:33 GMT, Per Minborg <pminborg at openjdk.org> wrote:

>> This PR proposes to use overlapping memory areas in `SegmentBulkOperations::copy`, similar to what is proposed for `SegmentBulkOperations::fill` in https://github.com/openjdk/jdk/pull/25383.
>> 
>> This PR passes `tier1`, `tier2`, and `tier3`testing on multiple platforms.
>
> Per Minborg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision:
> 
>  - Revert comment
>  - Revert change in AMSI
>  - Back paddle on the branchless overlap method
>  - Merge branch 'master' into copy-overlap
>  - Fix comment
>  - Simplify
>  - Update copyright year
>  - Add a branchless overlaps method
>  - Optimize copy for certain lengths
>  - Add methods to ScopedMemoryAccess
>  - ... and 2 more: https://git.openjdk.org/jdk/compare/b0f898b5...1a1606e2

If I see this right, it could be that we duplicate the copy on a byte. If we only copy 2 bytes, then the copy on the second byte is repeated:

src[0] -> dst[0]
src[1] -> dst[1]
src[1] -> dst[1]

But is that ok?

Assume we have threads 1 and 2, both access memory locations A and B:

// Initial state
A = x;
B = y;

// Concurrently:
1: B = A;
2: A = B + 1;

What states could be observed at the end? We could look at all permutations of the 4 operations from threads 1 and 2. Let's call the operations 1.1 (1 loads A), 1.2 (1 stores B), 2.1 (2 loads B), 2.2 (2 stores A).
These are the 4! / 4 = 6 possible permutations, together with the results for A and B:

1.1 1.2 2.1 2.2 -> A = x+1; B = x
1.1 2.1 1.2 2.2 -> A = y+1; B = x
1.1 2.1 2.2 1.2 -> A = y+1; B = x
2.1 1.1 1.2 2.2 -> A = y+1; B = x
2.1 1.1 2.2 1.2 -> A = y+1; B = x
2.1 2.2 1.1 1.2 -> A = y+1; B = y+1

Now assume we repeat the copy for thread 1. Thus, this could happen:

1: B = A;
2: A = B + 1;
1: B = A; // repeated copy

And if it happens in this sequence, we get result:
`A = x+1; B = x+1`
But this result was not observable before.

That makes me wonder if repeating the copy is really allowed in the Java memory model?

src/java.base/share/classes/jdk/internal/foreign/SegmentBulkOperations.java line 156:

> 154:         // `putByte()` below is enough as 3 is the maximum number of bytes covered
> 155:         final byte b = SCOPED_MEMORY_ACCESS.getByte(srcSession, src.unsafeGetBase(), src.unsafeGetOffset() + srcOffset + len - Byte.BYTES);
> 156:         SCOPED_MEMORY_ACCESS.putByte(dstSession, dst.unsafeGetBase(), dst.unsafeGetOffset() + dstOffset + len - Byte.BYTES, b);

Are these duplications of operations really ok? Do you have a good argument for that?

-------------

PR Review: https://git.openjdk.org/jdk/pull/26672#pullrequestreview-3115657763
PR Review Comment: https://git.openjdk.org/jdk/pull/26672#discussion_r2273228594