RFR: 8357531: The `SegmentBulkOperations::fill` method can be improved using overlaps [v5]

Thu May 22 16:49:52 UTC 2025

On Thu, 22 May 2025 11:52:34 GMT, Per Minborg <pminborg at openjdk.org> wrote:

>> This PR builds on a concept John Rose told me about some time ago. Instead of combining memory operations of various sizes, a single large and skewed memory operation can be made to clean up the tail of remaining bytes.
>> 
>> This has the effect of simplifying and shortening the code. The number of branches to evaluate is reduced.
>> 
>> It should be noted that the performance of the fill operation affects the allocation of new segments (as they are zeroed out before being returned to the client code).
>
> Per Minborg has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update benchmark to reflect new fill method

Very cool!  I'm glad it worked.  This came out of some background work I was doing to find fast ways to feed a vectorized loop over an input measured in bytes (any number of them).

https://cr.openjdk.org/~jrose/jvm/PartialMemoryWord.cpp

The corresponding read technique also works quite well.  If you combine the partial overlapping reads correctly, each byte is read exactly once, which might be a good property for building concurrent data structures.
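
For readers who have not seen the overlapping-store idea, here is a minimal sketch of a fill on a plain `byte[]`. It is only an illustration under stated assumptions: the class `OverlapFill` and its `fill` helper are hypothetical names, and this is neither the code in `SegmentBulkOperations::fill` nor the code in PartialMemoryWord.cpp. The point it shows is that the tail is finished with one skewed store that ends on the last byte, instead of a chain of 4-, 2- and 1-byte stores.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not the JDK implementation.
final class OverlapFill {

    static void fill(byte[] dst, byte value) {
        int len = dst.length;
        ByteBuffer buf = ByteBuffer.wrap(dst).order(ByteOrder.nativeOrder());
        // Broadcast the byte into all eight lanes of a long.
        long pattern = (value & 0xFFL) * 0x0101010101010101L;

        if (len >= Long.BYTES) {
            // Bulk loop: aligned-to-start 8-byte stores.
            for (int i = 0; i <= len - Long.BYTES; i += Long.BYTES) {
                buf.putLong(i, pattern);
            }
            // Tail: one skewed 8-byte store ending at the last byte.
            // It may overlap bytes the loop already wrote, which is
            // harmless because the same value is written again.
            buf.putLong(len - Long.BYTES, pattern);
        } else if (len >= Integer.BYTES) {
            // 4..7 bytes: two possibly overlapping 4-byte stores.
            buf.putInt(0, (int) pattern);
            buf.putInt(len - Integer.BYTES, (int) pattern);
        } else if (len >= Short.BYTES) {
            // 2..3 bytes: two possibly overlapping 2-byte stores.
            buf.putShort(0, (short) pattern);
            buf.putShort(len - Short.BYTES, (short) pattern);
        } else if (len == 1) {
            dst[0] = value;
        }
    }
}
```

Every length class needs at most two stores beyond the bulk loop, so there are fewer branches to evaluate than in a version that tests for each remaining power-of-two size in turn.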

-------------

PR Comment: https://git.openjdk.org/jdk/pull/25383#issuecomment-2901911070