RFR: 8338967: Improve performance for MemorySegment::fill [v5]
Per Minborg
pminborg at openjdk.org
Fri Aug 30 12:18:20 UTC 2024
On Wed, 28 Aug 2024 15:32:40 GMT, Francesco Nigro <duke at openjdk.org> wrote:
>>> How fast do we need to be here given we are measuring in a few nanoseconds per operation?
>>>
>>> What if the goal is not to regress from say explicitly filling in a small sized segment or a comparable array (e.g., < 8 bytes) then maybe a loop suffices and the code is simple?
>>
>> Fair question. I have another version (called "patch bits" below) that is based on bit logic (first doing int ops, then short and lastly byte, similar to `ArraySupport::vectorizedMismatch`). This has slightly worse performance but is more scalable and perhaps simpler.
>>
>> ![image](https://github.com/user-attachments/assets/292c75aa-0df8-4bb7-b45f-426d0f8470d9)
>
> @minborg Hi! I didn't check the numbers with the benchmark I've written at https://github.com/openjdk/jdk/pull/20712#discussion_r1732802685 which is meant to stress the branch predictor (without enough `samples` i.e. past 128K on my machine) - can you give it a shot with M1 🙏 ?
@franz1981 Here is what I get if I run your performance test on my M1 Mac (unfortunately no -perf data):
Benchmark                         (samples)  (shuffle)  Mode  Cnt        Score        Error  Units
TestBranchFill.heap_segment_fill       1024      false  avgt   30     3695.815 ±     24.615  ns/op
TestBranchFill.heap_segment_fill       1024       true  avgt   30     3938.582 ±    124.510  ns/op
TestBranchFill.heap_segment_fill     128000      false  avgt   30   420845.301 ±   1605.080  ns/op
TestBranchFill.heap_segment_fill     128000       true  avgt   30  1778362.506 ±  39250.756  ns/op
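
For readers who have not seen the "patch bits" idea mentioned in the quoted comment, here is a minimal sketch of that style of fill: whole `int` stores first, then at most one `short` store and one `byte` store, chosen by testing the bits of the remaining length. It is not the code from the PR. The class name `PatchBitsFill`, the method `fill`, and the use of a heap `byte[]` with `MethodHandles.byteArrayViewVarHandle` are assumptions made only to keep the example self-contained.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration; not the implementation under review in the PR.
final class PatchBitsFill {

    // Views over a byte[] that allow storing an int or a short at a byte index.
    // Plain set() on these handles supports unaligned byte indices.
    private static final VarHandle INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.nativeOrder());
    private static final VarHandle SHORT =
            MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.nativeOrder());

    // Fills dst[offset, offset + length) with value.
    static void fill(byte[] dst, int offset, int length, byte value) {
        // Replicate the byte into a 16-bit and a 32-bit pattern.
        int b = value & 0xFF;
        int s = b | (b << 8);
        int i = s | (s << 16);

        int pos = offset;
        int end = offset + length;

        // 1. Whole ints.
        for (; end - pos >= 4; pos += 4) {
            INT.set(dst, pos, i);
        }

        // 2. The tail is 0..3 bytes; its two low bits select the stores.
        int remaining = end - pos;
        if ((remaining & 2) != 0) {
            SHORT.set(dst, pos, (short) s);
            pos += 2;
        }
        if ((remaining & 1) != 0) {
            dst[pos] = value;
        }
    }
}
```

Calling `PatchBitsFill.fill(array, 1, 7, (byte) 0x5A)` would issue one `int` store, one `short` store and one `byte` store. The tail costs at most two conditional branches regardless of length, which is the property that makes this shape interesting when the benchmark shuffles sizes to defeat the branch predictor.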
-------------
PR Comment: https://git.openjdk.org/jdk/pull/20712#issuecomment-2321048180
More information about the core-libs-dev mailing list