RFR: 8338967: Improve performance for MemorySegment::fill [v10]
Maurizio Cimadamore
mcimadamore at openjdk.org
Mon Sep 2 09:39:21 UTC 2024
On Mon, 2 Sep 2024 08:56:47 GMT, Per Minborg <pminborg at openjdk.org> wrote:
>>> this can be u * 0xFFFFFFFFFFFFL if value != 0 and just 0L if not: not sure if fast(er), need to measure.
>>>
>>> Most of the time filling is happy with 0 since zeroing is the most common case
>>
>> It's a clever trick. However, I was looking at similar tricks and found that the time spent here is irrelevant (e.g. I tried to always force `0` as the value, and couldn't see any difference).
>
> If I run:
>
>
> @Benchmark
> public long shift() {
>     return ELEM_SIZE << 56 | ELEM_SIZE << 48 | ELEM_SIZE << 40 | ELEM_SIZE << 32 | ELEM_SIZE << 24 | ELEM_SIZE << 16 | ELEM_SIZE << 8 | ELEM_SIZE;
> }
>
> @Benchmark
> public long mul() {
>     return ELEM_SIZE * 0xFFFF_FFFF_FFFFL;
> }
>
> Then I get:
>
> Benchmark       (ELEM_SIZE)  Mode  Cnt  Score   Error  Units
> TestFill.mul             31  avgt   30  0.586 ± 0.045  ns/op
> TestFill.shift           31  avgt   30  0.938 ± 0.017  ns/op
>
> On my M1 machine.
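For reference, a minimal sketch of the two ways to broadcast one byte into all eight bytes of a long. The class and method names are hypothetical and this is not the code from the PR. The multiplier that reproduces the shift-and-or result for an unsigned byte is `0x0101_0101_0101_0101L`; that constant is an assumption about what the trick intends, since with `0xFFFF_FFFF_FFFFL` the two benchmark methods above compute different values.

```java
final class ByteBroadcast {

    // Shift-and-or variant: place the unsigned byte in each of the 8 byte lanes.
    static long broadcastShift(byte value) {
        long u = value & 0xFFL; // mask to avoid sign extension
        return u << 56 | u << 48 | u << 40 | u << 32
                | u << 24 | u << 16 | u << 8 | u;
    }

    // Multiply variant: 0x0101...01 has a 1 in every byte lane, so the product
    // repeats u in every lane (no carries occur because u fits in 8 bits).
    static long broadcastMul(byte value) {
        return (value & 0xFFL) * 0x0101_0101_0101_0101L;
    }

    public static void main(String[] args) {
        // Check that both variants agree for every possible byte value.
        for (int b = 0; b < 256; b++) {
            if (broadcastShift((byte) b) != broadcastMul((byte) b)) {
                throw new AssertionError("mismatch for byte " + b);
            }
        }
        System.out.println(Long.toHexString(broadcastMul((byte) 0x1F))); // prints 1f1f1f1f1f1f1f1f
    }
}
```

Which variant is faster would still need to be measured on the target hardware; the sketch only shows that the two produce the same bits.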
I found similar small improvements to be had (I wrote about them offline) when replacing the bitwise-based tests (e.g. `(foo & 4) != 0`) with a more explicit check such as `remainingBytes >= 4`. It seems bitwise operations are not as well optimized (or perhaps the assembly instructions generated for them are more convoluted overall; I haven't checked).
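To make the two styles of tail check concrete, here is a minimal sketch that fills the last 0 to 7 bytes of a `byte[]` both ways. The names `TailFill`, `store`, `fillTailBitwise` and `fillTailCompare` are hypothetical, `Arrays.fill` stands in for the wider int/short stores, and this is not the code from the PR; it only shows that the two forms write the same bytes, not which one is faster.

```java
import java.util.Arrays;

final class TailFill {

    // Stand-in for a single wider store: writes n copies of value at off.
    private static void store(byte[] dst, int off, int n, byte value) {
        Arrays.fill(dst, off, off + n, value);
    }

    // Bitwise variant: test individual bits of the remaining count (0..7).
    static void fillTailBitwise(byte[] dst, int off, int remaining, byte value) {
        if ((remaining & 4) != 0) { store(dst, off, 4, value); off += 4; }
        if ((remaining & 2) != 0) { store(dst, off, 2, value); off += 2; }
        if ((remaining & 1) != 0) { store(dst, off, 1, value); }
    }

    // Comparison variant: explicit range checks, decrementing as we go.
    static void fillTailCompare(byte[] dst, int off, int remaining, byte value) {
        if (remaining >= 4) { store(dst, off, 4, value); off += 4; remaining -= 4; }
        if (remaining >= 2) { store(dst, off, 2, value); off += 2; remaining -= 2; }
        if (remaining >= 1) { store(dst, off, 1, value); }
    }

    public static void main(String[] args) {
        // Both variants should touch exactly the same bytes for every tail length.
        for (int remaining = 0; remaining < 8; remaining++) {
            byte[] a = new byte[8];
            byte[] b = new byte[8];
            fillTailBitwise(a, 0, remaining, (byte) 0x1F);
            fillTailCompare(b, 0, remaining, (byte) 0x1F);
            if (!Arrays.equals(a, b)) {
                throw new AssertionError("mismatch for remaining = " + remaining);
            }
        }
        System.out.println("both variants agree");
    }
}
```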
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20712#discussion_r1740612559