RFR: 8338967: Improve performance for MemorySegment::fill [v10]
Per Minborg
pminborg at openjdk.org
Mon Sep 2 08:59:22 UTC 2024
On Fri, 30 Aug 2024 14:15:24 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
>> src/java.base/share/classes/jdk/internal/foreign/AbstractMemorySegmentImpl.java line 208:
>>
>>> 206: }
>>> 207: final long u = Byte.toUnsignedLong(value);
>>> 208: final long longValue = u << 56 | u << 48 | u << 40 | u << 32 | u << 24 | u << 16 | u << 8 | u;
>>
>> This could be `u * 0xFFFFFFFFFFFFL` if value != 0, and just `0L` otherwise. Not sure if it's faster; it needs measuring.
>>
>> Most of the time fill is called with 0, since zeroing is the most common case.
>
> It's a clever trick. However, while looking at similar tricks I found that the time spent here is negligible (e.g. I tried always forcing `0` as the value and couldn't see any difference).
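For context, presumably `longValue` exists so that the bulk of a fill can be done with 8-byte stores instead of one store per byte, which would explain why computing it barely registers next to the store loop. The sketch below is a hypothetical illustration of that pattern over a plain `byte[]`, using `java.nio.ByteBuffer.putLong(int, long)` as a stand-in for a segment. The names `FillSketch`, `broadcast` and `fill` are made up, and this is not the code in `AbstractMemorySegmentImpl`.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class FillSketch {

    // Replicates the byte into all 8 byte lanes of a long (same shift/OR form as line 208).
    static long broadcast(byte value) {
        final long u = Byte.toUnsignedLong(value);
        return u << 56 | u << 48 | u << 40 | u << 32 | u << 24 | u << 16 | u << 8 | u;
    }

    // Hypothetical fill: 8-byte stores for the bulk, then single-byte stores for the tail.
    static void fill(byte[] array, byte value) {
        final long longValue = broadcast(value); // computed once per call
        final ByteBuffer buf = ByteBuffer.wrap(array);
        int i = 0;
        // Every byte of longValue is identical, so the buffer's byte order does not matter.
        for (; i <= array.length - Long.BYTES; i += Long.BYTES) {
            buf.putLong(i, longValue);
        }
        // Remaining array.length % 8 bytes, one at a time.
        for (; i < array.length; i++) {
            array[i] = value;
        }
    }

    public static void main(String[] args) {
        byte[] a = new byte[29];
        fill(a, (byte) 0x5A);
        byte[] expected = new byte[29];
        Arrays.fill(expected, (byte) 0x5A);
        System.out.println(Arrays.equals(a, expected)); // true
    }
}
```

For arrays shorter than 8 bytes the first loop is skipped entirely and only the tail loop runs.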
If I run:
    @Benchmark
    public long shift() {
        return ELEM_SIZE << 56 | ELEM_SIZE << 48 | ELEM_SIZE << 40 | ELEM_SIZE << 32 | ELEM_SIZE << 24 | ELEM_SIZE << 16 | ELEM_SIZE << 8 | ELEM_SIZE;
    }

    @Benchmark
    public long mul() {
        return ELEM_SIZE * 0xFFFF_FFFF_FFFFL;
    }
Then I get the following on my M1 machine:

    Benchmark       (ELEM_SIZE)  Mode  Cnt  Score    Error  Units
    TestFill.mul             31  avgt   30  0.586 ±  0.045  ns/op
    TestFill.shift           31  avgt   30  0.938 ±  0.017  ns/op
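A quick correctness check on the multiplication idea: the multiplier that reproduces the shift/OR expression is `0x0101_0101_0101_0101L` (a 1 in the low bit of every byte), not `0xFFFF_FFFF_FFFFL`. With the latter, `mul()` above returns a different value from `shift()` for `ELEM_SIZE` = 31, so it may be worth re-running the comparison with the corrected constant. Since `0 * 0x0101_0101_0101_0101L` is already `0L`, a separate zero case would also be unnecessary. The class name `BroadcastCheck` is hypothetical; the sketch simply compares both forms over all 256 byte values.

```java
class BroadcastCheck {

    // Shift/OR form, as in AbstractMemorySegmentImpl line 208.
    static long shiftOr(byte value) {
        final long u = Byte.toUnsignedLong(value);
        return u << 56 | u << 48 | u << 40 | u << 32 | u << 24 | u << 16 | u << 8 | u;
    }

    // Multiply form: since u < 256, the eight shifted copies of u never overlap,
    // so the multiplication produces no carries between bytes.
    static long multiply(byte value) {
        return Byte.toUnsignedLong(value) * 0x0101_0101_0101_0101L;
    }

    public static void main(String[] args) {
        // Exhaustive comparison over every possible byte value.
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            if (shiftOr(b) != multiply(b)) {
                throw new AssertionError("mismatch for " + v);
            }
        }
        // The 48-bit constant from the thread does not replicate the byte.
        long u = 31;
        System.out.println(Long.toHexString(shiftOr((byte) 31)));    // 1f1f1f1f1f1f1f1f
        System.out.println(Long.toHexString(u * 0xFFFF_FFFF_FFFFL)); // 1effffffffffe1
    }
}
```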
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20712#discussion_r1740564110
More information about the core-libs-dev mailing list