RFR: 8338967: Improve performance for MemorySegment::fill [v10]
Per Minborg
pminborg at openjdk.org
Tue Sep 3 08:41:20 UTC 2024
On Mon, 2 Sep 2024 09:32:56 GMT, Maurizio Cimadamore <mcimadamore at openjdk.org> wrote:
>> If I run:
>>
>>
>>     @Benchmark
>>     public long shift() {
>>         return ELEM_SIZE << 56 | ELEM_SIZE << 48 | ELEM_SIZE << 40 | ELEM_SIZE << 32 | ELEM_SIZE << 24 | ELEM_SIZE << 16 | ELEM_SIZE << 8 | ELEM_SIZE;
>>     }
>>
>>     @Benchmark
>>     public long mul() {
>>         return ELEM_SIZE * 0xFFFF_FFFF_FFFFL;
>>     }
>>
>> Then I get:
>>
>> Benchmark       (ELEM_SIZE)  Mode  Cnt  Score   Error  Units
>> TestFill.mul             31  avgt   30  0.586 ± 0.045  ns/op
>> TestFill.shift           31  avgt   30  0.938 ± 0.017  ns/op
>>
>> On my M1 machine.
>
> I found similar small improvements to be had (I wrote about them offline) when replacing the bitwise tests (e.g. `(foo & 4) != 0`) with a more explicit check such as `remainingBytes >= 4`. It seems bitwise operations are not as well optimized (or perhaps the assembly generated for them is more convoluted overall; I haven't checked).
I've tried:

    final long longValue = Byte.toUnsignedLong(value) * 0x0101010101010101L;

but it had the same performance as explicit bit shifting on M1.
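
For reference, here is a minimal sketch showing that the multiplication above and the explicit shifts produce the same 64-bit pattern for every byte value. The class and method names (`BroadcastByte`, `viaShifts`, `viaMultiply`) are made up for illustration and are not taken from the PR; this only checks equivalence and says nothing about performance.

```java
final class BroadcastByte {

    // Hypothetical helper: replicate the byte into all eight byte lanes
    // of a long using explicit shifts and ORs.
    static long viaShifts(byte value) {
        long v = Byte.toUnsignedLong(value);
        return v << 56 | v << 48 | v << 40 | v << 32
                | v << 24 | v << 16 | v << 8 | v;
    }

    // Hypothetical helper: same replication with a single multiplication.
    // Each lane of the constant is 0x01, and v <= 0xFF, so no lane carries
    // into its neighbour.
    static long viaMultiply(byte value) {
        return Byte.toUnsignedLong(value) * 0x0101010101010101L;
    }

    public static void main(String[] args) {
        // Check all 256 byte values.
        for (int i = 0; i < 256; i++) {
            byte b = (byte) i;
            if (viaShifts(b) != viaMultiply(b)) {
                throw new AssertionError("Mismatch for byte value " + i);
            }
        }
        // prints "1f1f1f1f1f1f1f1f"
        System.out.println(Long.toHexString(viaMultiply((byte) 31)));
    }
}
```

Since both forms compute the same value, any timing difference between them would come down to how the JIT compiles each expression on a given platform.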
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/20712#discussion_r1741664877