RFR: 8329331: Intrinsify Unsafe::setMemory [v24]
Scott Gibbons
sgibbons at openjdk.org
Sat Apr 20 19:09:44 UTC 2024
On Sat, 20 Apr 2024 14:14:59 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:
>> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Long to short jmp; other cleanup
>
> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2530:
>
>> 2528: switch (type) {
>> 2529: case USM_SHORT:
>> 2530: __ movw(Address(dest, (2 * i)), wide_value);
>
> MOVW emits an extra operand-size override prefix byte compared to 32- and 64-bit stores; is there any specific reason for keeping the same unroll factor for all of the stores?
My understanding is that the spec requires a write of the appropriate size based on the alignment and size of the request. This is why there are no 128-bit or 256-bit store loops.
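As a rough, plain-C++ sketch of the semantics being preserved (hypothetical names FillUnit, widen, and fill_units; an illustration, not the stub code): every store is exactly the width of the fill unit, so a fault partway through the fill, e.g. a SIGBUS on a memory-mapped file, can never leave a torn element behind.

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

enum class FillUnit { Byte = 1, Short = 2, Int = 4, Long = 8 };

// Replicate the fill byte across a 64-bit pattern, analogous to the
// wide_value in the quoted stub code.
static uint64_t widen(uint8_t value) {
  uint64_t v = value;
  v |= v << 8;
  v |= v << 16;
  v |= v << 32;
  return v;
}

// Fill 'count' units at 'dest' with stores of exactly 'unit' width.
// A fixed-size memcpy of 2/4/8 bytes is lowered by optimizing compilers
// to a single 16/32/64-bit store (movw/movl/movq on x86-64).
static void fill_units(void* dest, size_t count, uint8_t value, FillUnit unit) {
  const uint64_t wide = widen(value);
  uint8_t* p = static_cast<uint8_t*>(dest);
  for (size_t i = 0; i < count; i++) {
    switch (unit) {
      case FillUnit::Byte:  p[i] = value;                      break;
      case FillUnit::Short: std::memcpy(p + 2 * i, &wide, 2);  break;
      case FillUnit::Int:   std::memcpy(p + 4 * i, &wide, 4);  break;
      case FillUnit::Long:  std::memcpy(p + 8 * i, &wide, 8);  break;
    }
  }
}

int main() {
  alignas(8) uint8_t buf[16] = {};
  fill_units(buf, 8, 0xAB, FillUnit::Short);    // eight 16-bit stores
  std::printf("%02x %02x\n", buf[0], buf[15]);  // prints: ab ab
  return 0;
}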
> src/hotspot/cpu/x86/stubGenerator_x86_64_arraycopy.cpp line 2539:
>
>> 2537: break;
>> 2538: }
>> 2539: }
>
> I understand we want to be as accurate as possible when filling the tail in the event of a SIGBUS, but we are creating a wide value of 8 packed bytes anyway when the destination segment is quadword aligned, and aligned quadword stores are implicitly atomic on x86 targets. What are your thoughts on using a vector-instruction-based loop?
I believe the spec is specific about the size of the store required, given the alignment and size. I want to honor that spec even though wider stores could be used in many cases.
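To make the trade-off concrete, a hedged plain-C++ contrast (hypothetical names fill_four_shorts_wide and fill_four_shorts_elementwise; not the stub code): on success both variants leave identical bytes, but the element-wise one never writes more than one element per instruction, which is the element-level granularity being preserved here.

#include <cstdint>
#include <cstring>

// Suggested alternative: one aligned 64-bit store covers four 16-bit
// elements at once; aligned quadword stores are atomic on x86-64.
static void fill_four_shorts_wide(uint16_t* dest, uint64_t wide_value) {
  std::memcpy(dest, &wide_value, 8);   // a single movq; dest assumed 8-byte aligned
}

// Element-sized variant: one 16-bit store per element, so any element is
// either fully written or not written at all if the fill faults midway.
static void fill_four_shorts_elementwise(uint16_t* dest, uint64_t wide_value) {
  const uint16_t v = static_cast<uint16_t>(wide_value);
  for (int i = 0; i < 4; i++) {
    std::memcpy(dest + i, &v, 2);      // a single movw per element
  }
}

int main() {
  alignas(8) uint16_t a[4], b[4];
  fill_four_shorts_wide(a, 0xABABABABABABABABULL);
  fill_four_shorts_elementwise(b, 0xABABABABABABABABULL);
  return std::memcmp(a, b, sizeof a) == 0 ? 0 : 1;   // identical results on success
}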
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573373720
PR Review Comment: https://git.openjdk.org/jdk/pull/18555#discussion_r1573374108