[15] RFR(S): 8239009: C2: Don't use PSHUF to load scalars from memory on x86
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Feb 14 15:28:36 UTC 2020
> How many bytes are we loading from memory after your change?
4 bytes for int & float and 8 bytes for double.
Wide memory accesses have both performance and correctness implications:
* the access may be misaligned (2-/4-/8-byte scalar alignment vs. a
  16-byte load)
* the extra bytes can trigger page faults
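To make the two points above concrete, here is a small Python sketch of the address arithmetic. It is not HotSpot code: the 4 KiB page size and the helper names (is_aligned, pages_touched, extra_pages) are assumptions made up for illustration. It shows that a naturally aligned 4-byte int at the end of a page is misaligned for a 16-byte access, and that the 16-byte access reaches into a page the int itself never occupies.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (common on x86, not guaranteed)


def is_aligned(addr, width):
    """True if addr is a multiple of width."""
    return addr % width == 0


def pages_touched(addr, width):
    """Set of page numbers covered by a width-byte access at addr."""
    first = addr // PAGE_SIZE
    last = (addr + width - 1) // PAGE_SIZE
    return set(range(first, last + 1))


def extra_pages(addr, scalar_width, load_width=16):
    """Pages touched by the wide load but not by the scalar itself."""
    return pages_touched(addr, load_width) - pages_touched(addr, scalar_width)


# Hypothetical address: an int stored in the last 4 bytes of page 1.
addr = 2 * PAGE_SIZE - 4
print(is_aligned(addr, 4))           # True: fine for a 4-byte load
print(is_aligned(addr, 16))          # False: misaligned for a 16-byte load
print(sorted(extra_pages(addr, 4)))  # [2]: the wide load spills into page 2
```

If page 2 in this example is unmapped or protected, only the 16-byte access faults; the 4-byte access stays within page 1.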
I noticed that the change in ReplS_mem is not enough: movdl loads 4
bytes (a short is only 2) and there's no m16-to-xmmreg instruction
available. So, I aligned ReplS_mem with ReplB_mem: it uses VPBROADCASTW
when available (AVX2). Otherwise, it is matched as loadS + ReplS_reg.
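As a rough model of why a properly sized scalar load followed by a reg-to-reg replicate is the safer shape, here is a Python sketch over a byte buffer. It is an illustration only, not the x86.ad code: the function names are hypothetical, and the bounds check merely stands in for an inaccessible page (real hardware faults only when the extra bytes land on a page that cannot be read).

```python
def load(mem, off, width):
    """Read exactly width bytes at off; going out of bounds models a fault."""
    if off < 0 or off + width > len(mem):
        raise IndexError("access beyond readable memory")
    return bytes(mem[off:off + width])


def replicate_short_scalar(mem, off, lanes=8):
    """2-byte scalar load, then replicate into all lanes (loadS + ReplS_reg shape)."""
    return load(mem, off, 2) * lanes


def replicate_short_wide(mem, off, lanes=8):
    """16-byte load, then replicate the low 2 bytes (mem-to-reg shuffle shape)."""
    vec = load(mem, off, 16)
    return vec[0:2] * lanes


buf = bytes(range(18))  # hypothetical 18-byte readable region

# In the middle of the region both variants agree.
assert replicate_short_scalar(buf, 0) == replicate_short_wide(buf, 0)

# For the last short in the region only the scalar variant succeeds.
print(replicate_short_scalar(buf, 16) == bytes([16, 17]) * 8)  # True
try:
    replicate_short_wide(buf, 16)
except IndexError as e:
    print("wide load failed:", e)
```

Both variants produce the same vector whenever the wide load is legal; the difference is only in how many bytes must be readable at the source address.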
Updated webrev:
http://cr.openjdk.java.net/~vlivanov/8239009/webrev.01
Best regards,
Vladimir Ivanov
> On 2/13/20 8:04 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8239009/webrev.00
>> https://bugs.openjdk.java.net/browse/JDK-8239009
>>
>> Mem-to-reg variants of PSHUF instructions load 16 bytes from memory.
>>
>> Switch to reg-to-reg variants and perform scalar loads of proper size
>> instead.
>>
>> Testing: hs-precheckin-comp,hs-tier1,hs-tier2
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov
>