RFR(S) 8192846: Extend cmov vectorization to work for float

Tue Dec 5 18:02:53 UTC 2017

Pushed into jdk/hs with approval of our gatekeeper.

Regards,
Vladimir

On 12/4/17 2:26 PM, Vladimir Kozlov wrote:
> On 12/4/17 11:19 AM, Lupusoru, Razvan A wrote:
>> Hi Vladimir,
>>
>> I removed the "do_vector_loop" check in order to enable the 
>> optimization by default - but it seems that was undesirable from your 
>> point of view.
> 
> It is not desirable for JDK 10 since it is too late.
> 
>>
>> Thus, we have added a new flag, "UseVectorCmov", which is disabled by
>> default; anyone interested in the feature can enable it manually.
>> Ideally, we can get it into the upcoming release in this form and
>> remove the guard in the following release, once we have checked its
>> stability further.
> 
> Sounds good. Agree.
> 
>>
>> The path for the updated webrev:
>> http://cr.openjdk.java.net/~vdeshpande/8192846/webrev.01/
>> Code contributed by Razvan Lupusoru (rlupusoru)
> 
> I will test it.
> 
> Thanks,
> Vladimir
> 
>>
>> Thanks,
>> Razvan
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, November 30, 2017 3:46 PM
>> To: Lupusoru, Razvan A <razvan.a.lupusoru at intel.com>; 
>> hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR(S) 8192846: Extend cmov vectorization to work for float
>>
>> Thank you, Razvan
>>
>> I have one question: why did you remove the _do_vector_loop checks in
>> superword.cpp?
>>
>> Also, I think it is too late for JDK 10. I will test and push when JDK 10
>> is forked.
>>
>> Thanks,
>> Vladimir
>>
>> On 11/30/17 1:48 PM, Lupusoru, Razvan A wrote:
>>> Hi all,
>>>
>>> This submission addresses the issue noted in JDK-8192846, namely that
>>> vectorized cmov for floats is not supported.
>>>
>>> This patch rectifies the situation so that when a scalar Cmov for float
>>> or double is generated (or its generation is forced by passing
>>> -XX:+UseCMoveUnconditionally), it stands a chance of being vectorized
>>> using the existing functionality previously enabled for doubles. The
>>> following code pattern will get vectorized with this patch:
>>>
>>> private void cmove_kernel_float(float[] in1, float[] in2, int length,
>>>         float[] out) {
>>>     for (int i = 0; i < length; i++) {
>>>         out[i] = (in1[i] > in2[i]) ? in1[i] : in2[i];
>>>     }
>>> }
>>>
>>> The instructions generated for the loop are:
>>>
>>>     vmovdqu 0x10(%rcx,%r10,4),%ymm0
>>>     vmovdqu 0x10(%rdx,%r10,4),%ymm1
>>>     vcmpleps %ymm0,%ymm1,%ymm2
>>>     vblendvps %ymm2,%ymm0,%ymm1,%ymm2
>>>     vmovdqu %ymm2,0x10(%r9,%r10,4)
>>>
>>> The patch is available for review here:
>>>
>>> http://cr.openjdk.java.net/~rlupusoru/jdk_hs/webrev_cmovevf_00/
>>>
>>> Jtreg testing with the compiler tests passed successfully. Thanks for
>>> the review!
>>>
>>> --Razvan
>>>
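
For anyone who wants to try the kernel quoted above, here is a minimal, self-contained sketch. The class name CmoveKernelDemo and the helper compareThenBlend are hypothetical and not part of the patch; the helper only loosely models the compare-then-blend shape of the generated code (build a per-lane mask, then select on it) so the kernel's result can be cross-checked. Whether the loop is actually vectorized depends on the build, the CPU, and the flags, and this sketch does not verify that.

```java
import java.util.Arrays;
import java.util.Random;

public class CmoveKernelDemo {

    // Same loop shape as the kernel quoted in the thread: a per-element
    // select between two float inputs.
    static void cmoveKernelFloat(float[] in1, float[] in2, int length, float[] out) {
        for (int i = 0; i < length; i++) {
            out[i] = (in1[i] > in2[i]) ? in1[i] : in2[i];
        }
    }

    // Hypothetical reference helper (an assumption, not from the patch):
    // first compute a mask for every element, then select on the mask,
    // mirroring the compare + blend structure in scalar form.
    static void compareThenBlend(float[] in1, float[] in2, int length, float[] out) {
        boolean[] mask = new boolean[length];
        for (int i = 0; i < length; i++) {
            mask[i] = in1[i] > in2[i];
        }
        for (int i = 0; i < length; i++) {
            out[i] = mask[i] ? in1[i] : in2[i];
        }
    }

    public static void main(String[] args) {
        int n = 1024;
        float[] a = new float[n];
        float[] b = new float[n];
        float[] out = new float[n];
        float[] expected = new float[n];

        Random random = new Random(42);
        for (int i = 0; i < n; i++) {
            a[i] = random.nextFloat();
            b[i] = random.nextFloat();
        }

        // Call the kernel many times so the JIT has a chance to compile it.
        for (int iter = 0; iter < 20_000; iter++) {
            cmoveKernelFloat(a, b, n, out);
        }

        compareThenBlend(a, b, n, expected);
        if (!Arrays.equals(out, expected)) {
            throw new AssertionError("kernel and reference disagree");
        }
        System.out.println("results match");
    }
}
```

After compiling with javac, something like "java -XX:+UseVectorCmov -XX:+UseCMoveUnconditionally CmoveKernelDemo" should exercise the path discussed in this thread on a build that contains the patch. Both flag names are taken from the messages above; builds without the patch will reject UseVectorCmov as an unrecognized option.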