KNL specific fix: disable generating INC and DEC instructions on Xeon Phi and Silvermont CPUs

Vladimir Kozlov vladimir.kozlov at oracle.com
Thu Jun 22 18:59:59 UTC 2017


I think in that case we should perhaps move these checks into separate methods defined in vm_version_x86.hpp: is_atom_cpu_family() and is_knights_cpu_family(). The code would be clearer then.
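Here is a minimal sketch of the two suggested predicates. It uses a hypothetical stand-in class, VMVersionSketch, whose cpu_family() and extended_cpu_model() return injected values rather than CPUID results. The model numbers are the ones listed in the diffs in this thread. The real methods in vm_version_x86.hpp may be structured differently.

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's VM_Version. Only the two CPUID-derived
// accessors used in this thread are modeled. Their values come from the
// constructor instead of being read from CPUID.
class VMVersionSketch {
 public:
  VMVersionSketch(int family, int model) : _family(family), _model(model) {}

  int cpu_family() const { return _family; }
  int extended_cpu_model() const { return _model; }

  // Silvermont (Atom) family: the models already checked in vm_version_x86.cpp.
  bool is_atom_cpu_family() const {
    return cpu_family() == 0x06 &&
           (extended_cpu_model() == 0x36 ||   // Centerton
            extended_cpu_model() == 0x37 ||   // Silvermont
            extended_cpu_model() == 0x4D);
  }

  // Knights family (Xeon Phi and its successors).
  bool is_knights_cpu_family() const {
    return cpu_family() == 0x06 &&
           (extended_cpu_model() == 0x57 ||   // Xeon Phi 3200/5200/7200
            extended_cpu_model() == 0x85);    // Future Xeon Phi
  }

 private:
  int _family;
  int _model;
};
```

With helpers like these, each call site could test is_knights_cpu_family() or is_atom_cpu_family() instead of repeating the model list.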

Thanks,
Vladimir

On 6/22/17 11:51 AM, Vladimir Kozlov wrote:
> On 6/22/17 11:34 AM, Kandu, Rahul wrote:
>>
>> Hi Vladimir,
>>
>> The option below applies only to CPU models 0x57 and 0x85, because Xeon Phi supports AVX. Silvermont CPUs (0x36, 0x37, etc.) support instructions only up to SSE4.x, not AVX.
>> _features &= ~CPU_VZEROUPPER;
>>
>> It may be better to separate the CPU model checks for the Knights family (Xeon Phi and its successors) from those for the earlier Silvermont (Atom) family, because the two instruction sets differ in several ways.
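The following is a hedged sketch of the distinction drawn here. Only the Knights family clears a VZEROUPPER feature bit, because only it supports AVX. Both families turn off INC/DEC. The CpuFamily enum, the Adjustments struct and the bit values are hypothetical placeholders, not HotSpot's definitions.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder feature bits for this sketch only; HotSpot defines its own CPU_* values.
constexpr std::uint64_t CPU_AVX        = std::uint64_t(1) << 0;
constexpr std::uint64_t CPU_SSE4_2     = std::uint64_t(1) << 1;
constexpr std::uint64_t CPU_VZEROUPPER = std::uint64_t(1) << 2;

enum class CpuFamily { Atom, Knights, Other };

struct Adjustments {
  std::uint64_t features;  // feature mask after family-specific changes
  bool disable_inc_dec;    // whether INC/DEC generation should be turned off
};

Adjustments adjust_for_family(CpuFamily family, std::uint64_t features) {
  Adjustments a{features, false};
  switch (family) {
    case CpuFamily::Knights:
      // AVX-capable Xeon Phi: clear the VZEROUPPER bit, as in the quoted line.
      a.features &= ~CPU_VZEROUPPER;
      a.disable_inc_dec = true;
      break;
    case CpuFamily::Atom:
      // SSE4.x only, no AVX: leave the feature mask alone.
      a.disable_inc_dec = true;
      break;
    case CpuFamily::Other:
      break;
  }
  return a;
}
```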
> 
> Okay, thank you for explaining.
> 
> Vladimir
> 
>>
>> Rahul
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, June 21, 2017 5:02 PM
>> To: Kandu, Rahul <rahul.kandu at intel.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: KNL specific fix: disable generating INC and DEC instructions on Xeon Phi and Silvermont CPUs
>>
>> Thank you, Rahul
>>
>> Why do you need to split the code, rather than just adding the new CPU model checks in the second place, as below?
>>
>> @@ -1179,7 +1179,9 @@
>>        if ((cpu_family() == 0x06) &&
>>            ((extended_cpu_model() == 0x36) || // Centerton
>>             (extended_cpu_model() == 0x37) || // Silvermont
>> -         (extended_cpu_model() == 0x4D))) {
>> +         (extended_cpu_model() == 0x4D) ||
>> +         (extended_cpu_model() == 0x57) ||   // Xeon Phi 3200/5200/7200
>> +         (extended_cpu_model() == 0x85))) {  // Future Xeon Phi
>>    #ifdef COMPILER2
>>          if (FLAG_IS_DEFAULT(OptoScheduling)) {
>>            OptoScheduling = true;
>> @@ -1190,6 +1192,9 @@
>>              UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
>>            }
>>          }
>> +      if (FLAG_IS_DEFAULT(UseIncDec)) {
>> +        FLAG_SET_DEFAULT(UseIncDec, false);
>> +      }
>>        }
>>        if(FLAG_IS_DEFAULT(AllocatePrefetchInstr) && supports_3dnow_prefetch()) {
>>
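The FLAG_IS_DEFAULT / FLAG_SET_DEFAULT pattern in the diff above changes UseIncDec only when the user has not set the flag on the command line. The sketch below models that idea with a hypothetical BoolFlag type. It is not how HotSpot implements these macros, and the flag is initialized to true purely for the example.

```cpp
#include <cassert>

// Hypothetical model of a VM flag that remembers whether its value came from
// the built-in default or from the command line (e.g. -XX:+UseIncDec).
struct BoolFlag {
  bool value;
  bool set_on_command_line = false;

  explicit BoolFlag(bool v) : value(v) {}

  bool is_default() const { return !set_on_command_line; }

  // Simulates the user passing -XX:+Name or -XX:-Name.
  void set_from_command_line(bool v) {
    value = v;
    set_on_command_line = true;
  }

  // Changes the value but keeps it marked as a default (cf. FLAG_SET_DEFAULT).
  void set_default(bool v) { value = v; }
};

// Ergonomic tuning in the style of the diff: override only an untouched flag.
void tune_inc_dec(BoolFlag& use_inc_dec, bool is_atom_or_knights) {
  if (is_atom_or_knights && use_inc_dec.is_default()) {
    use_inc_dec.set_default(false);
  }
}
```

Under this pattern, an explicit -XX:+UseIncDec on the command line would still take precedence over the CPU-specific default.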
>> Thanks,
>> Vladimir
>>
>> On 6/21/17 3:39 PM, Kandu, Rahul wrote:
>>> Hi Vladimir,
>>>
>>> Here is the webrev for the code change, after correcting the auto-indent settings as you specified:
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.02/
>>> Openjdk bug: https://bugs.openjdk.java.net/browse/JDK-8182138
>>>
>>>
>>> --- old/src/cpu/x86/vm/vm_version_x86.cpp    2017-06-21 14:57:28.002941500 -0700
>>> +++ new/src/cpu/x86/vm/vm_version_x86.cpp    2017-06-21 14:57:27.660400400 -0700
>>> @@ -654,6 +654,19 @@
>>>            ((extended_cpu_model() == 0x57) ||   // Xeon Phi 3200/5200/7200
>>>            (extended_cpu_model() == 0x85))) {  // Future Xeon Phi
>>>          _features &= ~CPU_VZEROUPPER;
>>> +      if (FLAG_IS_DEFAULT(UseIncDec)) {
>>> +        FLAG_SET_DEFAULT(UseIncDec, false);
>>> +      }
>>> +#ifdef COMPILER2
>>> +      if (FLAG_IS_DEFAULT(OptoScheduling)) {
>>> +        OptoScheduling = true;
>>> +      }
>>> +#endif
>>> +      if (supports_sse4_2()) { // Silvermont
>>> +        if (FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>> +          UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
>>> +        }
>>> +      }
>>>        }
>>>      }
>>> @@ -1193,6 +1206,9 @@
>>>              UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
>>>            }
>>>          }
>>> +      if (FLAG_IS_DEFAULT(UseIncDec)) {
>>> +        FLAG_SET_DEFAULT(UseIncDec, false);
>>> +      }
>>>        }
>>>
>>> regards,
>>> Rahul
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Friday, June 16, 2017 2:29 PM
>>> To: Kandu, Rahul <rahul.kandu at intel.com>;
>>> hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: KNL specific fix: disable generating INC and DEC
>>> instructions on Xeon Phi and Silvermont CPUs
>>>
>>> I don't see that it is fixed:
>>>
>>> +       FLAG_SET_DEFAULT(UseIncDec, false);
>>> +       }
>>> +#ifdef COMPILER2
>>> + if (FLAG_IS_DEFAULT(OptoScheduling)) {
>>> +  OptoScheduling = true;
>>> + }
>>> +#endif
>>> +      if (supports_sse4_2()) { // Silvermont
>>>
>>> +       if (FLAG_IS_DEFAULT(UseIncDec)){
>>> +        FLAG_SET_DEFAULT(UseIncDec, false);
>>> +        }
>>>
>>> Vladimir
>>>
>>> On 6/16/17 2:03 PM, Kandu, Rahul wrote:
>>>> Hi Vladimir,
>>>>
>>>> Thanks. I fixed the indentation; the code change no longer contains
>>>> tabs. Please find the updated webrev below.
>>>>
>>>> Openjdk bug location:
>>>> https://bugs.openjdk.java.net/browse/JDK-8182138
>>>>
>>>> Webrev for the code change:
>>>> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.01/
>>>>
>>>> regards,
>>>>
>>>> Rahul
>>>>
>>>> *From:*Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> *Sent:* Thursday, June 15, 2017 2:10 PM
>>>> *To:* Kandu, Rahul <rahul.kandu at intel.com>;
>>>> hotspot-compiler-dev at openjdk.java.net
>>>> *Subject:* Re: KNL specific fix: disable generating INC and DEC
>>>> instructions on Xeon Phi and Silvermont CPUs
>>>>
>>>> Hi Rahul
>>>>
>>>> Please fix the indentation; don't use tabs.
>>>>
>>>> Vladimir
>>>>
>>>> On 6/15/17 1:14 PM, Kandu, Rahul wrote:
>>>>
>>>>       Hi all,
>>>>
>>>>       The following patch disables generation of INC and DEC instructions
>>>>       on Xeon Phi and Silvermont (Atom) based CPUs. We have identified that
>>>>       INC and DEC can cause unexpected performance drops on processors that
>>>>       do not handle partial flag-register writes efficiently: INC and DEC
>>>>       update ZF, SF and the other arithmetic flags but leave CF unchanged.
>>>>       The patch targets these two instructions because they are most
>>>>       commonly emitted for loop increments and decrements.
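The partial-flag issue can be made concrete with a simplified model of three x86 flags. ADD writes CF, but INC does not. As a result, a later instruction that reads the flags has to combine INC's ZF/SF with an older CF. The Flags struct, the helper names and the mnemonic selector are hypothetical illustrations, not HotSpot's MacroAssembler code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified model of three x86 arithmetic flags (OF, AF and PF are omitted).
struct Flags {
  bool cf = false;  // carry
  bool zf = false;  // zero
  bool sf = false;  // sign
};

// ADD r32, 1: writes every modeled flag, including CF.
std::uint32_t add_one(std::uint32_t x, Flags& f) {
  std::uint32_t r = x + 1u;
  f.cf = (r < x);              // unsigned wrap-around sets carry
  f.zf = (r == 0);
  f.sf = (r >> 31) != 0;
  return r;
}

// INC r32: same result, but CF keeps its previous value (a partial flag write).
std::uint32_t inc(std::uint32_t x, Flags& f) {
  std::uint32_t r = x + 1u;
  f.zf = (r == 0);
  f.sf = (r >> 31) != 0;
  return r;
}

// Hypothetical emitter choice driven by a UseIncDec-style switch.
std::string increment_mnemonic(bool use_inc_dec) {
  return use_inc_dec ? "inc" : "add";
}
```

On cores that do not track CF separately from the other flags, the merge that INC forces can cost extra work. Emitting ADD/SUB instead avoids the merge.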
>>>>
>>>>       The patch improves the SPECjvm2008 composite score on a Knights
>>>>       Landing CPU by 3.65%, based on the runs below with the latest OpenJDK
>>>>       source.
>>>>
>>>>       Openjdk bug location:
>>>> https://bugs.openjdk.java.net/browse/JDK-8182138
>>>>
>>>>       Webrev for the code change:
>>>>       http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.00/
>>>>
>>>>       Scores:
>>>>
>>>>                     6/10 jdk10 code (no change)               6/10 jdk10 code with this patch
>>>>       Benchmark     run1      run2      run3      geomean     run1      run2      run3      geomean
>>>>       lu.small      1503.79   1500.62   1494.98   1499.792    1478.48   1493.78   1509.2    1493.767
>>>>       sor.small     2417.24   2372.1    2356.47   2381.798    2436.89   2434.46   2446.88   2439.404
>>>>       sparse.small  606.35    635.19    595.44    612.099     681.96    728.02    671.41    693.3673
>>>>       fft.small     1463.55   1406.43   1173.63   1341.793    1220.14   1425.19   1190.06   1274.335
>>>>       monte_carlo   823.66    825.96    761.26    803.0575    939.53    923       934.76    932.4041
>>>>       sparse.large  159.45    139.83    155.76    151.4352    100.66    150.22    179.79    139.5672
>>>>       fft.large     419.19    425.81    432.6     425.8315    433.11    424.72    429.07    428.953
>>>>       sor.large     416.31    262.98    271.31    309.6957    366.6     397.67    352.75    371.8725
>>>>       lu.large      116.46    127.51    129.33    124.3007    124.2     122.69    124.1     123.6614
>>>>       transform     1056.64   1066.6    1021.08   1047.923    1015.85   1056.42   1049.42   1040.412
>>>>       validation    1371.86   1898.49   1971.28   1725.131    2088.81   2178.14   2112.95   2126.301
>>>>       aes           276.67    255.84    299.78    276.8499    261.5     258.95    290.17    269.8444
>>>>       rsa           1041.29   1069.51   1069.26   1059.937    1091.45   1089.15   1095.52   1092.037
>>>>       signverify    2583.7    2592.98   2586.34   2587.67     2660.73   2664.17   2634.47   2653.09
>>>>       compress      817.65    817.44    816.55    817.2132    852.55    847.61    894.59    864.6626
>>>>       serial        608.48    586.62    615.37    603.3646    627.19    605.21    619.31    617.1695
>>>>       sunflow       371.28    373.03    373.04    372.4491    368.59    381.78    369.64    373.289
>>>>       mpegaudio     743.85    734.46    752.62    743.6064    775.45    773.35    776.98    775.2586
>>>>       derby         1929.9    1901.28   1922.56   1917.875    1927.97   1865.47   1919.17   1904.002
>>>>       Total         780.54    779.91    786.98    782.4702    801       812.98    819       810.9587
>>>>
>>>>       3.65% improvement
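The geomean columns appear to be the geometric mean of the three runs. The relative change of the Total row then follows from the two Total geomeans. The sketch below assumes that this is how the columns were derived; with the Total geomeans from the table, it gives roughly 3.6%.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of benchmark runs, computed as exp(mean(log(x))) so that
// multiplying many large scores cannot overflow.
double geomean(const std::vector<double>& runs) {
  assert(!runs.empty());
  double log_sum = 0.0;
  for (double r : runs) {
    assert(r > 0.0);  // the geometric mean is only defined for positive scores
    log_sum += std::log(r);
  }
  return std::exp(log_sum / static_cast<double>(runs.size()));
}

// Relative change in percent between two scores (higher is better).
double improvement_percent(double baseline, double patched) {
  return (patched / baseline - 1.0) * 100.0;
}
```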
>>>>
>>>>       regards,
>>>>
>>>>       Rahul
>>>>

