KNL specific fix: disable generating INC and DEC instructions on Xeon Phi and Silvermont CPUs
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Jun 22 18:59:59 UTC 2017
I think that in this case we should perhaps move these checks into separate methods defined in vm_version_x86.hpp: is_atom_cpu_family() and is_knights_cpu_family(). The code would then be clearer.
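
As a rough sketch only (not the actual patch), such helpers might look like the following. CpuId, Flags and apply_cpu_defaults are hypothetical stand-ins used for illustration; in HotSpot the values would come from cpu_family() and extended_cpu_model(), and the flag changes would be guarded by FLAG_IS_DEFAULT. The family == 0x06 check for the Knights models is assumed here, since the quoted hunk does not show it.

```cpp
#include <cstdint>

// Hypothetical stand-in for the CPUID-derived values that HotSpot reads
// through cpu_family() and extended_cpu_model().
struct CpuId {
  uint32_t family;
  uint32_t extended_model;
};

// Silvermont-era Atom models, as listed in the diff quoted in this thread.
inline bool is_atom_cpu_family(const CpuId& cpu) {
  return cpu.family == 0x06 &&
         (cpu.extended_model == 0x36 ||  // Centerton
          cpu.extended_model == 0x37 ||  // Silvermont
          cpu.extended_model == 0x4D);
}

// Knights (Xeon Phi) models; the family check is an assumption.
inline bool is_knights_cpu_family(const CpuId& cpu) {
  return cpu.family == 0x06 &&
         (cpu.extended_model == 0x57 ||  // Xeon Phi 3200/5200/7200
          cpu.extended_model == 0x85);   // Future Xeon Phi
}

// Hypothetical model of the two settings discussed in the thread.
struct Flags {
  bool use_inc_dec = true;         // models UseIncDec
  bool vzeroupper_feature = true;  // models the CPU_VZEROUPPER feature bit
};

inline void apply_cpu_defaults(const CpuId& cpu, Flags& flags) {
  // INC/DEC generation is disabled on both families.
  if (is_atom_cpu_family(cpu) || is_knights_cpu_family(cpu)) {
    flags.use_inc_dec = false;
  }
  // The VZEROUPPER bit is cleared for Knights only, since the
  // Silvermont models do not support AVX.
  if (is_knights_cpu_family(cpu)) {
    flags.vzeroupper_feature = false;
  }
}
```

With the checks named this way, the AVX-related setting stays Knights-only while the UseIncDec default is shared by both families.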
Thanks,
Vladimir
On 6/22/17 11:51 AM, Vladimir Kozlov wrote:
> On 6/22/17 11:34 AM, Kandu, Rahul wrote:
>>
>> Hi Vladimir,
>>
>> The option below applies to CPU model IDs 0x57 and 0x85 because Xeon Phi supports AVX. Silvermont CPUs (0x36, 0x37, etc.) support up to SSE4.x and not AVX.
>> _features &= ~CPU_VZEROUPPER;
>>
>> It may be better to separate CPU model checks for Knights family (Xeon Phi and its successors) from previous Silvermont (ATOM) family processors due to several differences in the instruction set.
>
> Okay. Thank you for explaining.
>
> Vladimir
>
>>
>> Rahul
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Wednesday, June 21, 2017 5:02 PM
>> To: Kandu, Rahul <rahul.kandu at intel.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: KNL specific fix: disable generating INC and DEC instructions on Xeon Phi and Silvermont CPUs
>>
>> Thank you, Rahul
>>
>> Why do you need to split the code rather than just add the new CPU model checks at the second place?
>>
>> @@ -1179,7 +1179,9 @@
>> if ((cpu_family() == 0x06) &&
>> ((extended_cpu_model() == 0x36) || // Centerton
>> (extended_cpu_model() == 0x37) || // Silvermont
>> - (extended_cpu_model() == 0x4D))) {
>> + (extended_cpu_model() == 0x4D) ||
>> + (extended_cpu_model() == 0x57) || // Xeon Phi 3200/5200/7200
>> + (extended_cpu_model() == 0x85))) { // Future Xeon Phi
>> #ifdef COMPILER2
>> if (FLAG_IS_DEFAULT(OptoScheduling)) {
>> OptoScheduling = true;
>> @@ -1190,6 +1192,9 @@
>> UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
>> }
>> }
>> + if (FLAG_IS_DEFAULT(UseIncDec)) {
>> + FLAG_SET_DEFAULT(UseIncDec, false);
>> + }
>> }
>> if(FLAG_IS_DEFAULT(AllocatePrefetchInstr) && supports_3dnow_prefetch()) {
>>
>> Thanks,
>> Vladimir
>>
>> On 6/21/17 3:39 PM, Kandu, Rahul wrote:
>>> Hi Vladimir,
>>>
>>> Here is the webrev for the code change, after correcting the auto-indent parameters as specified.
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.02/
>>> Openjdk bug: https://bugs.openjdk.java.net/browse/JDK-8182138
>>>
>>>
>>> --- old/src/cpu/x86/vm/vm_version_x86.cpp 2017-06-21 14:57:28.002941500 -0700
>>> +++ new/src/cpu/x86/vm/vm_version_x86.cpp 2017-06-21 14:57:27.660400400 -0700
>>> @@ -654,6 +654,19 @@
>>> ((extended_cpu_model() == 0x57) || // Xeon Phi 3200/5200/7200
>>> (extended_cpu_model() == 0x85))) { // Future Xeon Phi
>>> _features &= ~CPU_VZEROUPPER;
>>> + if (FLAG_IS_DEFAULT(UseIncDec)) {
>>> + FLAG_SET_DEFAULT(UseIncDec, false);
>>> + }
>>> +#ifdef COMPILER2
>>> + if (FLAG_IS_DEFAULT(OptoScheduling)) {
>>> + OptoScheduling = true;
>>> + }
>>> +#endif
>>> + if (supports_sse4_2()) { // Silvermont
>>> + if (FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>> + UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
>>> + }
>>> + }
>>> }
>>> }
>>> @@ -1193,6 +1206,9 @@
>>> UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
>>> }
>>> }
>>> + if (FLAG_IS_DEFAULT(UseIncDec)) {
>>> + FLAG_SET_DEFAULT(UseIncDec, false);
>>> + }
>>> }
>>>
>>> regards,
>>> Rahul
>>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>> Sent: Friday, June 16, 2017 2:29 PM
>>> To: Kandu, Rahul <rahul.kandu at intel.com>;
>>> hotspot-compiler-dev at openjdk.java.net
>>> Subject: Re: KNL specific fix: disable generating INC and DEC
>>> instructions on Xeon Phi and Silvermont CPUs
>>>
>>> I don't see that it is fixed:
>>>
>>> + FLAG_SET_DEFAULT(UseIncDec, false);
>>> + }
>>> +#ifdef COMPILER2
>>> + if (FLAG_IS_DEFAULT(OptoScheduling)) {
>>> + OptoScheduling = true;
>>> + }
>>> +#endif
>>> + if (supports_sse4_2()) { // Silvermont
>>>
>>> + if (FLAG_IS_DEFAULT(UseIncDec)){
>>> + FLAG_SET_DEFAULT(UseIncDec, false);
>>> + }
>>>
>>> Vladimir
>>>
>>> On 6/16/17 2:03 PM, Kandu, Rahul wrote:
>>>> Hi Vladimir,
>>>>
>>>> Thanks. I fixed the indents; there are no tabs in the code change.
>>>> Please find the updated webrev below.
>>>>
>>>> Openjdk bug location:
>>>> https://bugs.openjdk.java.net/browse/JDK-8182138
>>>>
>>>> Webrev for the code change:
>>>> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.01/
>>>>
>>>> regards,
>>>>
>>>> Rahul
>>>>
>>>> *From:*Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>>>> *Sent:* Thursday, June 15, 2017 2:10 PM
>>>> *To:* Kandu, Rahul <rahul.kandu at intel.com>;
>>>> hotspot-compiler-dev at openjdk.java.net
>>>> *Subject:* Re: KNL specific fix: disable generating INC and DEC
>>>> instructions on Xeon Phi and Silvermont CPUs
>>>>
>>>> Hi Rahul
>>>>
>>>> Please fix indents - don't use tabs.
>>>>
>>>> Vladimir
>>>>
>>>> On 6/15/17 1:14 PM, Kandu, Rahul wrote:
>>>>
>>>> Hi all,
>>>>
>>>> The following patch disables generation of INC and DEC instructions
>>>> on Xeon Phi and Silvermont (Atom) based CPUs. We have found that INC
>>>> and DEC can cause unexpected performance drops on certain processors
>>>> that do not optimize partial flag writes. The patch disables
>>>> generation of these two instructions, which are commonly used for
>>>> loop increment/decrement.
>>>>
>>>> The patch provides a 3.65% better SPECjvm2008 composite score on a
>>>> Knights Landing CPU, per the runs below on the latest OpenJDK
>>>> source.
>>>>
>>>> Openjdk bug location:
>>>> https://bugs.openjdk.java.net/browse/JDK-8182138
>>>>
>>>> Webrev for the code change:
>>>> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.00/
>>>>
>>>> Scores:
>>>>
>>>>
>>>>
>>>>              | 6/10 jdk10 code (no change)         | 6/10 jdk10 code with this patch
>>>> Benchmark    |     run1     run2     run3  geomean |     run1     run2     run3  geomean
>>>> lu.small     |  1503.79  1500.62  1494.98 1499.792 |  1478.48  1493.78   1509.2 1493.767
>>>> sor.small    |  2417.24   2372.1  2356.47 2381.798 |  2436.89  2434.46  2446.88 2439.404
>>>> sparse.small |   606.35   635.19   595.44  612.099 |   681.96   728.02   671.41 693.3673
>>>> fft.small    |  1463.55  1406.43  1173.63 1341.793 |  1220.14  1425.19  1190.06 1274.335
>>>> monte_carlo  |   823.66   825.96   761.26 803.0575 |   939.53      923   934.76 932.4041
>>>> sparse.large |   159.45   139.83   155.76 151.4352 |   100.66   150.22   179.79 139.5672
>>>> fft.large    |   419.19   425.81    432.6 425.8315 |   433.11   424.72   429.07  428.953
>>>> sor.large    |   416.31   262.98   271.31 309.6957 |    366.6   397.67   352.75 371.8725
>>>> lu.large     |   116.46   127.51   129.33 124.3007 |    124.2   122.69    124.1 123.6614
>>>> transform    |  1056.64   1066.6  1021.08 1047.923 |  1015.85  1056.42  1049.42 1040.412
>>>> validation   |  1371.86  1898.49  1971.28 1725.131 |  2088.81  2178.14  2112.95 2126.301
>>>> aes          |   276.67   255.84   299.78 276.8499 |    261.5   258.95   290.17 269.8444
>>>> rsa          |  1041.29  1069.51  1069.26 1059.937 |  1091.45  1089.15  1095.52 1092.037
>>>> signverify   |   2583.7  2592.98  2586.34  2587.67 |  2660.73  2664.17  2634.47  2653.09
>>>> compress     |   817.65   817.44   816.55 817.2132 |   852.55   847.61   894.59 864.6626
>>>> serial       |   608.48   586.62   615.37 603.3646 |   627.19   605.21   619.31 617.1695
>>>> sunflow      |   371.28   373.03   373.04 372.4491 |   368.59   381.78   369.64  373.289
>>>> mpegaudio    |   743.85   734.46   752.62 743.6064 |   775.45   773.35   776.98 775.2586
>>>> derby        |   1929.9  1901.28  1922.56 1917.875 |  1927.97  1865.47  1919.17 1904.002
>>>> Total        |   780.54   779.91   786.98 782.4702 |      801   812.98      819 810.9587
>>>>
>>>>
>>>>
>>>> 3.65% improvement
>>>>
>>>> regards,
>>>>
>>>> Rahul
>>>>