KNL specific fix: disable generating INC and DEC instructions on Xeon Phi and Silvermont CPUs
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu Jun 22 00:02:26 UTC 2017
Thank you, Rahul
Why do you need to split the code instead of just adding the new CPU model checks in the second place?
@@ -1179,7 +1179,9 @@
if ((cpu_family() == 0x06) &&
((extended_cpu_model() == 0x36) || // Centerton
(extended_cpu_model() == 0x37) || // Silvermont
- (extended_cpu_model() == 0x4D))) {
+ (extended_cpu_model() == 0x4D) ||
+ (extended_cpu_model() == 0x57) || // Xeon Phi 3200/5200/7200
+ (extended_cpu_model() == 0x85))) { // Future Xeon Phi
#ifdef COMPILER2
if (FLAG_IS_DEFAULT(OptoScheduling)) {
OptoScheduling = true;
@@ -1190,6 +1192,9 @@
UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
}
}
+ if (FLAG_IS_DEFAULT(UseIncDec)) {
+ FLAG_SET_DEFAULT(UseIncDec, false);
+ }
}
if(FLAG_IS_DEFAULT(AllocatePrefetchInstr) && supports_3dnow_prefetch()) {
Thanks,
Vladimir
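The following C++ sketch roughly illustrates the consolidated check suggested above: one list of CPU models that sets OptoScheduling, UseUnalignedLoadStores and UseIncDec together. It models the logic outside HotSpot and is not HotSpot code. `Flag`, `Flags`, `set_default_if_unset`, `is_silvermont_or_xeon_phi` and `apply_defaults` are hypothetical names. The CPUID decoding assumes the usual Intel leaf-1 EAX layout, and the `true` default for UseIncDec is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a HotSpot product flag: its value plus whether it
// still has its built-in default (the role FLAG_IS_DEFAULT plays in the patch).
struct Flag {
  bool value;
  bool is_default;
};

// Assumed built-in defaults; UseIncDec is assumed to default to true.
struct Flags {
  Flag use_inc_dec{true, true};
  Flag opto_scheduling{false, true};
  Flag use_unaligned_load_stores{false, true};
};

// Change a flag only if the user did not set it explicitly.
inline void set_default_if_unset(Flag& f, bool v) {
  if (f.is_default) f.value = v;
}

// Decode a CPUID leaf 1 EAX signature, assuming the usual Intel layout:
// model in bits 7:4, family in bits 11:8, extended model in bits 19:16.
// Extended family (bits 27:20) is ignored; it only matters for family 0xF.
inline uint32_t cpu_family(uint32_t eax) { return (eax >> 8) & 0xF; }
inline uint32_t extended_cpu_model(uint32_t eax) {
  return (((eax >> 16) & 0xF) << 4) | ((eax >> 4) & 0xF);
}

// A single list of model IDs, taken from the patch.
inline bool is_silvermont_or_xeon_phi(uint32_t eax) {
  if (cpu_family(eax) != 0x06) return false;
  switch (extended_cpu_model(eax)) {
    case 0x36:  // Centerton
    case 0x37:  // Silvermont
    case 0x4D:
    case 0x57:  // Xeon Phi 3200/5200/7200
    case 0x85:  // Future Xeon Phi
      return true;
    default:
      return false;
  }
}

// All three defaults are adjusted in one place. (In HotSpot, OptoScheduling
// exists only under COMPILER2; the sketch ignores that distinction.)
inline void apply_defaults(uint32_t eax, bool supports_sse4_2, Flags& flags) {
  if (!is_silvermont_or_xeon_phi(eax)) return;
  set_default_if_unset(flags.opto_scheduling, true);
  if (supports_sse4_2) {
    set_default_if_unset(flags.use_unaligned_load_stores, true);
  }
  set_default_if_unset(flags.use_inc_dec, false);
}
```

For example, an illustrative signature of 0x50671 (family 6, model 7, extended model 5) decodes to extended model 0x57, so `apply_defaults` turns UseIncDec off unless the user set it explicitly. Because all the adjustments sit behind one model list, a new model ID only has to be added once.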
On 6/21/17 3:39 PM, Kandu, Rahul wrote:
> Hi Vladimir,
>
> Webrev for the code change, after correcting the auto-indent parameters as specified.
>
> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.02/
> Openjdk bug: https://bugs.openjdk.java.net/browse/JDK-8182138
>
>
> --- old/src/cpu/x86/vm/vm_version_x86.cpp 2017-06-21 14:57:28.002941500 -0700
> +++ new/src/cpu/x86/vm/vm_version_x86.cpp 2017-06-21 14:57:27.660400400 -0700
> @@ -654,6 +654,19 @@
> ((extended_cpu_model() == 0x57) || // Xeon Phi 3200/5200/7200
> (extended_cpu_model() == 0x85))) { // Future Xeon Phi
> _features &= ~CPU_VZEROUPPER;
> + if (FLAG_IS_DEFAULT(UseIncDec)) {
> + FLAG_SET_DEFAULT(UseIncDec, false);
> + }
> +#ifdef COMPILER2
> + if (FLAG_IS_DEFAULT(OptoScheduling)) {
> + OptoScheduling = true;
> + }
> +#endif
> + if (supports_sse4_2()) { // Silvermont
> + if (FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
> + UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
> + }
> + }
> }
> }
>
> @@ -1193,6 +1206,9 @@
> UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
> }
> }
> + if (FLAG_IS_DEFAULT(UseIncDec)) {
> + FLAG_SET_DEFAULT(UseIncDec, false);
> + }
> }
>
> regards,
> Rahul
>
>
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Friday, June 16, 2017 2:29 PM
> To: Kandu, Rahul <rahul.kandu at intel.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: KNL specific fix: disable generating INC and DEC instructions on Xeon Phi and Silvermont CPUs
>
> I don't see that it is fixed:
>
> + FLAG_SET_DEFAULT(UseIncDec, false);
> + }
> +#ifdef COMPILER2
> + if (FLAG_IS_DEFAULT(OptoScheduling)) {
> + OptoScheduling = true;
> + }
> +#endif
> + if (supports_sse4_2()) { // Silvermont
>
> + if (FLAG_IS_DEFAULT(UseIncDec)){
> + FLAG_SET_DEFAULT(UseIncDec, false);
> + }
>
> Vladimir
>
> On 6/16/17 2:03 PM, Kandu, Rahul wrote:
>> Hi Vladimir,
>>
>> Thanks. I fixed the indentation; there are no tabs in the code change.
>> Please find the updated webrev below.
>>
>> Openjdk bug location: https://bugs.openjdk.java.net/browse/JDK-8182138
>>
>> Webrev for the code change:
>> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.01/
>>
>> regards,
>>
>> Rahul
>>
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, June 15, 2017 2:10 PM
>> To: Kandu, Rahul <rahul.kandu at intel.com>; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: KNL specific fix: disable generating INC and DEC instructions on Xeon Phi and Silvermont CPUs
>>
>> Hi Rahul
>>
>> Please fix the indentation; don't use tabs.
>>
>> Vladimir
>>
>> On 6/15/17 1:14 PM, Kandu, Rahul wrote:
>>
>> Hi all,
>>
>> The following patch disables generation of INC and DEC instructions on
>> Xeon Phi and Silvermont Atom-based CPUs. We have identified that INC
>> and DEC can cause unexpected performance drops on certain processors
>> that do not optimize partial flag writes: INC and DEC update the
>> arithmetic flags but leave CF unchanged, so a later instruction that
>> reads the flags may have to merge results from two different writers.
>> This patch disables these two instructions because they are most
>> commonly used for loop increments and decrements.
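The following hedged C++ sketch makes the INC/DEC trade-off concrete. It shows how a code generator might choose between INC/DEC and ADD/SUB based on a UseIncDec-style switch. `emit_increment` is a hypothetical helper that returns assembly text. It is loosely modeled on the idea behind the flag and is not HotSpot's actual MacroAssembler code.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Hypothetical helper: returns the x86 instruction a code generator might emit
// to add `value` to `reg`. use_inc_dec plays the role of the UseIncDec flag.
//
// INC and DEC update OF, SF, ZF, AF and PF but leave CF unchanged, so a later
// flag reader can also depend on an older CF writer (a partial flag write).
// ADD and SUB overwrite all of these flags, which avoids that dependency.
std::string emit_increment(const std::string& reg, int value, bool use_inc_dec) {
  if (value == 0) return "";  // nothing to emit
  if (value == 1 && use_inc_dec) return "inc " + reg;
  if (value == -1 && use_inc_dec) return "dec " + reg;
  if (value < 0 && value != INT_MIN) {
    return "sub " + reg + ", " + std::to_string(-value);
  }
  // INT_MIN cannot be negated in an int, so it stays an ADD of a negative value.
  return "add " + reg + ", " + std::to_string(value);
}
```

With the switch off, a loop counter bump such as `emit_increment("eax", 1, false)` becomes `add eax, 1` instead of `inc eax`. The ADD form costs a slightly longer encoding and avoids the partial flag update.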
>>
>> The patch improves the SPECjvm2008 composite score on a Knights Landing
>> CPU by 3.65%, based on the runs below on the latest OpenJDK source.
>>
>> Openjdk bug location:
>> https://bugs.openjdk.java.net/browse/JDK-8182138
>>
>> Webrev for the code change:
>> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.00/
>>
>> Scores (SPECjvm2008 on Knights Landing, ops/m; higher is better):
>>
>>               |  6/10 jdk10 code (no change)          |  6/10 jdk10 code with this patch
>> Benchmark     |  run1     run2     run3     geomean   |  run1     run2     run3     geomean
>> --------------+---------------------------------------+-------------------------------------
>> lu.small      |  1503.79  1500.62  1494.98  1499.792  |  1478.48  1493.78  1509.2   1493.767
>> sor.small     |  2417.24  2372.1   2356.47  2381.798  |  2436.89  2434.46  2446.88  2439.404
>> sparse.small  |  606.35   635.19   595.44   612.099   |  681.96   728.02   671.41   693.3673
>> fft.small     |  1463.55  1406.43  1173.63  1341.793  |  1220.14  1425.19  1190.06  1274.335
>> monte_carlo   |  823.66   825.96   761.26   803.0575  |  939.53   923      934.76   932.4041
>> sparse.large  |  159.45   139.83   155.76   151.4352  |  100.66   150.22   179.79   139.5672
>> fft.large     |  419.19   425.81   432.6    425.8315  |  433.11   424.72   429.07   428.953
>> sor.large     |  416.31   262.98   271.31   309.6957  |  366.6    397.67   352.75   371.8725
>> lu.large      |  116.46   127.51   129.33   124.3007  |  124.2    122.69   124.1    123.6614
>> transform     |  1056.64  1066.6   1021.08  1047.923  |  1015.85  1056.42  1049.42  1040.412
>> validation    |  1371.86  1898.49  1971.28  1725.131  |  2088.81  2178.14  2112.95  2126.301
>> aes           |  276.67   255.84   299.78   276.8499  |  261.5    258.95   290.17   269.8444
>> rsa           |  1041.29  1069.51  1069.26  1059.937  |  1091.45  1089.15  1095.52  1092.037
>> signverify    |  2583.7   2592.98  2586.34  2587.67   |  2660.73  2664.17  2634.47  2653.09
>> compress      |  817.65   817.44   816.55   817.2132  |  852.55   847.61   894.59   864.6626
>> serial        |  608.48   586.62   615.37   603.3646  |  627.19   605.21   619.31   617.1695
>> sunflow       |  371.28   373.03   373.04   372.4491  |  368.59   381.78   369.64   373.289
>> mpegaudio     |  743.85   734.46   752.62   743.6064  |  775.45   773.35   776.98   775.2586
>> derby         |  1929.9   1901.28  1922.56  1917.875  |  1927.97  1865.47  1919.17  1904.002
>> Total         |  780.54   779.91   786.98   782.4702  |  801      812.98   819      810.9587
>>
>> 3.65% improvement
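The geomean columns are consistent with geometric means of the three runs, and the composite improvement compares the two Total geomeans. The small C++ sketch below reproduces that arithmetic. `geomean` and `improvement_percent` are hypothetical helper names, not part of SPECjvm2008 or its tooling. On the Total rows they give a change of roughly 3.6%, close to the 3.65% quoted.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-run scores (assumes a non-empty list of positive values).
// Summing logarithms avoids overflow from multiplying many large scores.
double geomean(const std::vector<double>& xs) {
  double log_sum = 0.0;
  for (double x : xs) log_sum += std::log(x);
  return std::exp(log_sum / static_cast<double>(xs.size()));
}

// Relative change of the patched score over the baseline, in percent.
double improvement_percent(double baseline, double patched) {
  return (patched / baseline - 1.0) * 100.0;
}
```

For example, `geomean({780.54, 779.91, 786.98})` agrees with the baseline Total geomean of 782.4702 to within 0.01. A geometric mean is used because a single noisy run, such as the low sor.large run1 baseline, shifts it less than it would shift a sum of raw scores.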
>>
>> regards,
>>
>> Rahul
>>