KNL specific fix: disable generating INC and DEC instructions on Xeon Phi and Silvermont CPUs
Vladimir Kozlov
vladimir.kozlov at oracle.com
Fri Jun 16 21:28:46 UTC 2017
I don't see that it is fixed:
+ FLAG_SET_DEFAULT(UseIncDec, false);
+ }
+#ifdef COMPILER2
+ if (FLAG_IS_DEFAULT(OptoScheduling)) {
+ OptoScheduling = true;
+ }
+#endif
+ if (supports_sse4_2()) { // Silvermont
+ if (FLAG_IS_DEFAULT(UseIncDec)){
+ FLAG_SET_DEFAULT(UseIncDec, false);
+ }
Vladimir
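
The pattern in the quoted diff changes a flag's default only when the user has not set it explicitly. It can be modeled outside HotSpot roughly as below. This is a simplified sketch, not the patch itself: `BoolFlag`, `CpuInfo`, `set_default`, and `apply_inc_dec_policy` are hypothetical stand-ins for the `FLAG_IS_DEFAULT`/`FLAG_SET_DEFAULT` macros and the CPU checks in the real code.

```cpp
// Hypothetical stand-in for a VM flag: the value plus whether the user
// left it at its built-in default (what FLAG_IS_DEFAULT reports in HotSpot).
struct BoolFlag {
    bool value;
    bool is_default;
};

// Hypothetical CPU description; the real code uses CPUID-derived helpers.
struct CpuInfo {
    bool is_knights_landing;
    bool is_silvermont;
};

// Change the value only if the user did not set the flag explicitly.
// The flag still counts as "default" afterwards, mirroring FLAG_SET_DEFAULT.
inline void set_default(BoolFlag& flag, bool new_default) {
    if (flag.is_default) {
        flag.value = new_default;
    }
}

// Sketch of the policy described in the thread: avoid INC/DEC on the
// affected CPUs unless the user asked for them.
inline void apply_inc_dec_policy(const CpuInfo& cpu, BoolFlag& use_inc_dec) {
    if (cpu.is_knights_landing || cpu.is_silvermont) {
        set_default(use_inc_dec, false);
    }
}
```

With this shape, an explicit user setting such as `-XX:+UseIncDec` would still win on an affected CPU, because only the default is changed.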
On 6/16/17 2:03 PM, Kandu, Rahul wrote:
> Hi Vladimir,
>
> Thanks. I fixed the indents; there are no tabs in the code change now.
> Please find the updated webrev below.
>
> Openjdk bug location: https://bugs.openjdk.java.net/browse/JDK-8182138
>
> Webrev for the code change:
> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.01/
>
> regards,
>
> Rahul
>
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Thursday, June 15, 2017 2:10 PM
> To: Kandu, Rahul <rahul.kandu at intel.com>;
> hotspot-compiler-dev at openjdk.java.net
> Subject: Re: KNL specific fix: disable generating INC and DEC
> instructions on Xeon Phi and Silvermont CPUs
>
> Hi Rahul
>
> Please fix indents - don't use tabs.
>
> Vladimir
>
> On 6/15/17 1:14 PM, Kandu, Rahul wrote:
>
> Hi all,
>
> The following patch disables generation of the INC and DEC
> instructions on Xeon Phi and Silvermont (Atom-based) CPUs. We have
> found that INC and DEC can cause unexpected performance drops on
> certain processors that don't optimize partial flag writes. The
> patch disables both instructions because they are most commonly
> used for loop increment/decrement.
>
> The patch improves the SPECjvm2008 composite score on a Knights
> Landing CPU by 3.65%, as per the runs below on the latest OpenJDK
> source.
>
> Openjdk bug location: https://bugs.openjdk.java.net/browse/JDK-8182138
>
> Webrev for the code change:
> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.00/
>
> Scores:
>
>              |  6/10 jdk10 code (no change)             |  6/10 jdk10 code with this patch
> Benchmark    |      run1      run2      run3   geomean |      run1      run2      run3   geomean
> -------------+-----------------------------------------+----------------------------------------
> lu.small     |   1503.79   1500.62   1494.98  1499.792 |   1478.48   1493.78    1509.2  1493.767
> sor.small    |   2417.24    2372.1   2356.47  2381.798 |   2436.89   2434.46   2446.88  2439.404
> sparse.small |    606.35    635.19    595.44   612.099 |    681.96    728.02    671.41  693.3673
> fft.small    |   1463.55   1406.43   1173.63  1341.793 |   1220.14   1425.19   1190.06  1274.335
> monte_carlo  |    823.66    825.96    761.26  803.0575 |    939.53       923    934.76  932.4041
> sparse.large |    159.45    139.83    155.76  151.4352 |    100.66    150.22    179.79  139.5672
> fft.large    |    419.19    425.81     432.6  425.8315 |    433.11    424.72    429.07   428.953
> sor.large    |    416.31    262.98    271.31  309.6957 |     366.6    397.67    352.75  371.8725
> lu.large     |    116.46    127.51    129.33  124.3007 |     124.2    122.69     124.1  123.6614
> transform    |   1056.64    1066.6   1021.08  1047.923 |   1015.85   1056.42   1049.42  1040.412
> validation   |   1371.86   1898.49   1971.28  1725.131 |   2088.81   2178.14   2112.95  2126.301
> aes          |    276.67    255.84    299.78  276.8499 |     261.5    258.95    290.17  269.8444
> rsa          |   1041.29   1069.51   1069.26  1059.937 |   1091.45   1089.15   1095.52  1092.037
> signverify   |    2583.7   2592.98   2586.34   2587.67 |   2660.73   2664.17   2634.47   2653.09
> compress     |    817.65    817.44    816.55  817.2132 |    852.55    847.61    894.59  864.6626
> serial       |    608.48    586.62    615.37  603.3646 |    627.19    605.21    619.31  617.1695
> sunflow      |    371.28    373.03    373.04  372.4491 |    368.59    381.78    369.64   373.289
> mpegaudio    |    743.85    734.46    752.62  743.6064 |    775.45    773.35    776.98  775.2586
> derby        |    1929.9   1901.28   1922.56  1917.875 |   1927.97   1865.47   1919.17  1904.002
> Total        |    780.54    779.91    786.98  782.4702 |       801    812.98       819  810.9587
>
> 3.65% improvement
>
> regards,
>
> Rahul
>
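
The geomean columns and the headline figure in the quoted scores can be rechecked with a few lines of arithmetic. The sketch below assumes the geomean is the plain geometric mean of the three runs and that the improvement is the ratio of the two "Total" geomeans; the function names `geomean` and `percent_gain` are illustrative and not part of SPECjvm2008 or the patch.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean computed as exp(mean(log x)); inputs must be positive.
inline double geomean(const std::vector<double>& runs) {
    assert(!runs.empty());
    double log_sum = 0.0;
    for (double r : runs) {
        assert(r > 0.0);
        log_sum += std::log(r);
    }
    return std::exp(log_sum / static_cast<double>(runs.size()));
}

// Relative change of `patched` over `baseline`, in percent.
inline double percent_gain(double baseline, double patched) {
    return (patched / baseline - 1.0) * 100.0;
}
```

Calling `geomean({780.54, 779.91, 786.98})` and `geomean({801.0, 812.98, 819.0})` reproduces the two "Total" geomeans, and `percent_gain(782.4702, 810.9587)` works out to roughly 3.64%, consistent with the reported 3.65% up to rounding.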