KNL specific fix: disable generating INC and DEC instructions on Xeon Phi and Silvermont CPUs

Vladimir Kozlov vladimir.kozlov at oracle.com
Thu Jun 22 00:02:26 UTC 2017


Thank you, Rahul

Why do you need to split the code instead of just adding the new CPU model checks in the second place?

@@ -1179,7 +1179,9 @@
      if ((cpu_family() == 0x06) &&
          ((extended_cpu_model() == 0x36) || // Centerton
           (extended_cpu_model() == 0x37) || // Silvermont
-         (extended_cpu_model() == 0x4D))) {
+         (extended_cpu_model() == 0x4D) ||
+         (extended_cpu_model() == 0x57) ||   // Xeon Phi 3200/5200/7200
+         (extended_cpu_model() == 0x85))) {  // Future Xeon Phi
  #ifdef COMPILER2
        if (FLAG_IS_DEFAULT(OptoScheduling)) {
          OptoScheduling = true;
@@ -1190,6 +1192,9 @@
            UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
          }
        }
+      if (FLAG_IS_DEFAULT(UseIncDec)) {
+        FLAG_SET_DEFAULT(UseIncDec, false);
+      }
      }
      if(FLAG_IS_DEFAULT(AllocatePrefetchInstr) && supports_3dnow_prefetch()) {
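
A minimal standalone sketch of the single-site structure proposed in the diff above. It keeps one list of family 6 models and overrides each flag only when the user left it at its default. `BoolFlag`, `Flags`, `set_if_default`, `is_silvermont_or_xeon_phi` and `apply_cpu_defaults` are hypothetical names, and the initial flag values are assumptions. They only mimic the `FLAG_IS_DEFAULT`/`FLAG_SET_DEFAULT` semantics and are not the VM's actual code. The `#ifdef COMPILER2` guard is left out for brevity.

```cpp
#include <cassert>

// Hypothetical stand-in for a boolean -XX flag. is_default plays the role
// that FLAG_IS_DEFAULT queries in HotSpot: it is false once the user has set
// the flag explicitly on the command line.
struct BoolFlag {
  bool value;
  bool is_default;
};

// Hypothetical subset of the flags touched by the patch; the initial values
// are assumptions made for this sketch.
struct Flags {
  BoolFlag UseIncDec{true, true};
  BoolFlag OptoScheduling{false, true};
  BoolFlag UseUnalignedLoadStores{false, true};
};

// Override a flag only if the user left it alone (the FLAG_IS_DEFAULT +
// FLAG_SET_DEFAULT pattern). The new value still counts as a default.
inline void set_if_default(BoolFlag& flag, bool value) {
  if (flag.is_default) {
    flag.value = value;
  }
}

// All family 6 models from the review diff, checked in one place.
inline bool is_silvermont_or_xeon_phi(int family, int model) {
  if (family != 0x06) {
    return false;
  }
  switch (model) {
    case 0x36:  // Centerton
    case 0x37:  // Silvermont
    case 0x4D:
    case 0x57:  // Xeon Phi 3200/5200/7200
    case 0x85:  // Future Xeon Phi
      return true;
    default:
      return false;
  }
}

// Apply the per-CPU defaults from a single site instead of two.
inline void apply_cpu_defaults(int family, int model, bool supports_sse4_2,
                               Flags& flags) {
  if (!is_silvermont_or_xeon_phi(family, model)) {
    return;
  }
  set_if_default(flags.OptoScheduling, true);  // C2-only in HotSpot
  if (supports_sse4_2) {
    set_if_default(flags.UseUnalignedLoadStores, true);  // use movdqu
  }
  set_if_default(flags.UseIncDec, false);  // avoid INC/DEC
}
```

With a single site, a new model such as 0x85 is added to one `case` list, and an explicit `-XX:+UseIncDec` from the user is still honored because `set_if_default` leaves non-default flags untouched.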

Thanks,
Vladimir

On 6/21/17 3:39 PM, Kandu, Rahul wrote:
> Hi Vladimir,
> 
> Here is the webrev for the code change, after correcting the auto-indent settings as you specified:
> 
> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.02/
> Openjdk bug: https://bugs.openjdk.java.net/browse/JDK-8182138
> 
> 
> --- old/src/cpu/x86/vm/vm_version_x86.cpp	2017-06-21 14:57:28.002941500 -0700
> +++ new/src/cpu/x86/vm/vm_version_x86.cpp	2017-06-21 14:57:27.660400400 -0700
> @@ -654,6 +654,19 @@
>           ((extended_cpu_model() == 0x57) ||   // Xeon Phi 3200/5200/7200
>           (extended_cpu_model() == 0x85))) {  // Future Xeon Phi
>         _features &= ~CPU_VZEROUPPER;
> +      if (FLAG_IS_DEFAULT(UseIncDec)) {
> +        FLAG_SET_DEFAULT(UseIncDec, false);
> +      }
> +#ifdef COMPILER2
> +      if (FLAG_IS_DEFAULT(OptoScheduling)) {
> +        OptoScheduling = true;
> +      }
> +#endif
> +      if (supports_sse4_2()) { // Silvermont
> +        if (FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
> +          UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
> +        }
> +      }
>       }
>     }
>   
> @@ -1193,6 +1206,9 @@
>             UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
>           }
>         }
> +      if (FLAG_IS_DEFAULT(UseIncDec)) {
> +        FLAG_SET_DEFAULT(UseIncDec, false);
> +      }
>       }
> 
> regards,
> Rahul
> 
> 
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Friday, June 16, 2017 2:29 PM
> To: Kandu, Rahul <rahul.kandu at intel.com>; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: KNL specific fix: disable generating INC and DEC instructions on Xeon Phi and Silvermont CPUs
> 
> I don't see that it has been fixed:
> 
> +       FLAG_SET_DEFAULT(UseIncDec, false);
> +       }
> +#ifdef COMPILER2
> + if (FLAG_IS_DEFAULT(OptoScheduling)) {
> +  OptoScheduling = true;
> + }
> +#endif
> +      if (supports_sse4_2()) { // Silvermont
> 
> +       if (FLAG_IS_DEFAULT(UseIncDec)){
> +        FLAG_SET_DEFAULT(UseIncDec, false);
> +        }
> 
> Vladimir
> 
> On 6/16/17 2:03 PM, Kandu, Rahul wrote:
>> Hi Vladimir,
>>
>> Thanks. I fixed the indentation; there are no tabs in the code change.
>> Please find the updated webrev below.
>>
>> Openjdk bug location: https://bugs.openjdk.java.net/browse/JDK-8182138
>>
>> Webrev for the code change:
>> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.01/
>>
>> regards,
>>
>> Rahul
>>
>> *From:*Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> *Sent:* Thursday, June 15, 2017 2:10 PM
>> *To:* Kandu, Rahul <rahul.kandu at intel.com>;
>> hotspot-compiler-dev at openjdk.java.net
>> *Subject:* Re: KNL specific fix: disable generating INC and DEC
>> instructions on Xeon Phi and Silvermont CPUs
>>
>> Hi Rahul
>>
>> Please fix the indentation; don't use tabs.
>>
>> Vladimir
>>
>> On 6/15/17 1:14 PM, Kandu, Rahul wrote:
>>
>>      Hi all,
>>
>>      The following patch disables generation of INC and DEC instructions
>>      on Xeon Phi and Silvermont (Atom-based) CPUs. We have identified that
>>      INC and DEC can cause unexpected performance drops on processors that
>>      do not optimize partial flag writes: INC and DEC update every
>>      arithmetic flag except CF, so a later instruction that reads the
>>      flags may have to merge them with the result of an earlier
>>      flag-setting instruction. The patch targets these two instructions
>>      because they are most commonly used for loop increments and
>>      decrements.
>>
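To make the trade-off concrete, here is a hedged sketch of the choice a code generator faces when a flag like `UseIncDec` is off: a step of +1 or -1 is emitted as ADD instead of INC or DEC. `emit_increment` is a hypothetical helper that returns Intel-syntax text; it is not HotSpot's macro assembler API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not HotSpot's MacroAssembler API) that returns the
// Intel-syntax text for "reg += value". INC and DEC update OF, SF, ZF, AF
// and PF but leave CF unchanged; ADD writes all of them, so it carries no
// dependency on the instruction that last set CF.
inline std::string emit_increment(const std::string& reg, int value,
                                  bool use_inc_dec) {
  if (use_inc_dec && value == 1) {
    return "inc " + reg;
  }
  if (use_inc_dec && value == -1) {
    return "dec " + reg;
  }
  return "add " + reg + ", " + std::to_string(value);
}
```

The ADD form is one byte longer in its shortest encoding, which is why INC/DEC remain attractive on cores that handle partial flag writes well.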
>>      With the patch, the SPECjvm2008 composite score on a Knights Landing
>>      CPU improves by 3.65%, based on the runs below on the latest OpenJDK
>>      source.
>>
>>      Openjdk bug location:
>> https://bugs.openjdk.java.net/browse/JDK-8182138
>>
>>      Webrev for the code change:
>>      http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.00/
>>
>>      Scores:
>>
>>      Benchmark    | 6/10 jdk10 code (no change)          | 6/10 jdk10 code with this patch
>>                   | run1     run2     run3     geomean   | run1     run2     run3     geomean
>>      lu.small     | 1503.79  1500.62  1494.98  1499.792  | 1478.48  1493.78  1509.2   1493.767
>>      sor.small    | 2417.24  2372.1   2356.47  2381.798  | 2436.89  2434.46  2446.88  2439.404
>>      sparse.small | 606.35   635.19   595.44   612.099   | 681.96   728.02   671.41   693.3673
>>      fft.small    | 1463.55  1406.43  1173.63  1341.793  | 1220.14  1425.19  1190.06  1274.335
>>      monte_carlo  | 823.66   825.96   761.26   803.0575  | 939.53   923      934.76   932.4041
>>      sparse.large | 159.45   139.83   155.76   151.4352  | 100.66   150.22   179.79   139.5672
>>      fft.large    | 419.19   425.81   432.6    425.8315  | 433.11   424.72   429.07   428.953
>>      sor.large    | 416.31   262.98   271.31   309.6957  | 366.6    397.67   352.75   371.8725
>>      lu.large     | 116.46   127.51   129.33   124.3007  | 124.2    122.69   124.1    123.6614
>>      transform    | 1056.64  1066.6   1021.08  1047.923  | 1015.85  1056.42  1049.42  1040.412
>>      validation   | 1371.86  1898.49  1971.28  1725.131  | 2088.81  2178.14  2112.95  2126.301
>>      aes          | 276.67   255.84   299.78   276.8499  | 261.5    258.95   290.17   269.8444
>>      rsa          | 1041.29  1069.51  1069.26  1059.937  | 1091.45  1089.15  1095.52  1092.037
>>      signverify   | 2583.7   2592.98  2586.34  2587.67   | 2660.73  2664.17  2634.47  2653.09
>>      compress     | 817.65   817.44   816.55   817.2132  | 852.55   847.61   894.59   864.6626
>>      serial       | 608.48   586.62   615.37   603.3646  | 627.19   605.21   619.31   617.1695
>>      sunflow      | 371.28   373.03   373.04   372.4491  | 368.59   381.78   369.64   373.289
>>      mpegaudio    | 743.85   734.46   752.62   743.6064  | 775.45   773.35   776.98   775.2586
>>      derby        | 1929.9   1901.28  1922.56  1917.875  | 1927.97  1865.47  1919.17  1904.002
>>      Total        | 780.54   779.91   786.98   782.4702  | 801      812.98   819      810.9587
>>
>>      3.65% improvement
>>
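The geomean columns and the headline figure can be re-derived from the raw runs. The following is a minimal sketch with hypothetical helper names (`geomean`, `relative_improvement`). Applied to the Total rows, it gives about 782.47 and 810.96, a relative gain of roughly 3.6%, consistent with the reported improvement.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of benchmark runs, computed through logarithms so the
// product of many large scores cannot overflow.
inline double geomean(const std::vector<double>& runs) {
  assert(!runs.empty());
  double log_sum = 0.0;
  for (double r : runs) {
    assert(r > 0.0);  // the geometric mean is only defined for positive scores
    log_sum += std::log(r);
  }
  return std::exp(log_sum / static_cast<double>(runs.size()));
}

// Relative change of the patched score over the baseline (0.05 means +5%).
inline double relative_improvement(double baseline, double patched) {
  return patched / baseline - 1.0;
}
```

For example, `relative_improvement(geomean({780.54, 779.91, 786.98}), geomean({801.0, 812.98, 819.0}))` reproduces the comparison of the two Total geomeans.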
>>      regards,
>>
>>      Rahul
>>


More information about the hotspot-compiler-dev mailing list