KNL specific fix: disable generating INC and DEC instructions on Xeon Phi and Silvermont CPUs

Kandu, Rahul rahul.kandu at intel.com
Thu Jun 22 18:34:54 UTC 2017


Hi Vladimir, 

The option below is valid only for CPU models 0x57 and 0x85, because Xeon Phi supports AVX. Silvermont CPUs (0x36, 0x37, etc.) support only up to SSE4.x, not AVX:
_features &= ~CPU_VZEROUPPER;

It may be better to keep the CPU model checks for the Knights family (Xeon Phi and its successors) separate from those for the earlier Silvermont (Atom) family, because their instruction sets differ in several ways.
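
A minimal C++ sketch of the separation suggested above, assuming only the family/model numbers listed in this thread. The names IntelLowPowerFamily, classify, Tuning and tuning_for are hypothetical and are not HotSpot code; the sketch only shows how one model lookup could feed per-family settings, with both families dropping INC/DEC and only the AVX-capable Knights parts clearing the VZEROUPPER feature bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical grouping of the CPU models discussed in this thread.
enum class IntelLowPowerFamily { None, Silvermont, Knights };

// Classify by CPUID family and extended model (family 0x06 only).
IntelLowPowerFamily classify(uint32_t family, uint32_t extended_model) {
  if (family != 0x06) return IntelLowPowerFamily::None;
  switch (extended_model) {
    case 0x36:  // Centerton
    case 0x37:  // Silvermont
    case 0x4D:  // listed with the Silvermont models in the patch
      return IntelLowPowerFamily::Silvermont;
    case 0x57:  // Xeon Phi 3200/5200/7200
    case 0x85:  // Future Xeon Phi
      return IntelLowPowerFamily::Knights;
    default:
      return IntelLowPowerFamily::None;
  }
}

// Hypothetical per-family decisions corresponding to the patch.
struct Tuning {
  bool use_inc_dec = true;        // corresponds to UseIncDec
  bool clear_vzeroupper = false;  // corresponds to _features &= ~CPU_VZEROUPPER
};

Tuning tuning_for(IntelLowPowerFamily f) {
  Tuning t;
  if (f == IntelLowPowerFamily::Silvermont || f == IntelLowPowerFamily::Knights) {
    t.use_inc_dec = false;  // both families avoid INC/DEC
  }
  if (f == IntelLowPowerFamily::Knights) {
    t.clear_vzeroupper = true;  // only AVX-capable Xeon Phi parts
  }
  return t;
}
```

Keeping the classification in one place means a future Knights or Atom model only needs a new case label, while the per-family policy stays in a single function.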

Rahul

-----Original Message-----
From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] 
Sent: Wednesday, June 21, 2017 5:02 PM
To: Kandu, Rahul <rahul.kandu at intel.com>; hotspot-compiler-dev at openjdk.java.net
Subject: Re: KNL specific fix: disable generating INC and DEC instructions on Xeon Phi and Silvermont CPUs

Thank you, Rahul

Why do you need to split the code instead of just adding the new CPU model checks in the second place, as below?

@@ -1179,7 +1179,9 @@
      if ((cpu_family() == 0x06) &&
          ((extended_cpu_model() == 0x36) || // Centerton
           (extended_cpu_model() == 0x37) || // Silvermont
-         (extended_cpu_model() == 0x4D))) {
+         (extended_cpu_model() == 0x4D) ||
+         (extended_cpu_model() == 0x57) ||   // Xeon Phi 3200/5200/7200
+         (extended_cpu_model() == 0x85))) {  // Future Xeon Phi
  #ifdef COMPILER2
        if (FLAG_IS_DEFAULT(OptoScheduling)) {
          OptoScheduling = true;
@@ -1190,6 +1192,9 @@
            UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
          }
        }
+      if (FLAG_IS_DEFAULT(UseIncDec)) {
+        FLAG_SET_DEFAULT(UseIncDec, false);
+      }
      }
      if(FLAG_IS_DEFAULT(AllocatePrefetchInstr) && supports_3dnow_prefetch()) {
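
The FLAG_IS_DEFAULT guard in the diff changes UseIncDec only when the user has not set it. Below is a toy C++ model of that idea; BoolFlag and apply_cpu_tuning are hypothetical stand-ins, not HotSpot's flag machinery, which records each flag's origin rather than a single boolean.

```cpp
#include <cassert>

// Hypothetical stand-in for a VM boolean flag such as UseIncDec.
struct BoolFlag {
  bool value;
  bool set_by_user = false;

  // Analogue of FLAG_IS_DEFAULT: true unless the command line set it.
  bool is_default() const { return !set_by_user; }

  // Analogue of an explicit -XX:+Flag / -XX:-Flag on the command line.
  void set_from_command_line(bool v) { value = v; set_by_user = true; }

  // Analogue of FLAG_SET_DEFAULT: the new value still counts as a default.
  void set_default(bool v) { value = v; set_by_user = false; }
};

// Mirrors the guarded update in the diff: on an affected CPU, flip the
// default, but never override a value the user chose explicitly.
void apply_cpu_tuning(BoolFlag& use_inc_dec, bool is_affected_cpu) {
  if (is_affected_cpu && use_inc_dec.is_default()) {
    use_inc_dec.set_default(false);
  }
}
```

With this shape an explicit command-line setting always wins, and the CPU-specific code only moves the default.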

Thanks,
Vladimir

On 6/21/17 3:39 PM, Kandu, Rahul wrote:
> Hi Vladimir,
> 
> Webrev for the code change, after correcting the auto-indent settings as you requested:
> 
> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.02/
> OpenJDK bug: https://bugs.openjdk.java.net/browse/JDK-8182138
> 
> 
> --- old/src/cpu/x86/vm/vm_version_x86.cpp	2017-06-21 14:57:28.002941500 -0700
> +++ new/src/cpu/x86/vm/vm_version_x86.cpp	2017-06-21 14:57:27.660400400 -0700
> @@ -654,6 +654,19 @@
>           ((extended_cpu_model() == 0x57) ||   // Xeon Phi 3200/5200/7200
>           (extended_cpu_model() == 0x85))) {  // Future Xeon Phi
>         _features &= ~CPU_VZEROUPPER;
> +      if (FLAG_IS_DEFAULT(UseIncDec)) {
> +        FLAG_SET_DEFAULT(UseIncDec, false);
> +      }
> +#ifdef COMPILER2
> +      if (FLAG_IS_DEFAULT(OptoScheduling)) {
> +        OptoScheduling = true;
> +      }
> +#endif
> +      if (supports_sse4_2()) { // Silvermont
> +        if (FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
> +          UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
> +        }
> +      }
>       }
>     }
>   
> @@ -1193,6 +1206,9 @@
>             UseUnalignedLoadStores = true; // use movdqu on newest Intel cpus
>           }
>         }
> +      if (FLAG_IS_DEFAULT(UseIncDec)) {
> +        FLAG_SET_DEFAULT(UseIncDec, false);
> +      }
>       }
> 
> regards,
> Rahul
> 
> 
> 
> 
> -----Original Message-----
> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
> Sent: Friday, June 16, 2017 2:29 PM
> To: Kandu, Rahul <rahul.kandu at intel.com>; 
> hotspot-compiler-dev at openjdk.java.net
> Subject: Re: KNL specific fix: disable generating INC and DEC 
> instructions on Xeon Phi and Silvermont CPUs
> 
> I don't see that it is fixed:
> 
> +       FLAG_SET_DEFAULT(UseIncDec, false);
> +       }
> +#ifdef COMPILER2
> + if (FLAG_IS_DEFAULT(OptoScheduling)) {
> +  OptoScheduling = true;
> + }
> +#endif
> +      if (supports_sse4_2()) { // Silvermont
> 
> +       if (FLAG_IS_DEFAULT(UseIncDec)){
> +        FLAG_SET_DEFAULT(UseIncDec, false);
> +        }
> 
> Vladimir
> 
> On 6/16/17 2:03 PM, Kandu, Rahul wrote:
>> Hi Vladimir,
>>
>> Thanks. I fixed the indentation; there are no tabs in the code change.
>> Please find the updated webrev below.
>>
>> OpenJDK bug location: 
>> https://bugs.openjdk.java.net/browse/JDK-8182138
>>
>> Webrev for the code change:
>> http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.01/
>>
>> regards,
>>
>> Rahul
>>
>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com]
>> Sent: Thursday, June 15, 2017 2:10 PM
>> To: Kandu, Rahul <rahul.kandu at intel.com>; 
>> hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: KNL specific fix: disable generating INC and DEC 
>> instructions on Xeon Phi and Silvermont CPUs
>>
>> Hi Rahul
>>
>> Please fix the indentation; don't use tabs.
>>
>> Vladimir
>>
>> On 6/15/17 1:14 PM, Kandu, Rahul wrote:
>>
>>      Hi all,
>>
>>      The following patch disables generation of INC and DEC
>>      instructions on Xeon Phi and Silvermont (Atom) based CPUs. We have
>>      identified that INC and DEC can cause unexpected performance drops
>>      on certain processors that do not optimize partial flag-register
>>      writes: INC and DEC update ZF, SF, OF, AF and PF but leave CF
>>      unchanged. The patch targets these two instructions because they
>>      are commonly used for loop increments and decrements.
>>
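
To make the partial-flag point concrete, here is a toy C++ model of three EFLAGS bits (an illustration only, not an emulator and not part of the patch). It shows that INC leaves the carry flag from an earlier instruction in place, while ADD with an immediate of 1 overwrites it; on cores that handle such partial flag writes poorly, a later instruction that reads the flags may have to wait for, or merge with, the older CF value. With UseIncDec off, HotSpot's assembler is expected to emit ADD/SUB forms for these increments instead.

```cpp
#include <cassert>
#include <cstdint>

// Toy subset of x86 EFLAGS; OF, AF and PF are omitted for brevity.
struct Flags {
  bool cf = false;  // carry
  bool zf = false;  // zero
  bool sf = false;  // sign
};

// add r32, imm: writes every modelled flag, including CF.
uint32_t add32(uint32_t a, uint32_t b, Flags& f) {
  uint64_t wide = uint64_t{a} + b;  // keep the carry-out in bit 32
  uint32_t r = static_cast<uint32_t>(wide);
  f.cf = (wide >> 32) != 0;
  f.zf = r == 0;
  f.sf = (r >> 31) != 0;
  return r;
}

// inc r32: writes ZF and SF but leaves CF untouched, so the resulting
// flags depend on both this result and whatever last wrote CF.
uint32_t inc32(uint32_t a, Flags& f) {
  uint32_t r = a + 1;  // unsigned wraparound is well-defined
  f.zf = r == 0;
  f.sf = (r >> 31) != 0;
  return r;
}
```

Even when INC wraps to zero, CF keeps its previous value, which is exactly the partial update that ADD avoids.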
>>      The patch improves the SPECjvm2008 composite score by 3.65% on a
>>      Knights Landing CPU, according to the runs below on the latest
>>      OpenJDK source.
>>
>>      OpenJDK bug location:
>> https://bugs.openjdk.java.net/browse/JDK-8182138
>>
>>      Webrev for the code change:
>>      http://cr.openjdk.java.net/~vdeshpande/8182138/webrev.00/
>>
>>      Scores:
>>
>>      Benchmark     |  6/10 jdk10 code (no change)          |  6/10 jdk10 code with this patch
>>                    |  run1     run2     run3     geomean   |  run1     run2     run3     geomean
>>      --------------+---------------------------------------+-------------------------------------
>>      lu.small      |  1503.79  1500.62  1494.98  1499.792  |  1478.48  1493.78  1509.2   1493.767
>>      sor.small     |  2417.24  2372.1   2356.47  2381.798  |  2436.89  2434.46  2446.88  2439.404
>>      sparse.small  |  606.35   635.19   595.44   612.099   |  681.96   728.02   671.41   693.3673
>>      fft.small     |  1463.55  1406.43  1173.63  1341.793  |  1220.14  1425.19  1190.06  1274.335
>>      monte_carlo   |  823.66   825.96   761.26   803.0575  |  939.53   923      934.76   932.4041
>>      sparse.large  |  159.45   139.83   155.76   151.4352  |  100.66   150.22   179.79   139.5672
>>      fft.large     |  419.19   425.81   432.6    425.8315  |  433.11   424.72   429.07   428.953
>>      sor.large     |  416.31   262.98   271.31   309.6957  |  366.6    397.67   352.75   371.8725
>>      lu.large      |  116.46   127.51   129.33   124.3007  |  124.2    122.69   124.1    123.6614
>>      transform     |  1056.64  1066.6   1021.08  1047.923  |  1015.85  1056.42  1049.42  1040.412
>>      validation    |  1371.86  1898.49  1971.28  1725.131  |  2088.81  2178.14  2112.95  2126.301
>>      aes           |  276.67   255.84   299.78   276.8499  |  261.5    258.95   290.17   269.8444
>>      rsa           |  1041.29  1069.51  1069.26  1059.937  |  1091.45  1089.15  1095.52  1092.037
>>      signverify    |  2583.7   2592.98  2586.34  2587.67   |  2660.73  2664.17  2634.47  2653.09
>>      compress      |  817.65   817.44   816.55   817.2132  |  852.55   847.61   894.59   864.6626
>>      serial        |  608.48   586.62   615.37   603.3646  |  627.19   605.21   619.31   617.1695
>>      sunflow       |  371.28   373.03   373.04   372.4491  |  368.59   381.78   369.64   373.289
>>      mpegaudio     |  743.85   734.46   752.62   743.6064  |  775.45   773.35   776.98   775.2586
>>      derby         |  1929.9   1901.28  1922.56  1917.875  |  1927.97  1865.47  1919.17  1904.002
>>      --------------+---------------------------------------+-------------------------------------
>>      Total         |  780.54   779.91   786.98   782.4702  |  801      812.98   819      810.9587
>>
>>      3.65% improvement
>>
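
The geomean columns summarize the three runs of each row. A small C++ sketch that recomputes the Total geomeans from the listed runs and the relative change between them; the helper names geomean and improvement_percent are illustrative. With these inputs the ratio works out to roughly 3.6%, in line with the improvement reported above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of positive scores, computed through logarithms so that
// multiplying many large scores cannot overflow.
double geomean(const std::vector<double>& xs) {
  assert(!xs.empty());
  double log_sum = 0.0;
  for (double x : xs) {
    assert(x > 0.0);  // the geometric mean is only defined for positive values
    log_sum += std::log(x);
  }
  return std::exp(log_sum / static_cast<double>(xs.size()));
}

// Relative change of `patched` over `baseline`, in percent.
double improvement_percent(double baseline, double patched) {
  return (patched / baseline - 1.0) * 100.0;
}
```

For example, geomean({780.54, 779.91, 786.98}) reproduces the baseline Total of about 782.47.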
>>      regards,
>>
>>      Rahul
>>


More information about the hotspot-compiler-dev mailing list