RFR(S): 8209544: AES encrypt performance regression in jdk11b11

Roland Westrelin rwestrel at redhat.com
Tue Aug 28 12:56:22 UTC 2018


http://cr.openjdk.java.net/~roland/8209544/webrev.00/

The performance regression is caused by 8200303 (C2 should leverage
profiling for lookupswitch/tableswitch). Running with
-XX:-UseSwitchProfiling makes the regression go away.

-prof perfasm reports that 70% of the time is spent in stubs
(StubRoutines::aescrypt_encryptBlock), where UseSwitchProfiling makes no
difference. More than 20% is spent in
com.sun.crypto.provider.CipherCore::doFinal, which has a single
lookupswitch in the hot code path. The code sequence generated for that
switch differs:

With -XX:-UseSwitchProfiling: 

0x00007f9c3056ec50: cmp $0x7,%ecx 
0x00007f9c3056ec53: je 0x00007f9c3056ee42 ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0} 

With -XX:+UseSwitchProfiling: 

0x00007f59a456ec52: add $0xfffffff9,%eax 
0x00007f59a456ec55: cmp $0x1,%eax 
0x00007f59a456ec58: jb 0x00007f59a456eeba ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0} 

The switch is:

switch (cipherMode) {
case GCM_MODE:
  // some code
  break;
default:
  // some code
  break;
}

Only the default case is ever taken. That translates into 3 ranges:

 {..6}=>91 (cnt=4790.000000)
 {7}=>2147483647 (cnt=0.000000)
 {8..}=>91 (cnt=4790.000000)
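As an illustration of how one case label plus a default ends up as three
ranges, here is a minimal Java sketch. It is not the C2 code: the class
SwitchRangesSketch, the Range type and the toRanges helper are hypothetical
names, the destinations are plain strings, and the case key 7 is assumed
from the dump above.

```java
import java.util.ArrayList;
import java.util.List;

public class SwitchRangesSketch {
    // Hypothetical range type: all values in [lo, hi] go to the same destination.
    public static final class Range {
        public final int lo;
        public final int hi;
        public final String dest;

        Range(int lo, int hi, String dest) {
            this.lo = lo;
            this.hi = hi;
            this.dest = dest;
        }

        @Override
        public String toString() {
            return "{" + lo + ".." + hi + "}=>" + dest;
        }
    }

    // Cover the whole int domain: every gap between sorted case keys
    // becomes a range that goes to the default destination.
    public static List<Range> toRanges(int[] sortedKeys, String[] dests, String defaultDest) {
        List<Range> ranges = new ArrayList<>();
        long next = Integer.MIN_VALUE; // first value not covered yet (long avoids overflow)
        for (int i = 0; i < sortedKeys.length; i++) {
            int key = sortedKeys[i];
            if (key > next) {
                ranges.add(new Range((int) next, key - 1, defaultDest));
            }
            ranges.add(new Range(key, key, dests[i]));
            next = (long) key + 1;
        }
        if (next <= Integer.MAX_VALUE) {
            ranges.add(new Range((int) next, Integer.MAX_VALUE, defaultDest));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // One case (assumed key 7) plus default gives three ranges.
        for (Range r : toRanges(new int[] {7}, new String[] {"case"}, "default")) {
            System.out.println(r);
        }
    }
}
```

The first and last ranges share the default destination, which is the shape
the special case for three ranges relies on.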

With -XX:-UseSwitchProfiling, the compiled code performs a binary search
among ranges. It picks range {7} as a mid point and has special logic to
emit a single comparison:

      // Special Case:  If there are exactly three ranges, and the high
      // and low range each go to the same place, omit the "gt" test,
      // since it will not discriminate anything.
      bool eq_test_only = (hi == lo+2 && hi->dest() == lo->dest() && mid == hi-1) || mid == lo;

With -XX:+UseSwitchProfiling, the compiled code also performs a binary
search, but it picks mid points based on profiling, so the binary search
tree is not balanced.

It picks {..6} as first mid point and then {8..} so:

if (v > 6 && v < 8) {
  uncommon_trap();
} else {
   // do stuff
}

which is optimized as:

if (v - 7 <u 1) {
   uncommon_trap();
} else {
   // do stuff
}

And that results in the generated code above. It's actually possible to
transform this to:

if (v - 7 == 0) {

and then to:

if (v == 7) {

and then end up with the same code as with
-XX:-UseSwitchProfiling. That's what I propose as a fix.
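The chain of rewrites above can be checked in plain Java. This is only a
sketch of the arithmetic, not the C2 transformation itself: the class and
method names are made up for illustration, and Integer.compareUnsigned
stands in for the unsigned comparison written as <u.

```java
public class UnsignedRangeCheck {
    // Shape produced by the unbalanced search: two signed comparisons
    // that together select the single value 7.
    public static boolean twoSidedTest(int v) {
        return v > 6 && v < 8;
    }

    // Shape after folding into one unsigned range check: (v - 7) <u 1.
    // The subtraction may wrap around; the unsigned compare handles that.
    public static boolean unsignedRangeTest(int v) {
        return Integer.compareUnsigned(v - 7, 1) < 0;
    }

    // Only 0 is unsigned-less-than 1, so the check is (v - 7) == 0.
    public static boolean zeroTest(int v) {
        return v - 7 == 0;
    }

    // Final shape: a plain equality test against the case key.
    public static boolean equalityTest(int v) {
        return v == 7;
    }

    public static boolean allAgree(int v) {
        boolean expected = equalityTest(v);
        return twoSidedTest(v) == expected
            && unsignedRangeTest(v) == expected
            && zeroTest(v) == expected;
    }

    public static void main(String[] args) {
        // 0xfffffff9 is -7 as a 32-bit int, so "add $0xfffffff9" computes v - 7.
        System.out.println("0xfffffff9 == -7: " + (0xfffffff9 == -7));
        int[] samples = {Integer.MIN_VALUE, -1, 0, 6, 7, 8, Integer.MAX_VALUE};
        for (int v : samples) {
            System.out.println(v + " -> all forms agree: " + allAgree(v));
        }
    }
}
```

The boundary values matter here: for v below 7 the subtraction yields a
negative int, which is a very large value when compared as unsigned, so the
range check still fails as expected.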

Roland.

