RFR(S): 8209544: AES encrypt performance regression in jdk11b11
Roland Westrelin
rwestrel at redhat.com
Tue Aug 28 12:56:22 UTC 2018
http://cr.openjdk.java.net/~roland/8209544/webrev.00/
The performance regression is caused by 8200303 (C2 should leverage
profiling for lookupswitch/tableswitch). Running with
-XX:-UseSwitchProfiling makes the regression go away.
-prof perfasm reports that 70% of the time is spent in stubs
(StubRoutines::aescrypt_encryptBlock) where UseSwitchProfiling makes no
difference. 20+% is spent in com.sun.crypto.provider.CipherCore::doFinal
which has a single lookupswitch in the hot code path. There's a
difference in the code sequence generated:
With -XX:-UseSwitchProfiling:
0x00007f9c3056ec50: cmp $0x7,%ecx
0x00007f9c3056ec53: je 0x00007f9c3056ee42 ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0}
With -XX:+UseSwitchProfiling:
0x00007f59a456ec52: add $0xfffffff9,%eax
0x00007f59a456ec55: cmp $0x1,%eax
0x00007f59a456ec58: jb 0x00007f59a456eeba ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0}
The switch is:
    switch (cipherMode) {
    case GCM_MODE:
        // some code
        break;
    default:
        // some code
        break;
    }
Only the default case is ever taken. That translates into 3 ranges:
{..6}=>91 (cnt=4790.000000)
{7}=>2147483647 (cnt=0.000000)
{8..}=>91 (cnt=4790.000000)
With -XX:-UseSwitchProfiling, the compiled code performs a binary search
among ranges. It picks range {7} as a mid point and has special logic to
emit a single comparison:
// Special Case: If there are exactly three ranges, and the high
// and low range each go to the same place, omit the "gt" test,
// since it will not discriminate anything.
bool eq_test_only = (hi == lo+2 && hi->dest() == lo->dest() && mid == hi-1) || mid == lo;
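To make the special case concrete, here is a minimal Java sketch of that condition, not the actual C2 code. It assumes the ranges are modelled as an array of destinations indexed by position, so lo, hi and mid are array indices rather than pointers; the class and method names are hypothetical.

```java
public class EqTestOnlySketch {
    // Hypothetical model: dest[i] is the destination of the i-th range.
    // Mirrors the C++ expression above, with indices in place of pointers.
    static boolean eqTestOnly(int[] dest, int lo, int hi, int mid) {
        return (hi == lo + 2 && dest[hi] == dest[lo] && mid == hi - 1) || mid == lo;
    }

    public static void main(String[] args) {
        // The three ranges from this report: {..6}=>91, {7}=>never taken, {8..}=>91.
        int[] dest = {91, Integer.MAX_VALUE, 91};
        // Middle range chosen as mid point: one equality test is enough.
        System.out.println(eqTestOnly(dest, 0, 2, 1));
        // Outer ranges go to different places: the "gt" test is still needed.
        int[] other = {91, Integer.MAX_VALUE, 92};
        System.out.println(eqTestOnly(other, 0, 2, 1));
    }
}
```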
With -XX:+UseSwitchProfiling, the compiled code also performs a binary
search, but it picks mid points based on profiling, so the binary search
tree is not balanced.
It picks {..6} as first mid point and then {8..} so:
    if (v > 6 && v < 8) {
        uncommon_trap();
    } else {
        // do stuff
    }
which is optimized as:
    if (v - 7 <u 1) {
        uncommon_trap();
    } else {
        // do stuff
    }
And that results in the generated code above. It's actually possible to
transform this to:
if (v - 7 == 0) {
and then to:
if (v == 7) {
which gives back the same code as with -XX:-UseSwitchProfiling. That's
what I propose as a fix.
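As a sanity check on that chain of transformations, the following Java sketch compares the three forms of the test on a few values, including the wrap-around cases. It is only an illustration of the arithmetic, not the C2 change itself; the class and method names are hypothetical, and Integer.compareUnsigned stands in for the <u comparison.

```java
public class RangeCheckEquivalence {
    // Two-sided test produced by the two profile-driven mid points: 6 < v < 8.
    static boolean twoSided(int v) {
        return v > 6 && v < 8;
    }

    // After subtracting the lower bound and comparing unsigned: (v - 7) <u 1.
    static boolean unsignedForm(int v) {
        return Integer.compareUnsigned(v - 7, 1) < 0;
    }

    // Final form: a plain equality test.
    static boolean equalityForm(int v) {
        return v == 7;
    }

    // True when all three forms give the same answer for v.
    static boolean allAgree(int v) {
        boolean expected = twoSided(v);
        return expected == unsignedForm(v) && expected == equalityForm(v);
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 6, 7, 8, Integer.MAX_VALUE};
        for (int v : samples) {
            System.out.println(v + " -> agree=" + allAgree(v) + ", taken=" + equalityForm(v));
        }
    }
}
```

The unsigned form is true only when v - 7 is exactly 0, since 0 is the only unsigned value below 1, which is why it can be reduced to an equality test.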
Roland.