RFR(S): 8209544: AES encrypt performance regression in jdk11b11
Vladimir Kozlov
vladimir.kozlov at oracle.com
Tue Aug 28 22:20:41 UTC 2018
Hi Roland,
cmp1->Opcode() is a virtual call, and its result is already cached in a local variable. Use that variable everywhere in this method:
int cmp1_op = cmp1->Opcode();
Move the 'cmp2_type == TypeInt::ZERO' check before the is_counted_loop_cmp(cmp) call, which is more expensive.
is_Con() is also true for TOP. Use a different check.
Thanks,
Vladimir
On 8/28/18 5:56 AM, Roland Westrelin wrote:
>
> http://cr.openjdk.java.net/~roland/8209544/webrev.00/
>
> The performance regression is caused by 8200303 (C2 should leverage
> profiling for lookupswitch/tableswitch). Running with
> -XX:-UseSwitchProfiling makes the regression go away.
>
> -prof perfasm reports that 70% of the time is spent in stubs
> (StubRoutines::aescrypt_encryptBlock) where UseSwitchProfiling makes no
> difference. 20+% is spent in com.sun.crypto.provider.CipherCore::doFinal
> which has a single lookupswitch in the hot code path. There's a
> difference in the code sequence generated:
>
> With -XX:-UseSwitchProfiling:
>
> 0x00007f9c3056ec50: cmp $0x7,%ecx
> 0x00007f9c3056ec53: je 0x00007f9c3056ee42 ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0}
>
> With -XX:+UseSwitchProfiling:
>
> 0x00007f59a456ec52: add $0xfffffff9,%eax
> 0x00007f59a456ec55: cmp $0x1,%eax
> 0x00007f59a456ec58: jb 0x00007f59a456eeba ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0}
>
> The switch is:
>
> switch (cipherMode) {
> case GCM_MODE:
> // some code
> break;
> default:
> // some code
> break;
> }
>
> Only the default case is ever taken. That translates into 3 ranges:
>
> {..6}=>91 (cnt=4790.000000)
> {7}=>2147483647 (cnt=0.000000)
> {8..}=>91 (cnt=4790.000000)
>
> With -XX:-UseSwitchProfiling, the compiled code performs a binary search
> among ranges. It picks range {7} as a mid point and has special logic to
> emit a single comparison:
>
> // Special Case: If there are exactly three ranges, and the high
> // and low range each go to the same place, omit the "gt" test,
> // since it will not discriminate anything.
> bool eq_test_only = (hi == lo+2 && hi->dest() == lo->dest() && mid == hi-1) || mid == lo;
>
> With -XX:+UseSwitchProfiling, the compiled code also performs a binary
> search, but it picks mid points based on profiling, so the binary search
> tree is not balanced.
>
> It picks {..6} as first mid point and then {8..} so:
>
> if (v > 6 && v < 8) {
> uncommon_trap();
> } else {
> // do stuff
> }
>
> which is optimized as:
>
> if (v - 7 <u 1) {
> uncommon_trap();
> } else {
> // do stuff
> }
>
> And that results in the generated code above. It's actually possible to
> transform this to:
>
> if (v - 7 == 0) {
>
> and then to:
>
> if (v == 7) {
>
> and then fall back to the same code as with
> -XX:-UseSwitchProfiling. That's what I propose as a fix.
>
> Roland.
>
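For readers following the chain of rewrites in the quoted message, here is a minimal Java sketch of why the three forms are interchangeable: the two-sided range test, the single unsigned comparison (v - 7 <u 1), and the plain equality test all accept exactly v == 7. The class and method names are hypothetical and exist only for illustration; this is not C2 code, just a model of the arithmetic using Integer.compareUnsigned.

```java
public class UnsignedRangeCheckDemo {
    // Two-sided range test from the profile-driven binary search:
    // true only when v falls in the single-element range {7}.
    static boolean rangeTest(int v) {
        return v > 6 && v < 8;
    }

    // The same test folded into one unsigned comparison, (v - 7) <u 1.
    // Subtracting 7 wraps every value other than 7 to a non-zero bit
    // pattern, and zero is the only unsigned value below 1.
    static boolean unsignedTest(int v) {
        return Integer.compareUnsigned(v - 7, 1) < 0;
    }

    // The form proposed as the fix: a plain equality test.
    static boolean equalityTest(int v) {
        return v == 7;
    }

    // Returns true if all three forms agree on every value in the array.
    static boolean allAgree(int[] values) {
        for (int v : values) {
            boolean expected = equalityTest(v);
            if (rangeTest(v) != expected || unsignedTest(v) != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Boundary values, including ones where v - 7 overflows.
        int[] samples = {Integer.MIN_VALUE, Integer.MIN_VALUE + 6, -1, 0,
                         6, 7, 8, Integer.MAX_VALUE};
        System.out.println("all forms agree: " + allAgree(samples));
    }
}
```

The subtraction may overflow for values near Integer.MIN_VALUE, but wrap-around arithmetic still maps only v == 7 to zero, so the unsigned comparison stays exact.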