RFR(S): 8209544: AES encrypt performance regression in jdk11b11

Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Aug 28 22:20:41 UTC 2018


Hi Roland,

cmp1->Opcode() is a virtual call and its result is already cached in a local variable. Use that variable everywhere in this method:
int cmp1_op = cmp1->Opcode();

Move the 'cmp2_type == TypeInt::ZERO' check before the is_counted_loop_cmp(cmp) call, which is more expensive.

is_Con() is also true for TOP. Use a different check.

Thanks,
Vladimir

On 8/28/18 5:56 AM, Roland Westrelin wrote:
> 
> http://cr.openjdk.java.net/~roland/8209544/webrev.00/
> 
> The performance regression is caused by 8200303 (C2 should leverage
> profiling for lookupswitch/tableswitch). Running with
> -XX:-UseSwitchProfiling makes the regression go away.
> 
> -prof perfasm reports that 70% of the time is spent in stubs
> (StubRoutines::aescrypt_encryptBlock) where UseSwitchProfiling makes no
> difference. 20+% is spent in com.sun.crypto.provider.CipherCore::doFinal
> which has a single lookupswitch in the hot code path. There's a
> difference in the code sequence generated:
> 
> With -XX:-UseSwitchProfiling:
> 
> 0x00007f9c3056ec50: cmp $0x7,%ecx
> 0x00007f9c3056ec53: je 0x00007f9c3056ee42 ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0}
> 
> With -XX:+UseSwitchProfiling:
> 
> 0x00007f59a456ec52: add $0xfffffff9,%eax
> 0x00007f59a456ec55: cmp $0x1,%eax
> 0x00007f59a456ec58: jb 0x00007f59a456eeba ;*lookupswitch {reexecute=0 rethrow=0 return_oop=0}
> 
> The switch is:
> 
> switch (cipherMode) {
> case GCM_MODE:
>    // some code
>    break;
> default:
>    // some code
>    break;
> }
> 
> Only the default case is ever taken. That translates into 3 ranges:
> 
>   {..6}=>91 (cnt=4790.000000)
>   {7}=>2147483647 (cnt=0.000000)
>   {8..}=>91 (cnt=4790.000000)
> 
> With -XX:-UseSwitchProfiling, the compiled code performs a binary search
> among ranges. It picks range {7} as a mid point and has special logic to
> emit a single comparison:
> 
>        // Special Case:  If there are exactly three ranges, and the high
>        // and low range each go to the same place, omit the "gt" test,
>        // since it will not discriminate anything.
>        bool eq_test_only = (hi == lo+2 && hi->dest() == lo->dest() && mid == hi-1) || mid == lo;
> 
> With -XX:+UseSwitchProfiling, the compiled code also performs a binary
> search, but it picks mid points based on profiling, so the binary search
> tree is not balanced.
> 
> It picks {..6} as first mid point and then {8..} so:
> 
> if (v > 6 && v < 8) {
>     uncommon_trap();
> } else {
>     // do stuff
> }
> 
> which is optimized as:
> 
> if (v - 7 <u 1) {
>     uncommon_trap();
> } else {
>     // do stuff
> }
> 
> And that results in the generated code above. It's actually possible to
> transform this to:
> 
> if (v - 7 == 0) {
> 
> and then to:
> 
> if (v == 7) {
> 
> and then fall back to the same code as with
> -XX:-UseSwitchProfiling. That's what I propose as a fix.
> 
> Roland.
> 
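
For reference, the equivalence the proposed transformation relies on can be checked outside the compiler. The following is only an illustrative Java sketch, not code from the webrev: the class and method names are made up for this example, and Integer.compareUnsigned stands in for C2's unsigned compare. It shows that (v - 7) <u 1 holds exactly when v == 7, because the subtraction wraps around for every other value of v.

```java
public class UnsignedRangeCheck {
    // Hypothetical helper: the merged range test, (v - 7) <u 1.
    // v - 7 wraps on overflow, so only v == 7 yields 0, the single
    // value that is unsigned-less-than 1.
    public static boolean unsignedForm(int v) {
        return Integer.compareUnsigned(v - 7, 1) < 0;
    }

    // Hypothetical helper: the simplified test the fix aims for.
    public static boolean equalityForm(int v) {
        return v == 7;
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 6, 7, 8, Integer.MAX_VALUE};
        for (int v : samples) {
            if (unsignedForm(v) != equalityForm(v)) {
                throw new AssertionError("forms disagree for v = " + v);
            }
        }
        System.out.println("both forms agree on all samples");
    }
}
```

The samples include the boundary values on either side of 7 and both integer extremes, which are the cases where a wrapping subtraction could plausibly go wrong.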

