RFR: 8287385: Suppress superficial unstable_if traps
Xin Liu
xliu at openjdk.org
Thu Jul 28 01:35:31 UTC 2022
On Tue, 26 Jul 2022 02:59:41 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:
>> An unstable_if trap is **superficial** if it can NOT prune any code. Sometimes, the else-section of the program is empty. Superficial unstable_if traps not only complicate the code shape but also consume codecache, because C2 has to generate debuginfo for them. If the condition changes, HotSpot has to destroy the established nmethod and compile it again. Our analysis shows that roughly 20% of unstable_if traps are superficial.
>>
>> The algorithm that identifies and suppresses superficial unstable_if traps follows from the definition. A non-superficial unstable_if trap must prune some code. The parser skips dead basic blocks (BBs). A trap is superficial if and only if its target BB is not dead; otherwise the target BB would be skipped, which contradicts the definition. As a result, we can suppress an unstable_if trap when C2 parses its target BB. This algorithm leaves alone the uncommon_traps that do prune code.
>>
>> For example, C2 generates an uncommon_trap for the else path if cond is very likely true.
>>
>> public static int foo(boolean cond, int i) {
>>     Value x = new Value(0);
>>     Value y = new Value(1);
>>     Value z = new Value(i);
>>
>>     if (cond) {
>>         i++;
>>     }
>>     return x._value + y._value + z._value + i;
>> }
>>
>>
>> If we suppress this superficial unstable_if, the nmethod shrinks from 608 bytes to 520 bytes, or -14.5%. Most of the savings come from "scopes data/pcs", because a superficial unstable_if generates a trap like this:
>>
>> 037 call,static wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0')
>> # SuperficialIfTrap::foo @ bci:29 (line 32) L[0]=_ L[1]=rsp + #4 L[2]=#ScObj0 L[3]=#ScObj1 L[4]=#ScObj2 STK[0]=rsp + #0
>> # ScObj0 SuperficialIfTrap$Value={ [_value :0]=#0 }
>> # ScObj1 SuperficialIfTrap$Value={ [_value :0]=#1 }
>> # ScObj2 SuperficialIfTrap$Value={ [_value :0]=rsp + #4 }
>> # OopMap {off=60/0x3c}
>> 03c stop # ShouldNotReachHere
>>
>>
>> Here is the breakdown of nmethod, generated by '-XX:+PrintAssembly'
>>
>> <-XX:-OptimizeUnstableIf>
>> Compiled method (c2) 346 17 4 SuperficialIfTrap::foo (53 bytes)
>> total in heap [0x00007f50f4970910,0x00007f50f4970b70] = 608
>> relocation [0x00007f50f4970a70,0x00007f50f4970a80] = 16
>> main code [0x00007f50f4970a80,0x00007f50f4970ad8] = 88
>> stub code [0x00007f50f4970ad8,0x00007f50f4970af0] = 24
>> oops [0x00007f50f4970af0,0x00007f50f4970b00] = 16
>> metadata [0x00007f50f4970b00,0x00007f50f4970b08] = 8
>> scopes data [0x00007f50f4970b08,0x00007f50f4970b38] = 48
>> scopes pcs [0x00007f50f4970b38,0x00007f50f4970b68] = 48
>> dependencies [0x00007f50f4970b68,0x00007f50f4970b70] = 8
>>
>> <-XX:+OptimizeUnstableIf>
>> Compiled method (c2) 309 17 4 SuperficialIfTrap::foo (53 bytes)
>> total in heap [0x00007f4090970910,0x00007f4090970b18] = 520
>> relocation [0x00007f4090970a70,0x00007f4090970a80] = 16
>> main code [0x00007f4090970a80,0x00007f4090970ac8] = 72
>> stub code [0x00007f4090970ac8,0x00007f4090970ae0] = 24
>> oops [0x00007f4090970ae0,0x00007f4090970ae8] = 8
>> scopes data [0x00007f4090970ae8,0x00007f4090970af0] = 8
>> scopes pcs [0x00007f4090970af0,0x00007f4090970b10] = 32
>> dependencies [0x00007f4090970b10,0x00007f4090970b18] = 8
>
> Did you address @merykitty comment in RFE? You said:
> `it looks like this JBS does have this downsize, I will investigate this problem`
Hi @vnkozlov and @merykitty,
I have thought about this. First of all, I admit that this change does impact "peak" performance. In a nutshell, that is where a tracing JIT is superior to method-based compilation. But C2 is a method-based compiler with heroic optimizations. This change makes Java execution more predictable and produces fewer unstable_if traps.
Secondly, C2 is adaptive. Both too_many_traps() and PerBytecodeTrapLimit lead the final revision of the nmethod to include both paths, as this patch does, unless real execution never takes the other path. Code like that is questionable. I ran both SPECjvm2008 and Renaissance and have not seen a difference in peak performance.
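To illustrate the adaptivity argument, here is a toy Java model. It is not HotSpot code. The class TrapLimitSketch, its methods, and the limit value of 2 are all hypothetical; only the names too_many_traps() and PerBytecodeTrapLimit come from the discussion. The sketch assumes a simple per-bci counter: once the traps recorded at a branch reach the limit, the next compilation emits both paths instead of a trap.

```java
import java.util.HashMap;
import java.util.Map;

public class TrapLimitSketch {
    // Hypothetical stand-in for a per-bytecode trap limit; not the real flag.
    private final int perBytecodeTrapLimit;
    // Number of times the trap at each bci has fired (toy profile data).
    private final Map<Integer, Integer> trapCounts = new HashMap<>();

    TrapLimitSketch(int perBytecodeTrapLimit) {
        this.perBytecodeTrapLimit = perBytecodeTrapLimit;
    }

    // Called when compiled code hits the trap and deoptimizes.
    void recordTrap(int bci) {
        trapCounts.merge(bci, 1, Integer::sum);
    }

    // Toy analogue of a too_many_traps() check (assumed semantics: count >= limit).
    boolean tooManyTraps(int bci) {
        return trapCounts.getOrDefault(bci, 0) >= perBytecodeTrapLimit;
    }

    // Decide what the next compilation emits for the rarely taken path.
    String compileBranch(int bci) {
        return tooManyTraps(bci) ? "both_paths" : "uncommon_trap";
    }

    public static void main(String[] args) {
        TrapLimitSketch profile = new TrapLimitSketch(2);
        System.out.println(profile.compileBranch(29)); // first compile: trap
        profile.recordTrap(29);
        profile.recordTrap(29);
        System.out.println(profile.compileBranch(29)); // recompile: both paths
    }
}
```

In this model, a branch whose other path is really taken converges to the same code shape that suppression produces up front, only after paying for deoptimization and recompilation.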
When it comes to constant propagation, I think it has two positive outcomes: a constant can simplify control flow, and it can reduce the strength of arithmetic computation. In the first case, suppression won't change much, because C2 profiles branches and we still prune non-superficial paths on the basis of probabilities even when values are not constant. For the second, I think it does make a difference. To mitigate it, I came up with an improvement. I can't guarantee detecting all cases where merging hinders constant folding, because constant propagation happens in the optimizer while this patch works at parse time. I think I can detect a simple case like this in 'UnstableIfTrap::suppress':
    int i = x;
    if (cond) {
        i = 0;
    }
When we attempt to create a phi node for i and find that the previous value is a constant (ConI#0 in this case), we give up suppressing the trap. How about that?
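A minimal sketch of that check, written as a toy Java model rather than C2 code. The class SuppressSketch, the Val type, and canSuppress are hypothetical names for illustration only. The sketch assumes that each path's locals can be compared slot by slot: a phi is needed when the two values differ, and suppression is abandoned if the value already at the merge point is a constant.

```java
public class SuppressSketch {
    /** Symbolic value of a local: a known int constant or an opaque named node. */
    static final class Val {
        final Integer constant; // null when the value is not a compile-time constant
        final String name;      // identity of the node in this toy model

        private Val(Integer constant, String name) {
            this.constant = constant;
            this.name = name;
        }
        static Val con(int c) { return new Val(c, "ConI#" + c); }
        static Val node(String name) { return new Val(null, name); }
        boolean isConstant() { return constant != null; }
        boolean sameAs(Val other) { return name.equals(other.name); }
    }

    /**
     * Hypothetical rule: 'existing' holds the locals already recorded at the
     * merge point, 'incoming' holds the locals of the path that the trap would
     * have cut off. Refuse to suppress if a phi would replace a constant.
     */
    static boolean canSuppress(Val[] existing, Val[] incoming) {
        for (int k = 0; k < existing.length; k++) {
            boolean needsPhi = !existing[k].sameAs(incoming[k]);
            if (needsPhi && existing[k].isConstant()) {
                return false; // merging would hide a constant from later folding
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // int i = x; if (cond) { i = 0; }  ->  merge sees ConI#0 vs x
        Val[] afterAssign = { Val.node("cond"), Val.con(0) };
        Val[] fallThrough = { Val.node("cond"), Val.node("x") };
        System.out.println(canSuppress(afterAssign, fallThrough)); // false

        // if (cond) { i++; }  ->  merge sees i+1 vs i, neither is a constant
        Val[] afterIncr = { Val.node("cond"), Val.node("i+1") };
        Val[] untouched = { Val.node("cond"), Val.node("i") };
        System.out.println(canSuppress(afterIncr, untouched)); // true
    }
}
```

The check is deliberately conservative: it only looks at the value already present when the phi is created, so it catches the simple pattern and nothing that depends on later optimization.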
-------------
PR: https://git.openjdk.org/jdk/pull/9601