RFR: 8287385: Suppress superficial unstable_if traps

Xin Liu xliu at openjdk.org
Fri Jul 29 19:56:41 UTC 2022


On Thu, 28 Jul 2022 22:45:59 GMT, Vladimir Kozlov <kvn at openjdk.org> wrote:

>> An unstable_if trap is **superficial** if it can NOT prune any code. Sometimes the else-section of the program is empty. Superficial unstable_if traps not only complicate the code shape but also consume codecache, because C2 has to generate debuginfo for them. If the condition changes, HotSpot has to destroy the established nmethod and compile it again. Our analysis shows that roughly 20% of unstable_if traps are superficial.
>> 
>> The algorithm that identifies and suppresses superficial unstable_if traps follows from the definition. A non-superficial unstable_if trap must prune some code, and the parser skips dead basic blocks (BBs). A trap is therefore superficial if and only if its target BB is not dead; otherwise the parser would skip that BB, which contradicts the definition. As a result, we can suppress an unstable_if trap when C2 parses the target BB. This algorithm leaves alone the uncommon_traps that do prune code.
>> 
>> For example, C2 generates an uncommon_trap for the else-section if cond is very likely true.
>> 
>>     public static int foo(boolean cond, int i) {
>>         Value x = new Value(0);
>>         Value y = new Value(1);
>>         Value z = new Value(i);
>> 
>>         if (cond) {
>>             i++;
>>         }
>>         return x._value + y._value + z._value + i;
>>     }
>> 
>> 
>> If we suppress this superficial unstable_if, the nmethod shrinks from 608 bytes to 520 bytes, or -14.5%. Most of the savings come from "scopes data/pcs", because a superficial unstable_if generates a trap like this:
>> 
>> 037     call,static  wrapper for: uncommon_trap(reason='unstable_if' action='reinterpret' debug_id='0')
>>         # SuperficialIfTrap::foo @ bci:29 (line 32) L[0]=_ L[1]=rsp + #4 L[2]=#ScObj0 L[3]=#ScObj1 L[4]=#ScObj2 STK[0]=rsp + #0
>>         # ScObj0 SuperficialIfTrap$Value={ [_value :0]=#0 }
>>         # ScObj1 SuperficialIfTrap$Value={ [_value :0]=#1 }
>>         # ScObj2 SuperficialIfTrap$Value={ [_value :0]=rsp + #4 }
>>         # OopMap {off=60/0x3c}
>> 03c     stop    # ShouldNotReachHere
>> 
>> 
>> Here is the breakdown of the nmethod, generated by '-XX:+PrintAssembly':
>> 
>> <-XX:-OptimizeUnstableIf>
>> Compiled method (c2)     346   17       4       SuperficialIfTrap::foo (53 bytes)
>>  total in heap  [0x00007f50f4970910,0x00007f50f4970b70] = 608
>>  relocation     [0x00007f50f4970a70,0x00007f50f4970a80] = 16
>>  main code      [0x00007f50f4970a80,0x00007f50f4970ad8] = 88
>>  stub code      [0x00007f50f4970ad8,0x00007f50f4970af0] = 24
>>  oops           [0x00007f50f4970af0,0x00007f50f4970b00] = 16
>>  metadata       [0x00007f50f4970b00,0x00007f50f4970b08] = 8
>>  scopes data    [0x00007f50f4970b08,0x00007f50f4970b38] = 48
>>  scopes pcs     [0x00007f50f4970b38,0x00007f50f4970b68] = 48
>>  dependencies   [0x00007f50f4970b68,0x00007f50f4970b70] = 8
>> 
>> <-XX:+OptimizeUnstableIf>
>> Compiled method (c2)     309   17       4       SuperficialIfTrap::foo (53 bytes)
>>  total in heap  [0x00007f4090970910,0x00007f4090970b18] = 520
>>  relocation     [0x00007f4090970a70,0x00007f4090970a80] = 16
>>  main code      [0x00007f4090970a80,0x00007f4090970ac8] = 72
>>  stub code      [0x00007f4090970ac8,0x00007f4090970ae0] = 24
>>  oops           [0x00007f4090970ae0,0x00007f4090970ae8] = 8
>>  scopes data    [0x00007f4090970ae8,0x00007f4090970af0] = 8
>>  scopes pcs     [0x00007f4090970af0,0x00007f4090970b10] = 32
>>  dependencies   [0x00007f4090970b10,0x00007f4090970b18] = 8
>
> I thought about this change more. You are trading a performance loss for some space savings in the CodeCache. I don't think we should do this.
> One of C2's main optimizations is class propagation based on profiling (checkcast). It allows C2 to significantly reduce the following code if profiling shows that only one class was observed. I am not sure that removing a "superficial" uncommon trap will not obstruct this and other similar optimizations.

hi, @vnkozlov  
> I thought about this change more. You are trading a performance loss for some space savings in the CodeCache. I don't think we should do this.

Thank you for taking a look at this. You're right.


I can provide some data points from my experiments. I do see that it can reduce the number of compilations. I ran Renaissance with `-Xlog:deoptimization=debug` and piped the logs to `grep "level=4.*unstable_if" | cut -c 29- | sort -h | uniq | wc -l`, which counts the distinct deoptimization events caused by unstable_if, e.g.
`[debug][deoptimization] cid=1849 level=4 java.util.concurrent.ForkJoinPool.scan(Ljava/util/concurrent/ForkJoinPool$WorkQueue;II)I trap_bci=137 unstable_if reinterpret pc=0x00007fd6f0a5b790 relative_pc=0x0000000000000430`
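
For readers who would rather post-process the log in Java, here is a minimal sketch that mirrors the shell pipeline above step by step. The class and method names (`UnstableIfDeoptCounter`, `countDistinct`) are made up for illustration, and the sketch assumes ASCII log lines so that characters and bytes coincide for the `cut -c 29-` step.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the PR: a Java equivalent of
//   grep "level=4.*unstable_if" | cut -c 29- | sort -h | uniq | wc -l
public class UnstableIfDeoptCounter {
    // Same filter as the grep step: C2 (level=4) deopts with reason unstable_if.
    private static final Pattern C2_UNSTABLE_IF = Pattern.compile("level=4.*unstable_if");

    // `cut -c 29-` keeps everything from the 29th character on,
    // i.e. it drops the first 28 characters of every line.
    private static final int DROPPED_PREFIX_CHARS = 28;

    static int countDistinct(List<String> logLines) {
        Set<String> distinct = new HashSet<>();          // sort | uniq
        for (String line : logLines) {
            if (!C2_UNSTABLE_IF.matcher(line).find()) {
                continue;                                // grep rejects the line
            }
            String tail = line.length() > DROPPED_PREFIX_CHARS
                    ? line.substring(DROPPED_PREFIX_CHARS)
                    : "";                                // cut emits an empty line
            distinct.add(tail);
        }
        return distinct.size();                          // wc -l
    }

    public static void main(String[] args) throws IOException {
        // Reads the -Xlog output from standard input.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            System.out.println(countDistinct(in.lines().collect(Collectors.toList())));
        }
    }
}
```

With JDK 11 or later the file can be run directly as `java UnstableIfDeoptCounter.java`, with the log piped to its standard input.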

I found that this reduces unstable_if deoptimizations by 11% (median). Unfortunately, those events are rare, so the reduction won't make much difference. Given that the JIT compilers are both multi-threaded and concurrent, the overhead of JIT compilation is very low. Secondly, I am surprised by how responsive HotSpot is: it quickly recompiles deopt'ed methods with the new information and replaces the old nmethod with a new revision that avoids the uncommon_trap. I hardly observe any codecache savings. Putting these together, I have to concede that this feature doesn't make sense.

| benchmark        | Before | After | diff    |
| ---------------- | ------ | ----- | ------- |
| scrabble         | 18     | 16    | -11.11% |
| page-rank        | 193    | 181   | -6.22%  |
| future-genetic   | 38     | 38    | 0.00%   |
| akka-uct         | 118    | 99    | -16.10% |
| movie-lens       | 198    | 185   | -6.57%  |
| scala-doku       | 44     | 44    | 0.00%   |
| chi-square       | 125    | 104   | -16.80% |
| fj-kmeans        | 26     | 19    | -26.92% |
| rx-scrabble      | 35     | 34    | -2.86%  |
| finagle-http     | 173    | 126   | -27.17% |
| reactors         | 81     | 72    | -11.11% |
| dec-tree         | 200    | 170   | -15.00% |
| scala-stm-bench7 | 70     | 66    | -5.71%  |
| naive-bayes      | 171    | 144   | -15.79% |
| als              | 214    | 186   | -13.08% |
| par-mnemonics    | 19     | 19    | 0.00%   |
| scala-kmeans     | 15     | 13    | -13.33% |
| philosophers     | 35     | 31    | -11.43% |
| log-regression   | 184    | 148   | -19.57% |
| gauss-mix        | 145    | 119   | -17.93% |
| mnemonics        | 13     | 13    | 0.00%   |
| dotty            | 393    | 338   | -13.99% |
| finagle-chirper  | 268    | 264   | -1.49%  |
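
As a cross-check of the median quoted earlier, the diff column and its median can be recomputed from the Before/After counts in the table. This is only a small sketch: the class name `DeoptDiffMedian` and its helpers are invented for illustration, and the counts are copied from the table in table order.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: recomputes the "diff" column and its median
// from the Before/After counts listed in the table.
public class DeoptDiffMedian {
    // {before, after} per benchmark, in the same order as the table rows.
    static final int[][] COUNTS = {
        {18, 16}, {193, 181}, {38, 38}, {118, 99}, {198, 185}, {44, 44},
        {125, 104}, {26, 19}, {35, 34}, {173, 126}, {81, 72}, {200, 170},
        {70, 66}, {171, 144}, {214, 186}, {19, 19}, {15, 13}, {35, 31},
        {184, 148}, {145, 119}, {13, 13}, {393, 338}, {268, 264},
    };

    // Relative change in percent; negative means fewer deoptimizations.
    static double diffPercent(int before, int after) {
        return 100.0 * (after - before) / before;
    }

    // Median of a copy of the input, so the caller's array is left untouched.
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    static double medianDiffPercent() {
        double[] diffs = new double[COUNTS.length];
        for (int i = 0; i < COUNTS.length; i++) {
            diffs[i] = diffPercent(COUNTS[i][0], COUNTS[i][1]);
        }
        return median(diffs);
    }

    public static void main(String[] args) {
        // Prints -11.43% for the 23 rows above.
        System.out.println(String.format(Locale.ROOT, "%.2f%%", medianDiffPercent()));
    }
}
```

With 23 rows the median is the 12th value after sorting, which is the philosophers row (35 to 31).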


Speaking of "class propagation", do you mean `UseTypeSpeculation`? I will take a closer look at this feature.

thanks,
--lx

-------------

PR: https://git.openjdk.org/jdk/pull/9601

