[9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared

Vladimir Kozlov vladimir.kozlov at oracle.com
Mon Jan 19 18:23:29 UTC 2015


On 1/19/15 9:05 AM, Vladimir Ivanov wrote:
> Thanks, Vladimir!
>
>> I would suggest adding a more detailed comment (instead of the simple
>> "Stop profiling") to the inline_profileBranch() intrinsic explaining what
>> it is doing, because it is not strictly an "intrinsic": it does not
>> implement the profileBranch() Java code when counts is constant.
> Sure, will do.
>
>> You forgot to mark Opaque4Node as a macro node. I would suggest basing it
>> on Opaque2Node; then you will get some methods from it.
> Do I really need to do so? I expect it to go away during the IGVN pass right after parsing is over. That's why I register
> the node for igvn in LibraryCallKit::inline_profileBranch(). Changes in macro.cpp & compile.cpp are leftovers from the
> version when Opaque4 was a macro node. I plan to remove them.

I see, this is why you did not inherit from Opaque2Node. Okay. I would suggest leaving an assert in compile.cpp to make
sure no Opaque4 node is left behind.
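
A minimal sketch of the kind of check meant here, written as a toy model rather than HotSpot code. The Node struct, the Opcode enum, and both helper functions are hypothetical stand-ins; the only idea taken from this thread is that no Opaque4 node should survive the IGVN pass that follows parsing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for compiler IR; these are not HotSpot types.
enum class Opcode { Add, If, Opaque4 };

struct Node {
    Opcode op;
};

// Counts the nodes of the given kind that are still present in the graph.
std::size_t count_nodes_of_kind(const std::vector<Node>& graph, Opcode kind) {
    std::size_t n = 0;
    for (const Node& node : graph) {
        if (node.op == kind) ++n;
    }
    return n;
}

// True when no Opaque4 node remains; this is the condition an assert would test.
bool no_opaque4_left(const std::vector<Node>& graph) {
    return count_nodes_of_kind(graph, Opcode::Opaque4) == 0;
}

// Debug-build check, to be called once the cleanup pass has finished.
void verify_no_opaque4(const std::vector<Node>& graph) {
    assert(no_opaque4_left(graph) && "Opaque4 node survived cleanup");
}
```

Calling verify_no_opaque4() after the pass that is expected to remove the nodes turns a silently leftover node into an immediate failure in debug builds.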

I found a typo when I looked today (should be '&&'):

+ Node *Opaque4Node::Ideal(PhaseGVN *phase, bool can_reshape) {
+   if (can_reshape & _delay_removal) {
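
For reference, a small standalone sketch of how the two operators differ. It assumes nothing about Opaque4Node itself; the function names are made up for illustration. With two plain bool operands both forms give the same value, so the point is mostly about intent and short-circuiting, but with general integer operands the results can differ.

```cpp
#include <cassert>

// '&' combines the bits of both operands; the result is true only if
// the two values share at least one set bit.
bool bitwise_and(int a, int b) { return (a & b) != 0; }

// '&&' tests each operand for being non-zero and, in an expression,
// does not evaluate the right operand once the left one is false.
bool logical_and(int a, int b) { return a && b; }
```

For example, bitwise_and(2, 1) is false because the operands share no bits, while logical_and(2, 1) is true because both are non-zero.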

Thanks,
Vladimir

>
> Best regards,
> Vladimir Ivanov
>
>> On 1/16/15 9:16 AM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/
>>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/
>>> https://bugs.openjdk.java.net/browse/JDK-8063137
>>>
>>> After GuardWithTest (GWT) LambdaForms became shared, profile pollution
>>> significantly distorted compilation decisions. It affected inlining and
>>> hindered some optimizations. It caused significant performance
>>> regressions for Nashorn (on Octane benchmarks).
>>>
>>> Inlining was fixed by 8059877 [1], but it didn't cover the case when a
>>> branch is never taken. It can cause missed optimization opportunities,
>>> not just an increase in code size. For example, a non-pruned branch can
>>> break escape analysis.
>>>
>>> Currently, there are 2 problems:
>>>    - branch frequencies profile pollution
>>>    - deoptimization counts pollution
>>>
>>> Branch frequency pollution hides from JIT the fact that a branch is
>>> never taken. Since GWT LambdaForms (and hence their bytecode) are
>>> heavily shared, but the behavior is specific to each MethodHandle,
>>> there's no way for JIT to understand how a particular GWT instance
>>> behaves.
>>>
>>> The solution I propose is to do profiling in Java code and feed it to
>>> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where
>>> profiling info is stored. Once JIT kicks in, it can retrieve these
>>> counts, if the corresponding MethodHandle is a compile-time constant
>>> (and that is usually the case). To communicate the profile data from
>>> Java code to JIT, MethodHandleImpl::profileBranch() is used.
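
The scheme in the paragraph above can be sketched roughly as follows. This is a toy model and not the code from the webrev: which array slot counts which outcome, the saturating increment, and the names profile_branch, BranchShape and classify are all assumptions made for illustration.

```cpp
#include <cassert>
#include <climits>

// Assumed layout: counts[0] holds "false" outcomes, counts[1] holds "true" ones.
// Records one outcome and passes the test result through unchanged.
bool profile_branch(bool result, int counts[2]) {
    int& slot = counts[result ? 1 : 0];
    if (slot < INT_MAX) ++slot;  // assumption: saturate rather than overflow
    return result;
}

// What a compiler could conclude once the counts are known constants.
enum class BranchShape { Unknown, AlwaysTrue, AlwaysFalse, Mixed };

BranchShape classify(const int counts[2]) {
    if (counts[0] == 0 && counts[1] == 0) return BranchShape::Unknown;
    if (counts[0] == 0) return BranchShape::AlwaysTrue;   // "false" side never seen
    if (counts[1] == 0) return BranchShape::AlwaysFalse;  // "true" side never seen
    return BranchShape::Mixed;
}
```

Because each GWT MethodHandle owns its own array in this model, one handle's counts are unaffected by other handles that share the same LambdaForm bytecode.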
>>>
>>> If the GWT MethodHandle isn't a compile-time constant, profiling should
>>> proceed. This happens when the corresponding LambdaForm is already
>>> shared: for newly created GWT MethodHandles, profiling can occur only in
>>> native code (a dedicated nmethod for a single LambdaForm). So, when
>>> compilation of the whole MethodHandle chain is triggered, the profile
>>> should already be gathered.
>>>
>>> Overriding branch frequencies is not enough. Statistics on
>>> deoptimization events are also polluted. Even if a branch is never
>>> taken, JIT issues an uncommon trap there only if the corresponding
>>> bytecode hasn't trapped too often and hasn't caused too many recompiles.
>>>
>>> I added @IgnoreProfile and placed it only on GWT LambdaForms. When JIT
>>> sees it on some method, Compile::too_many_traps &
>>> Compile::too_many_recompiles for that method always return false. It
>>> allows JIT to prune the branch based on the custom profile and recompile
>>> the method if the branch is visited.
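
A rough model of the gating described in the paragraph above, with hypothetical names and made-up limits. Only the rule itself comes from the description: for a method marked @IgnoreProfile, both checks report false regardless of the recorded counts.

```cpp
#include <cassert>

// Hypothetical per-method record; this is not a HotSpot type.
struct MethodProfile {
    bool ignore_profile;   // models a method annotated with @IgnoreProfile
    int trap_count;        // deoptimization events recorded for the method
    int recompile_count;   // recompilations recorded for the method
};

// Illustrative limits only; the real thresholds are not given in this thread.
constexpr int kTrapLimit = 3;
constexpr int kRecompileLimit = 10;

bool too_many_traps(const MethodProfile& m) {
    if (m.ignore_profile) return false;  // polluted counts are ignored
    return m.trap_count >= kTrapLimit;
}

bool too_many_recompiles(const MethodProfile& m) {
    if (m.ignore_profile) return false;  // polluted counts are ignored
    return m.recompile_count >= kRecompileLimit;
}

// A never-taken branch may be replaced by an uncommon trap only when
// neither check objects.
bool may_prune_never_taken_branch(const MethodProfile& m) {
    return !too_many_traps(m) && !too_many_recompiles(m);
}
```

In this model, two methods with identical polluted counts are treated differently: the annotated one can still have its branch pruned, the other cannot.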
>>>
>>> For now, I wanted to keep the fix very focused. The next thing I plan to
>>> do is to experiment with ignoring deoptimization counts for other
>>> LambdaForms which are heavily shared. I already saw problems caused by
>>> deoptimization counts pollution (see JDK-8068915 [2]).
>>>
>>> I plan to backport the fix into 8u40, once I finish extensive
>>> performance testing.
>>>
>>> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite,
>>> Octane).
>>>
>>> Thanks!
>>>
>>> PS: as a summary, my experiments show that fixes for 8063137 & 8068915
>>> [2] almost completely recover peak performance after LambdaForm sharing
>>> [3]. There's one more problem left (non-inlined MethodHandle invocations
>>> are more expensive when LFs are shared), but it's a story for another
>>> day.
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8059877
>>>      8059877: GWT branch frequencies pollution due to LF sharing
>>> [2] https://bugs.openjdk.java.net/browse/JDK-8068915
>>> [3] https://bugs.openjdk.java.net/browse/JDK-8046703
>>>      JEP 210: LambdaForm Reduction and Caching
>>> _______________________________________________
>>> mlvm-dev mailing list
>>> mlvm-dev at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev


More information about the hotspot-compiler-dev mailing list