[9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Fri Jan 16 17:16:22 UTC 2015


http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/
http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/
https://bugs.openjdk.java.net/browse/JDK-8063137

After GuardWithTest (GWT) LambdaForms became shared, profile pollution
significantly distorted compilation decisions. It affected inlining and
hindered some optimizations, causing significant performance
regressions for Nashorn (on Octane benchmarks).

Inlining was fixed by 8059877 [1], but that fix didn't cover the case
when a branch is never taken. An unpruned branch can cause missed
optimization opportunities, not just an increase in code size. For
example, a non-pruned branch can break escape analysis.

Currently, there are 2 problems:
   - branch frequencies profile pollution
   - deoptimization counts pollution

Branch frequency pollution hides from JIT the fact that a branch is
never taken. Since GWT LambdaForms (and hence their bytecode) are
heavily shared, but the behavior is specific to each MethodHandle,
there's no way for JIT to understand how a particular GWT instance
behaves.

The solution I propose is to do profiling in Java code and feed it to
JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where
profiling info is stored. Once JIT kicks in, it can retrieve these
counts if the corresponding MethodHandle is a compile-time constant
(which is usually the case). To communicate the profile data from Java
code to JIT, MethodHandleImpl::profileBranch() is used.
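
To illustrate the idea, here is a minimal Java sketch of per-instance
branch profiling. It is not the code from the webrev: ProfiledGuard,
the counter layout (index 0 for "test was false", index 1 for "test
was true"), and the saturating increment are assumptions made for the
example. Only the int[2] array and the role of profileBranch() come
from the description above.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a GWT MethodHandle that carries its own branch profile. */
final class ProfiledGuard {
    // Assumed layout: counts[0] = test was false, counts[1] = test was true.
    private final int[] counts = new int[2];
    private final BooleanSupplier test;

    ProfiledGuard(BooleanSupplier test) {
        this.test = test;
    }

    /**
     * Models the role of MethodHandleImpl::profileBranch(): record the
     * outcome in the per-instance array and pass the result through.
     */
    static boolean profileBranch(boolean result, int[] counts) {
        int idx = result ? 1 : 0;
        if (counts[idx] < Integer.MAX_VALUE) {
            counts[idx]++; // saturate instead of overflowing
        }
        return result;
    }

    /** The shared "bytecode": identical for every instance, profiled per instance. */
    String invoke() {
        return profileBranch(test.getAsBoolean(), counts) ? "target" : "fallback";
    }

    /** What a compiler could conclude if this instance were a compile-time constant. */
    boolean fallbackNeverTaken() {
        return counts[0] == 0 && counts[1] > 0;
    }

    int[] snapshot() {
        return counts.clone();
    }
}
```

Two ProfiledGuard instances run the same invoke() code, yet each keeps
its own counts, so a branch that one instance never takes stays
visible as never taken no matter what the other instances do.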

If a GWT MethodHandle isn't a compile-time constant, profiling should
proceed. This happens when the corresponding LambdaForm is already
shared: for newly created GWT MethodHandles, profiling can occur only
in native code (a dedicated nmethod for a single LambdaForm). So, when
compilation of the whole MethodHandle chain is triggered, the profile
should already be gathered.

Overriding branch frequencies is not enough. Statistics on
deoptimization events are also polluted. Even if a branch is never
taken, JIT issues an uncommon trap there only if the corresponding
bytecode doesn't trap too much and doesn't cause too many recompiles.

I added @IgnoreProfile and placed it only on GWT LambdaForms. When JIT
sees it on some method, Compile::too_many_traps &
Compile::too_many_recompiles for that method always return false. This
allows JIT to prune the branch based on the custom profile and
recompile the method if the branch is visited.
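
The interaction between the two checks and the annotation can be
modeled roughly as follows. This is a Java model written for
illustration only. The real checks live in HotSpot's C++ code, and the
class name, fields, and threshold values below are all hypothetical.

```java
/** Illustrative model of the pruning decision; not HotSpot's actual logic. */
final class PruningModel {
    /** Hypothetical deoptimization statistics shared by all users of one bytecode. */
    static final class SharedDeoptStats {
        int trapCount;
        int recompileCount;
    }

    // Made-up thresholds, chosen only for the example.
    static final int TRAP_LIMIT = 2;
    static final int RECOMPILE_LIMIT = 10;

    // Stands in for Compile::too_many_traps; always false when the profile is ignored.
    static boolean tooManyTraps(SharedDeoptStats s, boolean ignoreProfile) {
        return !ignoreProfile && s.trapCount >= TRAP_LIMIT;
    }

    // Stands in for Compile::too_many_recompiles; always false when the profile is ignored.
    static boolean tooManyRecompiles(SharedDeoptStats s, boolean ignoreProfile) {
        return !ignoreProfile && s.recompileCount >= RECOMPILE_LIMIT;
    }

    /** Prune (emit an uncommon trap) only for a never-taken branch that passes both checks. */
    static boolean canPrune(int takenCount, SharedDeoptStats s, boolean ignoreProfile) {
        return takenCount == 0
                && !tooManyTraps(s, ignoreProfile)
                && !tooManyRecompiles(s, ignoreProfile);
    }
}
```

With polluted shared statistics (say, trapCount already above the
limit because of other MethodHandles), canPrune() rejects a never-taken
branch unless ignoreProfile is set, which is the situation
@IgnoreProfile is meant to address.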

For now, I wanted to keep the fix very focused. The next thing I plan to 
do is to experiment with ignoring deoptimization counts for other 
LambdaForms which are heavily shared. I already saw problems caused by 
deoptimization counts pollution (see JDK-8068915 [2]).

I plan to backport the fix into 8u40, once I finish extensive 
performance testing.

Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite, Octane).

Thanks!

PS: as a summary, my experiments show that the fixes for 8063137 &
8068915 [2] almost completely recover peak performance after LambdaForm
sharing [3]. There's one more problem left (non-inlined MethodHandle
invocations are more expensive when LFs are shared), but it's a story
for another day.

Best regards,
Vladimir Ivanov

[1] https://bugs.openjdk.java.net/browse/JDK-8059877
     8059877: GWT branch frequencies pollution due to LF sharing
[2] https://bugs.openjdk.java.net/browse/JDK-8068915
[3] https://bugs.openjdk.java.net/browse/JDK-8046703
     JEP 210: LambdaForm Reduction and Caching


