[9] RFR (M): 8063137: Never-taken branches should be pruned when GWT LambdaForms are shared
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Tue Jan 20 12:40:50 UTC 2015
Duncan, thanks a lot for giving it a try!
If you plan to spend more time on it, please apply 8068915 [1] as well. I
saw huge intermittent performance regressions due to a continuous
deoptimization storm. You can check the -XX:+LogCompilation output for
repeated deoptimization events in steady state w/ Action_none.
Also, there are deoptimization statistics in the log (at least, in jdk9).
They are located right before the compilation_log tag.
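
A minimal sketch of one way to scan the log for such events, not an official
tool. It assumes runtime deoptimization events appear as
<uncommon_trap thread='...' reason='...' action='...'> records with the
attributes in that order, and that Action_none is printed as action='none'.
The class and method names (DeoptLogScan, countTraps) are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DeoptLogScan {
    // Assumed record shape: <uncommon_trap thread='T' reason='R' action='A' ...>
    // Requiring thread= is meant to skip records that lack it (assumption:
    // only runtime events carry a thread attribute).
    private static final Pattern TRAP = Pattern.compile(
        "<uncommon_trap\\s+thread='[^']*'[^>]*?\\sreason='([^']*)'[^>]*?\\saction='([^']*)'");

    // Counts matching records per trap reason for the given action value.
    static Map<String, Integer> countTraps(List<String> lines, String action) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = TRAP.matcher(line);
            if (m.find() && m.group(2).equals(action)) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the file written by -XX:+LogCompilation
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countTraps(lines, "none")
            .forEach((reason, n) -> System.out.println(reason + ": " + n));
    }
}
```

The log can be produced with -XX:+UnlockDiagnosticVMOptions
-XX:+LogCompilation -XX:LogFile=<path>. A reason whose count keeps growing
in steady state would be the pattern described above.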
Thanks again for the valuable feedback!
Best regards,
Vladimir Ivanov
[1] http://cr.openjdk.java.net/~vlivanov/8068915/webrev.00
On 1/19/15 11:21 PM, MacGregor, Duncan (GE Energy Management) wrote:
> Okay, I've done some tests of this with the micro benchmarks for our
> language & runtime which show pretty much no change except for one test
> which is now almost 3x slower. It uses nested loops to iterate over an
> array and concatenate the string-like objects it contains, and replaces
> elements with these new longer string-like objects. It's a bit of a
> pathological case, and I haven't seen the same sort of degradation in the
> other benchmarks or in real applications, but I haven't done serious
> benchmarking of them with this change.
>
> I shall see if the test case can be reduced down to anything simpler while
> still showing the same performance behaviour, and try to add some compilation
> logging options to narrow down what's going on.
>
> Duncan.
>
> On 16/01/2015 17:16, "Vladimir Ivanov" <vladimir.x.ivanov at oracle.com>
> wrote:
>
>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/hotspot/
>> http://cr.openjdk.java.net/~vlivanov/8063137/webrev.00/jdk/
>> https://bugs.openjdk.java.net/browse/JDK-8063137
>>
>> After GuardWithTest (GWT) LambdaForms became shared, profile pollution
>> significantly distorted compilation decisions. It affected inlining and
>> hindered some optimizations. This caused significant performance
>> regressions for Nashorn (on Octane benchmarks).
>>
>> Inlining was fixed by 8059877 [1], but it didn't cover the case when a
>> branch is never taken. That can cause missed optimization opportunities,
>> not just an increase in code size. For example, a non-pruned branch can
>> break escape analysis.
>>
>> Currently, there are 2 problems:
>> - branch frequencies profile pollution
>> - deoptimization counts pollution
>>
>> Branch frequency pollution hides from JIT the fact that a branch is
>> never taken. Since GWT LambdaForms (and hence their bytecode) are
>> heavily shared, but the behavior is specific to MethodHandle, there's no
>> way for JIT to understand how particular GWT instance behaves.
>>
>> The solution I propose is to do profiling in Java code and feed it to
>> JIT. Every GWT MethodHandle holds an auxiliary array (int[2]) where
>> profiling info is stored. Once JIT kicks in, it can retrieve these
>> counts, if corresponding MethodHandle is a compile-time constant (and it
>> is usually the case). To communicate the profile data from Java code to
>> JIT, MethodHandleImpl::profileBranch() is used.
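
A minimal sketch of the per-handle profiling idea described above, not the
code from the webrev. It assumes index 0 counts the false outcome and index 1
the true outcome, and that counters saturate instead of overflowing. The
class ProfiledGuard and the method neverTaken are hypothetical, and the JIT
side (reading the counts from a constant MethodHandle) is not modeled.

```java
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

final class ProfiledGuard {
    // Per-instance counters: counts[0] = test was false, counts[1] = test was true
    // (the index assignment is an assumption of this sketch).
    private final int[] counts = new int[2];
    private final IntPredicate test;
    private final IntUnaryOperator target;
    private final IntUnaryOperator fallback;

    ProfiledGuard(IntPredicate test, IntUnaryOperator target, IntUnaryOperator fallback) {
        this.test = test;
        this.target = target;
        this.fallback = fallback;
    }

    // Records the outcome in the caller-supplied array and returns it unchanged,
    // so it can wrap the branch condition directly.
    static boolean profileBranch(boolean result, int[] counts) {
        int idx = result ? 1 : 0;
        if (counts[idx] < Integer.MAX_VALUE) {
            counts[idx]++; // saturate rather than wrap around
        }
        return result;
    }

    // Guard-with-test shape: the shared code is the same for every instance,
    // but the counters belong to this instance only.
    int apply(int x) {
        return profileBranch(test.test(x), counts)
            ? target.applyAsInt(x)
            : fallback.applyAsInt(x);
    }

    // True if the given outcome was never observed while the other one was.
    boolean neverTaken(boolean outcome) {
        int idx = outcome ? 1 : 0;
        return counts[idx] == 0 && counts[1 - idx] > 0;
    }
}
```

For example, a guard created as new ProfiledGuard(x -> x >= 0, x -> x + 1,
x -> -x) and called only with non-negative arguments reports
neverTaken(false) as true, no matter how other guards sharing the same code
behave.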
>>
>> If a GWT MethodHandle isn't a compile-time constant, profiling should
>> proceed. That happens when the corresponding LambdaForm is already
>> shared; for newly created GWT MethodHandles, profiling can occur only in
>> native code (a dedicated nmethod for a single LambdaForm). So, when
>> compilation of the whole MethodHandle chain is triggered, the profile
>> should already be gathered.
>>
>> Overriding branch frequencies is not enough. Statistics on
>> deoptimization events are also polluted. Even if a branch is never taken,
>> JIT issues an uncommon trap there only if the corresponding bytecode
>> hasn't trapped too much and hasn't caused too many recompiles.
>>
>> I added @IgnoreProfile and placed it only on GWT LambdaForms. When JIT
>> sees it on some method, Compile::too_many_traps &
>> Compile::too_many_recompiles for that method always return false. It
>> allows JIT to prune the branch based on the custom profile and recompile
>> the method if the branch is visited.
>>
>> For now, I wanted to keep the fix very focused. The next thing I plan to
>> do is to experiment with ignoring deoptimization counts for other
>> LambdaForms which are heavily shared. I already saw problems caused by
>> deoptimization counts pollution (see JDK-8068915 [2]).
>>
>> I plan to backport the fix into 8u40, once I finish extensive
>> performance testing.
>>
>> Testing: JPRT, java/lang/invoke tests, nashorn (nashorn testsuite,
>> Octane).
>>
>> Thanks!
>>
>> PS: as a summary, my experiments show that fixes for 8063137 & 8068915
>> [2] almost completely recover peak performance after LambdaForm sharing
>> [3]. There's one more problem left (non-inlined MethodHandle invocations
>> are more expensive when LFs are shared), but it's a story for another day.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8059877
>> 8059877: GWT branch frequencies pollution due to LF sharing
>> [2] https://bugs.openjdk.java.net/browse/JDK-8068915
>> [3] https://bugs.openjdk.java.net/browse/JDK-8046703
>> JEP 210: LambdaForm Reduction and Caching
>> _______________________________________________
>> mlvm-dev mailing list
>> mlvm-dev at openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev