[9] [8u40] RFR (M): 8059877: GWT branch frequencies pollution due to LF sharing
Remi Forax
forax at univ-mlv.fr
Fri Oct 10 21:20:49 UTC 2014
On 10/10/2014 10:42 PM, Vladimir Ivanov wrote:
> Remi,
>
>> Why do you need getHistoricInt?
>> Is it because Unsafe.getInt() doesn't do any constant folding?
> Exactly. I need a compile-time constant to feed to the compiler to
> guide compilation.
>
>> BTW, why is getHistoricInt named getHistoricInt?
> From the application's perspective, the call returns either the current
> value of a field or one of its previous values.
thanks for your answer,
I have another question, about inOptimizer().
Thinking a little about it, if there is code like
if (unsafe.inOptimizer()) {
...
}
this code will always trigger at least one recompilation: in the
interpreter the branch is never evaluated, and in the JIT, because
inOptimizer() will be true, the JIT will insert a deopt instruction,
since the branch was never evaluated before.
I wonder whether this recompilation can be avoided.
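To make the scenario concrete, here is a toy model, not HotSpot's actual profiling or deoptimization machinery. It assumes per-branch profile counters and a JIT that turns a zero-count branch into an uncommon trap. The experimental `unsafe.inOptimizer()` call is replaced by a hypothetical local `inOptimizer` flag that is false in the "interpreter" and true in "compiled" code.

```java
// Toy model of the recompilation described above; NOT HotSpot's real mechanism.
// The local `inOptimizer` flag stands in for the hypothetical unsafe.inOptimizer().
class InOptimizerToy {
    int taken;             // profile: times the guarded branch was taken
    int notTaken;          // profile: times the guarded branch was skipped
    int compilations;      // number of times the method was "compiled"
    int deopts;            // number of uncommon traps hit by compiled code
    boolean trapInBranch;  // compiled code replaced a never-taken branch with a trap

    // Interpreter: inOptimizer() answers false, so only the "not taken" side is profiled.
    void interpret() {
        boolean inOptimizer = false;
        if (inOptimizer) taken++; else notTaken++;
    }

    // JIT: a branch whose profile count is zero is compiled as an uncommon trap.
    void compile() {
        compilations++;
        trapInBranch = (taken == 0);
    }

    // Compiled code: inOptimizer() folds to true, so the guarded branch now executes.
    void runCompiled() {
        boolean inOptimizer = true;
        if (inOptimizer) {
            if (trapInBranch) {
                deopts++;   // uncommon trap: fall back to the interpreter
                taken++;    // the interpreter now profiles the branch as taken
                compile();  // recompilation sees a non-zero count, so no trap this time
                return;
            }
            taken++;
        } else {
            notTaken++;
        }
    }

    public static void main(String[] args) {
        InOptimizerToy m = new InOptimizerToy();
        for (int i = 0; i < 10_000; i++) m.interpret();
        m.compile();
        for (int i = 0; i < 3; i++) m.runCompiled();
        System.out.println("compilations=" + m.compilations + " deopts=" + m.deopts);
    }
}
```

In this model exactly one deoptimization and one recompilation happen, however long the interpreter ran, which is the cost the question is about.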
>
> Best regards,
> Vladimir Ivanov
regards,
Rémi
>>
>> cheers,
>> Rémi
>>
>> On 10/10/2014 09:08 PM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8059877/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8059877
>>>
>>> LambdaForm sharing introduces profile pollution in compiled
>>> LambdaForms. The most serious consequence is inlining distortion,
>>> which severely degrades peak performance. The main victim is the
>>> guardWithTest (GWT) combinator.
>>>
>>> Before LambdaForm sharing, inlining in GWT was affected by 2 aspects:
>>> - branch frequencies: a never-taken branch is skipped;
>>> - target & fallback method handles (corresponding LFs: compiled vs
>>> interpreted): if a method handle has been invoked fewer than
>>> COMPILE_THRESHOLD times, LambdaForm.vmentry points to the LF
>>> interpreter, which is marked w/ @DontInline.
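The two pre-sharing inputs can be summarized as a small decision function. This is an illustrative model only, not JDK code; the enum, method names, and the threshold parameter are hypothetical, with the threshold standing in for COMPILE_THRESHOLD (the message tests the values 0 and 30).

```java
// Illustrative model of the two pre-sharing inlining inputs for a GWT branch; NOT JDK code.
class PreSharingInliningModel {
    enum Decision { SKIP_NEVER_TAKEN, CALL_DONT_INLINE, INLINE }

    /**
     * @param branchCount      profile count for this GWT branch
     * @param invocations      invocations of the method handle behind the branch
     * @param compileThreshold hypothetical stand-in for COMPILE_THRESHOLD
     */
    static Decision decide(long branchCount, long invocations, int compileThreshold) {
        if (branchCount == 0) {
            return Decision.SKIP_NEVER_TAKEN;   // never-taken branch is skipped
        }
        if (invocations < compileThreshold) {
            return Decision.CALL_DONT_INLINE;   // vmentry -> LF interpreter (@DontInline)
        }
        return Decision.INLINE;                 // compiled LF, visible to the JIT
    }

    public static void main(String[] args) {
        System.out.println(decide(0, 1_000, 30));
        System.out.println(decide(5, 5, 30));
        System.out.println(decide(5_000, 5_000, 30));
    }
}
```

The point of the model is that, with fully customized LambdaForms, both inputs are per call site, so each GWT gets its own answer.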
>>>
>>> LambdaForm sharing breaks both aspects:
>>> - sharing of the GWT LambdaForm pollutes the branch profile;
>>> - sharing of the LambdaForms used in target & fallback pollutes
>>> their invocation counters.
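A toy illustration of the first kind of pollution, under the simplifying assumption that a GWT LambdaForm keeps one pair of branch counters (not the JDK's real profiling structures). Two call sites each have a never-taken branch when they own their LambdaForm, but once they share one, both branches look hot and neither can be skipped. All names here are hypothetical.

```java
// Toy illustration of branch-profile pollution caused by sharing; NOT JDK profiling code.
class SharedProfileToy {
    // Branch counts for one GWT LambdaForm: how often test() was true / false.
    static final class Profile {
        long targetTaken, fallbackTaken;
        void record(boolean test) { if (test) targetTaken++; else fallbackTaken++; }
    }

    /** Returns {siteA, siteB, shared} after n calls at each of two GWT call sites. */
    static Profile[] simulate(int n) {
        Profile siteA = new Profile();   // customized LF for call site A
        Profile siteB = new Profile();   // customized LF for call site B
        Profile shared = new Profile();  // a single LF shared by both sites
        for (int i = 0; i < n; i++) {
            siteA.record(true);  shared.record(true);   // site A: test always true
            siteB.record(false); shared.record(false);  // site B: test always false
        }
        return new Profile[] {siteA, siteB, shared};
    }

    public static void main(String[] args) {
        Profile[] p = simulate(1_000);
        // Customized: each site has one never-taken branch the JIT could skip.
        System.out.println("A: " + p[0].targetTaken + "/" + p[0].fallbackTaken);
        System.out.println("B: " + p[1].targetTaken + "/" + p[1].fallbackTaken);
        // Shared: both branches look taken, so neither can be skipped.
        System.out.println("shared: " + p[2].targetTaken + "/" + p[2].fallbackTaken);
    }
}
```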
>>>
>>> I experimented w/ a VM API to guide the JIT compiler using profiling
>>> information gathered at the LambdaForm level [1], but decided to take
>>> a safer route for now (8u40). The JIT-compiler control approach looks
>>> promising, but I need more time to get rid of some performance
>>> artifacts it suffers from.
>>>
>>> The proposed fix is to mimic the behavior of fully customized LambdaForms.
>>> When a GWT is created, both target & fallback method handles are wrapped
>>> in a special reinvoker, which blocks inlining (@DontInline on the
>>> reinvoker's compiled LambdaForm). Once a wrapper is invoked more than
>>> DONT_INLINE_THRESHOLD times, its LambdaForm is replaced with a
>>> regular reinvoker, which is fully transparent to the JIT, so it
>>> inlines smoothly.
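The wrapping scheme can be sketched with the public `java.lang.invoke` API. This is a sketch under stated assumptions, not the JDK patch: the real fix swaps the wrapper's LambdaForm inside the JDK, and user code cannot influence JIT inlining, so here the "switch" is only a flag recording when the replacement would happen. `CountingWrapper`, its `call` method, and the helper methods are hypothetical; `MethodHandles.guardWithTest`, `Lookup.findStatic`/`findVirtual`, and `MethodHandle.bindTo` are standard API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the counting-reinvoker idea; the flag only models when the JDK would swap forms.
class CountingReinvokerSketch {
    // Illustrative value; the message expects DONT_INLINE_THRESHOLD to be around 30.
    static final int DONT_INLINE_THRESHOLD = 30;

    // Hypothetical wrapper around a GWT target or fallback.
    static final class CountingWrapper {
        final MethodHandle delegate;
        final AtomicInteger calls = new AtomicInteger();
        volatile boolean transparent;   // true once the threshold has been exceeded

        CountingWrapper(MethodHandle delegate) { this.delegate = delegate; }

        Object call(Object arg) throws Throwable {
            if (!transparent && calls.incrementAndGet() > DONT_INLINE_THRESHOLD) {
                transparent = true;     // JDK analogue: replace the @DontInline form
            }
            return delegate.invoke(arg);
        }

        MethodHandle asHandle() throws ReflectiveOperationException {
            return MethodHandles.lookup()
                    .findVirtual(CountingWrapper.class, "call",
                                 MethodType.methodType(Object.class, Object.class))
                    .bindTo(this);
        }
    }

    static boolean isString(Object o) { return o instanceof String; }
    static Object onString(Object o) { return "string:" + o; }
    static Object onOther(Object o) { return "other:" + o; }

    final CountingWrapper target;
    final CountingWrapper fallback;
    final MethodHandle gwt;

    CountingReinvokerSketch() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType objToObj = MethodType.methodType(Object.class, Object.class);
        MethodHandle test = lookup.findStatic(CountingReinvokerSketch.class, "isString",
                MethodType.methodType(boolean.class, Object.class));
        target = new CountingWrapper(lookup.findStatic(CountingReinvokerSketch.class, "onString", objToObj));
        fallback = new CountingWrapper(lookup.findStatic(CountingReinvokerSketch.class, "onOther", objToObj));
        // Both branches are wrapped before the GWT is built.
        gwt = MethodHandles.guardWithTest(test, target.asHandle(), fallback.asHandle());
    }

    public static void main(String[] args) throws Throwable {
        CountingReinvokerSketch s = new CountingReinvokerSketch();
        for (int i = 0; i < 40; i++) s.gwt.invoke((Object) "x");
        for (int i = 0; i < 5; i++) s.gwt.invoke((Object) 42);
        System.out.println("target transparent: " + s.target.transparent);
        System.out.println("fallback transparent: " + s.fallback.transparent);
    }
}
```

Keeping the counter per wrapper (rather than in a shared LambdaForm) is what restores per-site information: a rarely used fallback stays behind its inlining barrier while a hot target becomes transparent.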
>>>
>>> The downside of the chosen approach is that LambdaForm.updateForm()
>>> doesn't guarantee that every place where the stale LambdaForm is used
>>> sees the update. If it is already part of some nmethod, that nmethod
>>> won't be invalidated and recompiled, but will be kept as is. This
>>> shouldn't be a problem, since DONT_INLINE_THRESHOLD is expected to be
>>> pretty low (~30), so only very rarely executed branches are affected.
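The staleness caveat amounts to "code compiled earlier keeps the form it saw at compile time". A minimal toy model, assuming compiled code can be represented as a closure that snapshots the form (real nmethods are nothing like this); `Wrapper` and `compileAgainst` are hypothetical names.

```java
import java.util.function.Supplier;

// Toy model of the staleness caveat; NOT how nmethods or LambdaForms are represented.
class StaleFormToy {
    // Hypothetical mutable holder standing in for a wrapper's LambdaForm.
    static final class Wrapper {
        volatile String form = "dontInline";
    }

    // "Compiling" a call site snapshots the form visible at that moment.
    static Supplier<String> compileAgainst(Wrapper w) {
        final String snapshot = w.form;
        return () -> snapshot;
    }

    public static void main(String[] args) {
        Wrapper w = new Wrapper();
        Supplier<String> oldCode = compileAgainst(w);   // compiled before the update
        w.form = "transparent";                         // analogue of LambdaForm.updateForm()
        Supplier<String> newCode = compileAgainst(w);   // compiled after the update
        System.out.println(oldCode.get() + " " + newCode.get());
    }
}
```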
>>>
>>> The fix significantly improves peak performance w/ full LF sharing
>>> (USE_LF_EDITOR=true).
>>>
>>> Octane/nashorn results [2] for:
>>> (1) USE_LF_EDITOR=false DONT_INLINE_THRESHOLD=0 (default for 8u40&9)
>>> (2) USE_LF_EDITOR=true DONT_INLINE_THRESHOLD=0 (default for 8u40&9)
>>> (3) USE_LF_EDITOR=true DONT_INLINE_THRESHOLD=30 (fixed version)
>>>
>>> (1) & (2) correspond to default configurations (partial & full LF
>>> sharing respectively). (3) is the fixed version.
>>>
>>> The fix recovers peak performance for:
>>> * Crypto: ~255ms -> ~12ms;
>>> * DeltaBlue: ~40ms -> ~2ms;
>>> * Raytracer: ~62ms -> ~7ms;
>>> * EarleyBoyer: ~160ms -> ~22ms;
>>> * NavierStokes: ~17ms -> ~13ms;
>>>
>>> 2 subbenchmarks (Box2D & Gbemu) still have some regressions, but
>>> they are much better now:
>>> Box2D: ~48ms -> ~61ms (w/o the fix: ~155ms)
>>> Gbemu: ~88ms -> ~116ms (w/o the fix: ~160ms)
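The quoted timings translate into the following rough ratios (lower milliseconds are better; since the inputs are approximate, so are the ratios). For Box2D and Gbemu, the sketch assumes the first figure is the reference configuration and the second is the fixed version; the message does not label them explicitly.

```java
// Ratios computed from the approximate Octane timings quoted above (ms, lower is better).
class OctaneRatios {
    static double ratio(double numerator, double denominator) { return numerator / denominator; }

    public static void main(String[] args) {
        String[] names = {"Crypto", "DeltaBlue", "Raytracer", "EarleyBoyer", "NavierStokes"};
        double[] before = {255, 40, 62, 160, 17};  // without the fix
        double[] after  = {12, 2, 7, 22, 13};      // with the fix
        for (int i = 0; i < names.length; i++) {
            System.out.printf("%-12s %.1fx faster%n", names[i], ratio(before[i], after[i]));
        }
        // Assumption: first figure = reference, second = fixed version.
        System.out.printf("Box2D        %.2fx slower than reference, %.2fx faster than w/o fix%n",
                ratio(61, 48), ratio(155, 61));
        System.out.printf("Gbemu        %.2fx slower than reference, %.2fx faster than w/o fix%n",
                ratio(116, 88), ratio(160, 116));
    }
}
```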
>>>
>>> Testing:
>>> tests: jck (api/java_lang/invoke), jdk/java/lang/invoke,
>>> jdk/java/util/streams, octane
>>> configurations: -ea -esa -Xverify:all
>>> + COMPILE_THRESHOLD={0,30} + USE_LF_EDITOR={false,true} +
>>> DONT_INLINE_THRESHOLD={0,30}
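Reading the "+" in the testing line as a full cross product, each test suite ran in 2 × 2 × 2 = 8 configurations. The sketch below just enumerates them; the setting names come from the message, but how each is passed to the JVM (for example, the exact system property name) is not stated there, so the output strings are labels rather than real command lines.

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates the test matrix described above; setting names are labels, not real JVM flags.
class TestMatrix {
    static final String FIXED_FLAGS = "-ea -esa -Xverify:all";

    static List<String> configurations() {
        List<String> configs = new ArrayList<>();
        for (int compileThreshold : new int[] {0, 30}) {
            for (boolean useLfEditor : new boolean[] {false, true}) {
                for (int dontInline : new int[] {0, 30}) {
                    configs.add(FIXED_FLAGS
                            + " COMPILE_THRESHOLD=" + compileThreshold
                            + " USE_LF_EDITOR=" + useLfEditor
                            + " DONT_INLINE_THRESHOLD=" + dontInline);
                }
            }
        }
        return configs;
    }

    public static void main(String[] args) {
        List<String> configs = configurations();
        configs.forEach(System.out::println);
        System.out.println(configs.size() + " configurations per test suite");
    }
}
```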
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] http://cr.openjdk.java.net/~vlivanov/profiling/
>>> [2] http://cr.openjdk.java.net/~vlivanov/8059877/octane.txt
>>> _______________________________________________
>>> mlvm-dev mailing list
>>> mlvm-dev at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>>