[9] [8u40] RFR (M): 8059877: GWT branch frequencies pollution due to LF sharing

Fri Oct 10 21:20:49 UTC 2014

On 10/10/2014 10:42 PM, Vladimir Ivanov wrote:
> Remi,
>
>> Why do you need getHistoricInt?
>> Is it because Unsafe.getInt() doesn't do any constant folding?
> Exactly. I need a compile-time constant to feed to the compiler to
> guide compilation.
>
>> BTW, why is getHistoricInt named getHistoricInt?
> From the application's perspective, the call returns either the current
> value of a field or one of its previous values.

thanks for your answer,
I have another question about inOptimizer().
Thinking a little about it, if there is code like
    if (unsafe.inOptimizer()) {
      ...
    }

this code will always trigger at least one recompilation. In the
interpreter the branch is never evaluated; in the JIT, inOptimizer()
is true, so the branch becomes reachable, but since it was never
evaluated before, the JIT inserts a deopt instruction there.
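The scenario above can be made concrete with a toy model. This is plain Java, not HotSpot code: inOptimizer() is modeled as a local boolean that is false in "interpreted" runs and true in "compiled" runs, a never-taken branch is "pruned" into an uncommon trap, and recompilation is simplified to happen immediately after the trap fires.

```java
// Toy model only (not HotSpot): shows why a branch guarded by an
// inOptimizer()-like flag costs at least one recompilation.
final class DeoptCycleModel {
    long branchTakenCount; // profile: how often the guarded branch ran
    int compilations;      // how many times the "JIT" compiled the method
    boolean branchPruned;  // did compiled code replace the branch with a trap?

    // Interpreted run: the flag is false, so the branch never executes.
    void runInterpreted() {
        boolean inOptimizer = false;
        if (inOptimizer) branchTakenCount++;
    }

    // "Compile" from the current profile: a never-taken branch is pruned.
    void compile() {
        compilations++;
        branchPruned = (branchTakenCount == 0);
    }

    // Compiled run: the flag is true, so the branch is reached.
    void runCompiled() {
        boolean inOptimizer = true;
        if (inOptimizer) {
            if (branchPruned) {
                // Uncommon trap: record the branch as taken and recompile
                // (simplified: real VMs re-profile before recompiling).
                branchTakenCount++;
                compile();
                return;
            }
            branchTakenCount++;
        }
    }

    static int compilationsAfterWarmup(int interpretedCalls, int compiledCalls) {
        DeoptCycleModel m = new DeoptCycleModel();
        for (int i = 0; i < interpretedCalls; i++) m.runInterpreted();
        m.compile();
        for (int i = 0; i < compiledCalls; i++) m.runCompiled();
        return m.compilations;
    }

    public static void main(String[] args) {
        System.out.println(compilationsAfterWarmup(100, 10)); // prints 2
    }
}
```

In this model the first compiled execution always hits the trap, so the method is compiled twice regardless of how long it was interpreted.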

I wonder whether this recompilation can be avoided.

>
> Best regards,
> Vladimir Ivanov

regards,
Rémi

>>
>> cheers,
>> Rémi
>>
>> On 10/10/2014 09:08 PM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8059877/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8059877
>>>
>>> LambdaForm sharing introduces profile pollution in compiled
>>> LambdaForms. The most serious consequence is inlining distortion,
>>> which severely degrades peak performance. The main victim is the
>>> guardWithTest (GWT) combinator.
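For readers less familiar with the GWT combinator, here is a minimal sketch built with the public java.lang.invoke API. The test, target and fallback methods (isPositive, target, fallback) are hypothetical and exist only for illustration; the point is the shape: the test's result selects which of the two method handles is invoked.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Illustrative GWT: test ? target(x) : fallback(x)
final class GwtExample {
    static boolean isPositive(int x) { return x > 0; }
    static String target(int x)      { return "positive:" + x; }
    static String fallback(int x)    { return "non-positive:" + x; }

    static MethodHandle buildGwt() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle test = lookup.findStatic(GwtExample.class, "isPositive",
                MethodType.methodType(boolean.class, int.class));
        MethodHandle target = lookup.findStatic(GwtExample.class, "target",
                MethodType.methodType(String.class, int.class));
        MethodHandle fallback = lookup.findStatic(GwtExample.class, "fallback",
                MethodType.methodType(String.class, int.class));
        // The test's parameters must be a prefix of the target's parameters.
        return MethodHandles.guardWithTest(test, target, fallback);
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle gwt = buildGwt();
        System.out.println((String) gwt.invokeExact(5));  // prints positive:5
        System.out.println((String) gwt.invokeExact(-2)); // prints non-positive:-2
    }
}
```

Whether the JIT can skip one of the two branches depends on the branch profile collected for the LambdaForm that implements this combinator, which is what sharing disturbs.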
>>>
>>> Before LambdaForm sharing, inlining in GWT was affected by two aspects:
>>>   - branch frequencies: a never-taken branch is skipped;
>>>   - target & fallback method handles (corresponding LFs: compiled vs
>>> interpreted): if a method handle has been invoked fewer than
>>> COMPILE_THRESHOLD times, LambdaForm.vmentry points to the LF
>>> interpreter, which is marked w/ @DontInline.
>>>
>>> LambdaForm sharing breaks both aspects:
>>>   - sharing of GWT LambdaForm pollutes branch profile;
>>>   - sharing of LambdaForms used in target & fallback pollutes
>>> invocation counters.
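The branch-profile half of the problem can be illustrated with a toy model (plain Java, not the JDK's LambdaForm machinery). A hypothetical GwtForm owns a per-form branch profile; two call sites either get their own forms or share one. Site A always takes the target, site B always takes the fallback, and a JIT-like heuristic may prune any branch whose count is zero.

```java
// Toy model only: call sites that share one GWT "form" share its profile.
final class SharedProfileModel {
    static final class GwtForm {
        long targetCount;   // times the test returned true
        long fallbackCount; // times the test returned false

        int invoke(boolean test, int x) {
            if (test) { targetCount++; return x + 1; }   // "target"
            fallbackCount++;                            // "fallback"
            return x - 1;
        }

        // JIT-like heuristic: a branch that was never taken can be pruned.
        boolean canPruneFallback() { return fallbackCount == 0; }
        boolean canPruneTarget()   { return targetCount == 0; }
    }

    // Returns {can site A prune its fallback?, can site B prune its target?}
    static boolean[] pruningWhen(boolean shared) {
        GwtForm formA = new GwtForm();
        GwtForm formB = shared ? formA : new GwtForm();
        for (int i = 0; i < 1000; i++) {
            formA.invoke(true, i);  // site A: always target
            formB.invoke(false, i); // site B: always fallback
        }
        return new boolean[] { formA.canPruneFallback(), formB.canPruneTarget() };
    }

    public static void main(String[] args) {
        boolean[] separate = pruningWhen(false);
        boolean[] shared = pruningWhen(true);
        System.out.println(separate[0] + " " + separate[1]); // prints true true
        System.out.println(shared[0] + " " + shared[1]);     // prints false false
    }
}
```

With separate forms each site's dead branch is visible; with a shared form both branches look hot at both sites, so neither can be skipped.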
>>>
>>> I experimented w/ a VM API to guide the JIT compiler using profiling
>>> information gathered at the LambdaForm level [1], but decided to take
>>> a safer route for now (8u40). The JIT-compiler control approach looks
>>> promising, but I need more time to get rid of some performance
>>> artifacts it suffers from.
>>>
>>> The proposed fix is to mimic the behavior of fully customized
>>> LambdaForms. When a GWT is created, both target & fallback method
>>> handles are wrapped in a special reinvoker, which blocks inlining
>>> (@DontInline on the reinvoker's compiled LambdaForm). Once a wrapper
>>> has been invoked more than DONT_INLINE_THRESHOLD times, its LambdaForm
>>> is replaced with a regular reinvoker, which is fully transparent to
>>> the JIT and inlines smoothly.
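The counting-then-swap idea can be approximated with public API. This is an analogy, not the fix itself: the JDK change swaps the wrapper's LambdaForm internally, whereas this sketch uses a MutableCallSite whose target starts as a counting handle and is replaced by the raw target once the threshold is reached. The class CountingReinvoker, its (int)int signature and the square method are hypothetical; only the threshold value (30) comes from the mail.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Analogy only: a call site that counts invocations, then gets out of the way.
final class CountingReinvoker {
    static final int DONT_INLINE_THRESHOLD = 30; // value quoted in the mail

    final MethodHandle target;
    final MutableCallSite site;
    int count; // invocations seen by the counting path

    private CountingReinvoker(MethodHandle target) {
        this.target = target;
        this.site = new MutableCallSite(target.type());
    }

    // Counting path, used until the threshold is reached.
    int countingInvoke(int x) throws Throwable {
        if (++count >= DONT_INLINE_THRESHOLD) {
            site.setTarget(target); // later calls bypass the counter entirely
        }
        return (int) target.invokeExact(x);
    }

    static CountingReinvoker wrap(MethodHandle target) throws ReflectiveOperationException {
        CountingReinvoker r = new CountingReinvoker(target);
        MethodHandle counting = MethodHandles.lookup()
                .findVirtual(CountingReinvoker.class, "countingInvoke",
                        MethodType.methodType(int.class, int.class))
                .bindTo(r);
        r.site.setTarget(counting);
        return r;
    }

    MethodHandle handle() { return site.dynamicInvoker(); }

    static int square(int x) { return x * x; }

    static MethodHandle squareHandle() throws ReflectiveOperationException {
        return MethodHandles.lookup().findStatic(CountingReinvoker.class, "square",
                MethodType.methodType(int.class, int.class));
    }
}
```

After 40 calls through handle(), count stays at 30: the first 30 calls go through the counter, the rest go straight to the target.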
>>>
>>> The downside of the chosen approach is that LambdaForm.updateForm()
>>> doesn't guarantee that all places where a stale LambdaForm is used see
>>> the update. If it is already part of some nmethod, that nmethod won't
>>> be invalidated and recompiled, but will be kept as is. This shouldn't
>>> be a problem, since DONT_INLINE_THRESHOLD is expected to be pretty low
>>> (~30), so only very rarely executed branches are affected.
>>>
>>> The fix significantly improves peak performance w/ full LF sharing
>>> (USE_LF_EDITOR=true).
>>>
>>> Octane/nashorn results [2] for:
>>>   (1) USE_LF_EDITOR=false DONT_INLINE_THRESHOLD=0 (default for 8u40&9)
>>>   (2) USE_LF_EDITOR=true  DONT_INLINE_THRESHOLD=0 (default for 8u40&9)
>>>   (3) USE_LF_EDITOR=true  DONT_INLINE_THRESHOLD=30 (fixed version)
>>>
>>> (1) & (2) correspond to default configurations (partial & full LF
>>> sharing respectively). (3) is the fixed version.
>>>
>>> The fix recovers peak performance for:
>>>  * Crypto:       ~255ms -> ~12ms;
>>>  * DeltaBlue:     ~40ms ->  ~2ms;
>>>  * Raytracer:     ~62ms ->  ~7ms;
>>>  * EarleyBoyer:  ~160ms ->  ~22ms;
>>>  * NavierStokes:  ~17ms ->  ~13ms;
>>>
>>> Two subbenchmarks (Box2D & Gbemu) still have some regressions, but
>>> they are much better now:
>>>    Box2D: ~48ms -> ~61ms  (w/o the fix: ~155ms)
>>>    Gbemu: ~88ms -> ~116ms (w/o the fix: ~160ms)
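To put the timings above in relative terms, the snippet below turns the quoted approximate (~) numbers into ratios, assuming lower ms is better. For the recovered subbenchmarks the left-hand number is read as "w/o the fix" and the right-hand one as "with the fix"; for Box2D and Gbemu the left-hand number is read as the baseline and the right-hand one as the fixed version. That mapping is an interpretation of the listing, not something it states explicitly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Ratios computed from the approximate timings quoted above (lower ms is better).
final class OctaneRatios {
    // How many times faster "after" is than "before" (< 1 means slower).
    static double ratio(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    public static void main(String[] args) {
        // Recovered subbenchmarks: {w/o fix, with fix}
        Map<String, double[]> recovered = new LinkedHashMap<>();
        recovered.put("Crypto", new double[] {255, 12});
        recovered.put("DeltaBlue", new double[] {40, 2});
        recovered.put("Raytracer", new double[] {62, 7});
        recovered.put("EarleyBoyer", new double[] {160, 22});
        recovered.put("NavierStokes", new double[] {17, 13});
        for (Map.Entry<String, double[]> e : recovered.entrySet()) {
            double[] v = e.getValue();
            System.out.printf("%-13s %5.1fx faster with the fix%n", e.getKey(), ratio(v[0], v[1]));
        }

        // Remaining regressions: {baseline, with fix, w/o fix}
        Map<String, double[]> regressed = new LinkedHashMap<>();
        regressed.put("Box2D", new double[] {48, 61, 155});
        regressed.put("Gbemu", new double[] {88, 116, 160});
        for (Map.Entry<String, double[]> e : regressed.entrySet()) {
            double[] v = e.getValue();
            System.out.printf("%-13s %.2fx slower than baseline, %.2fx faster than w/o fix%n",
                    e.getKey(), 1 / ratio(v[0], v[1]), ratio(v[2], v[1]));
        }
    }
}
```

Under that reading, Crypto and DeltaBlue improve by roughly 20x, while Box2D ends up about 1.3x slower than the baseline but about 2.5x faster than without the fix.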
>>>
>>> Testing:
>>>   tests: jck (api/java_lang/invoke), jdk/java/lang/invoke,
>>>          jdk/java/util/streams, octane
>>>   configurations: -ea -esa -Xverify:all
>>>     + COMPILE_THRESHOLD={0,30} + USE_LF_EDITOR={false,true}
>>>     + DONT_INLINE_THRESHOLD={0,30}
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] http://cr.openjdk.java.net/~vlivanov/profiling/
>>> [2] http://cr.openjdk.java.net/~vlivanov/8059877/octane.txt
>>> _______________________________________________
>>> mlvm-dev mailing list
>>> mlvm-dev at openjdk.java.net
>>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
>>