[9] [8u40] RFR (M): 8059877: GWT branch frequencies pollution due to LF sharing
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Oct 10 19:08:00 UTC 2014
http://cr.openjdk.java.net/~vlivanov/8059877/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8059877
LambdaForm sharing introduces profile pollution in compiled LambdaForms.
The most serious consequence is inlining distortion, which severely
degrades peak performance. The main victim is the guardWithTest (GWT)
combinator.
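For readers less familiar with GWT, here is a minimal sketch of the
combinator through the public MethodHandles.guardWithTest API. The class
GwtDemo and its helper methods (isShort, upper, lower, run) are
hypothetical names made up for illustration; they are not part of the
patch.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.Locale;

public class GwtDemo {
    // Hypothetical test, target and fallback used only for this sketch.
    static boolean isShort(String s) { return s.length() < 4; }
    static String upper(String s) { return s.toUpperCase(Locale.ROOT); }
    static String lower(String s) { return s.toLowerCase(Locale.ROOT); }

    // Builds test ? target : fallback as a single method handle.
    static MethodHandle buildGwt() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType strToStr = MethodType.methodType(String.class, String.class);
        MethodHandle test = lookup.findStatic(GwtDemo.class, "isShort",
                MethodType.methodType(boolean.class, String.class));
        MethodHandle target = lookup.findStatic(GwtDemo.class, "upper", strToStr);
        MethodHandle fallback = lookup.findStatic(GwtDemo.class, "lower", strToStr);
        return MethodHandles.guardWithTest(test, target, fallback);
    }

    // Invokes the GWT handle; checked exceptions are wrapped for convenience.
    public static String run(String s) {
        try {
            return (String) buildGwt().invokeExact(s);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("abc"));   // test is true, target runs
        System.out.println(run("HELLO")); // test is false, fallback runs
    }
}
```

Which of the two branches the JIT inlines at such a call site is exactly
what the branch profile and invocation counters discussed in this post
influence.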
Before LambdaForm sharing, inlining in GWT was affected by 2 aspects:
- branch frequencies: a never-taken branch is skipped;
- target & fallback method handles (corresponding LFs: compiled vs
interpreted): if a method handle has been invoked < COMPILE_THRESHOLD
times, LambdaForm.vmentry points to the LF interpreter, which is marked w/
@DontInline.
LambdaForm sharing breaks both aspects:
- sharing of GWT LambdaForm pollutes branch profile;
- sharing of LambdaForms used in target & fallback pollutes
invocation counters.
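The branch profile pollution can be illustrated with a toy model. This
is only a sketch: BranchProfile is a hypothetical stand-in for the
per-LambdaForm profile the JIT consults, not a JDK class, and the two
"call sites" are simulated by plain method calls.

```java
public class ProfilePollutionDemo {
    // Hypothetical stand-in for the branch profile of one compiled GWT form.
    static final class BranchProfile {
        long taken;
        long notTaken;

        void record(boolean branchTaken) {
            if (branchTaken) taken++; else notTaken++;
        }

        // A branch that was never observed is one the JIT could skip.
        boolean hasNeverTakenBranch() {
            return taken == 0 || notTaken == 0;
        }
    }

    // Site A always takes the target, site B always takes the fallback.
    // With sharing, both sites record into the same profile.
    public static boolean[] simulate(boolean shared, int iterations) {
        BranchProfile siteA = new BranchProfile();
        BranchProfile siteB = shared ? siteA : new BranchProfile();
        for (int i = 0; i < iterations; i++) {
            siteA.record(true);
            siteB.record(false);
        }
        return new boolean[] {
            siteA.hasNeverTakenBranch(), siteB.hasNeverTakenBranch()
        };
    }

    public static void main(String[] args) {
        boolean[] customized = simulate(false, 1000);
        boolean[] sharedForm = simulate(true, 1000);
        System.out.println("customized: " + customized[0] + ", " + customized[1]);
        System.out.println("shared:     " + sharedForm[0] + ", " + sharedForm[1]);
    }
}
```

With separate profiles each site shows a never-taken branch; with a
shared profile both branches look equally hot, so neither can be skipped.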
I experimented w/ a VM API to guide the JIT-compiler using profiling
information gathered on LambdaForm level [1], but decided to take a safer
route for now (8u40). The JIT-compiler control approach looks promising, but
I need more time to get rid of some performance artifacts it suffers
from.
The proposed fix is to mimic the behavior of fully customized LambdaForms.
When GWT is created, both target & fallback method handles are wrapped
in a special reinvoker, which blocks inlining (@DontInline on reinvoker's
compiled LambdaForm). Once a wrapper is invoked more than
DONT_INLINE_THRESHOLD times, its LambdaForm is replaced with a regular
reinvoker, which is fully transparent to the JIT and inlines smoothly.
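The counting scheme described above can be modeled in plain Java. This is
a conceptual sketch only, not the code from the webrev: the class names
are hypothetical, @DontInline is JDK-internal and is not used here, and
the boolean flag merely stands in for "the wrapper's LambdaForm has been
replaced with a regular reinvoker". The threshold value 30 is the one
used in the measurements below.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntUnaryOperator;

public class CountingWrapperSketch {
    // Assumed value; matches the DONT_INLINE_THRESHOLD=30 configuration.
    static final int DONT_INLINE_THRESHOLD = 30;

    public static final class Wrapper {
        private final IntUnaryOperator target;
        private final AtomicInteger invocations = new AtomicInteger();
        // false: models the inlining-blocking reinvoker form.
        // true:  models the regular, JIT-transparent reinvoker form.
        private volatile boolean inlinable = false;

        public Wrapper(IntUnaryOperator target) {
            this.target = target;
        }

        public int invoke(int x) {
            // Count only while the blocking form is installed; once the
            // count exceeds the threshold, switch to the regular form.
            if (!inlinable && invocations.incrementAndGet() > DONT_INLINE_THRESHOLD) {
                inlinable = true;
            }
            return target.applyAsInt(x);
        }

        public boolean isInlinable() {
            return inlinable;
        }
    }

    public static void main(String[] args) {
        Wrapper w = new Wrapper(x -> x + 1);
        for (int i = 0; i < DONT_INLINE_THRESHOLD; i++) {
            w.invoke(i);
        }
        System.out.println("after threshold calls: " + w.isInlinable());
        w.invoke(0);
        System.out.println("after one more call:   " + w.isInlinable());
    }
}
```

A rarely executed branch never crosses the threshold, so it stays
non-inlinable, which approximates what a customized LambdaForm with a
cold branch would give.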
The downside of the chosen approach is that LambdaForm.updateForm()
doesn't guarantee that all places where the stale LambdaForm is used see the
update. If it is already part of some nmethod, it won't be invalidated
and recompiled, but will be kept as is. It shouldn't be a problem, since
DONT_INLINE_THRESHOLD is expected to be pretty low (~30), so only very
rarely executed branches are affected.
The fix significantly improves peak performance w/ full LF sharing
(USE_LF_EDITOR=true).
Octane/nashorn results [2] for:
(1) USE_LF_EDITOR=false DONT_INLINE_THRESHOLD=0 (default for 8u40&9)
(2) USE_LF_EDITOR=true DONT_INLINE_THRESHOLD=0 (default for 8u40&9)
(3) USE_LF_EDITOR=true DONT_INLINE_THRESHOLD=30 (fixed version)
(1) & (2) correspond to default configurations (partial & full LF
sharing respectively). (3) is the fixed version.
The fix recovers peak performance for:
* Crypto: ~255ms -> ~12ms;
* DeltaBlue: ~40ms -> ~2ms;
* Raytracer: ~62ms -> ~7ms;
* EarleyBoyer: ~160ms -> ~22ms;
* NavierStokes: ~17ms -> ~13ms;
2 subbenchmarks (Box2D & Gbemu) still have some regressions, but it's
much better now:
Box2D: ~48ms -> ~61ms (w/o the fix: ~155ms)
Gbemu: ~88ms -> ~116ms (w/o the fix: ~160ms)
Testing:
tests: jck (api/java_lang/invoke), jdk/java/lang/invoke,
jdk/java/util/streams, octane
configurations: -ea -esa -Xverify:all
+ COMPILE_THRESHOLD={0,30} + USE_LF_EDITOR={false,true} +
DONT_INLINE_THRESHOLD={0,30}
Thanks!
Best regards,
Vladimir Ivanov
[1] http://cr.openjdk.java.net/~vlivanov/profiling/
[2] http://cr.openjdk.java.net/~vlivanov/8059877/octane.txt