RFC: Recuperate slowdown of long-running static initialization
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Apr 19 23:19:23 UTC 2019
Hi,
Recent changes severely affected how static initializers are executed,
and for long-running initializers this shows up as a severe slowdown.
For example, some Clojure applications became about 3x slower
(JDK-8219233 [1]).
The root cause is that until a class is fully initialized, every
invocation of a static method on it goes through method resolution.
There were some changes (JDK-8219974 [2]) to partially mitigate the
slowdown, but they had limited effect.
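To make the affected pattern concrete, here is a minimal Java sketch of a class whose static initializer runs for a long time and keeps calling a static method on the same class. The class and method names are hypothetical and are not taken from the bug report; the loop is only a stand-in for real initialization work.

```java
// Hypothetical illustration; the class and method names are not taken
// from JDK-8219233.
final class LongInit {
    static final long RESULT;

    static {
        long acc = 0;
        // Each call to step() happens while LongInit is still being
        // initialized by the current thread, i.e. before the class is
        // fully initialized.
        for (int i = 0; i < 1_000_000; i++) {
            acc = step(acc, i);
        }
        RESULT = acc;
    }

    static long step(long acc, int i) {
        return acc + i;
    }
}

final class LongInitDemo {
    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = LongInit.RESULT; // first use triggers LongInit.<clinit>
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result=" + result + ", init took ~" + elapsedMs + " ms");
    }
}
```

Every `step()` call inside the static block is a static call on a class that is not yet fully initialized, which is the case described above. The measured time will vary with JDK version and flags.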
I have been experimenting with a comprehensive fix and ended up with the
following:
http://cr.openjdk.java.net/~vlivanov/8219233/webrev.02/
(Unfortunately, I had to go with platform-specific changes and the patch
contains only the x86_64 part. On other platforms the original behavior
is preserved.)
The idea is to put an initialization barrier on entry into static
methods. If a thread other than the one performing the initialization
enters the method, it is blocked until class initialization is finished
(and an exception is thrown if initialization finishes with an error).
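The Java-level behavior that such a barrier has to preserve can be observed with a small sketch. All names below are made up for the example, and the `Thread.sleep` call merely stands in for a long-running initializer. The initializing thread may call static methods of the class reentrantly, while another thread calling the same method waits until initialization completes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Shared state lives in a separate class so that touching it does not
// itself trigger initialization of Holder.
final class Sync {
    static final CountDownLatch IN_CLINIT = new CountDownLatch(1);
    static final AtomicBoolean INIT_DONE = new AtomicBoolean(false);
}

final class Holder {
    static {
        Sync.IN_CLINIT.countDown(); // initialization has started
        ping();                     // reentrant call from the initializing thread: not blocked
        try {
            Thread.sleep(200);      // stand-in for a long-running initializer
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Sync.INIT_DONE.set(true);
    }

    static boolean ping() {
        return Sync.INIT_DONE.get();
    }
}

final class ClinitBlockingDemo {
    // Returns what a non-initializing thread observes when it calls
    // Holder.ping() while Holder.<clinit> is running on another thread.
    static boolean run() {
        Thread initializer = new Thread(() -> { Holder.ping(); });
        initializer.start();
        try {
            Sync.IN_CLINIT.await();           // Holder.<clinit> is now running on 'initializer'
            boolean observed = Holder.ping(); // this thread waits here until <clinit> completes
            initializer.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("non-initializing thread saw completed init: " + run());
    }
}
```

Since the second thread can only get past `Holder.ping()` after the static initializer has finished, `run()` should report `true`.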
The barrier is as simple as:
   if (holder->is_not_initialized() &&
       !holder->is_reentrant_initialization(current_thread)) {
     // trigger call site re-resolution and block there
   }
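The cases this check distinguishes can be spelled out with a toy model. This is only a sketch under assumptions: the types below are illustrative and are not HotSpot's data structures, and "not initialized" is read here as "not yet fully initialized", which is what the surrounding text implies.

```java
// Toy model of the check; these types are illustrative and are not
// HotSpot's actual data structures.
final class BarrierModel {
    enum InitState { NOT_STARTED, BEING_INITIALIZED, FULLY_INITIALIZED }

    static final class KlassModel {
        final InitState state;
        final Thread initThread; // thread running <clinit>, or null

        KlassModel(InitState state, Thread initThread) {
            this.state = state;
            this.initThread = initThread;
        }
    }

    // True when the caller must leave the fast path: the holder is not
    // yet fully initialized and the caller is not the initializing thread.
    static boolean needsSlowPath(KlassModel holder, Thread current) {
        boolean notFullyInitialized = holder.state != InitState.FULLY_INITIALIZED;
        boolean reentrant = holder.initThread == current;
        return notFullyInitialized && !reentrant;
    }

    public static void main(String[] args) {
        Thread self = Thread.currentThread();
        Thread other = new Thread(() -> { });
        // Fully initialized class: fast path.
        System.out.println(needsSlowPath(new KlassModel(InitState.FULLY_INITIALIZED, null), self));
        // Being initialized by the calling thread (reentrant): fast path.
        System.out.println(needsSlowPath(new KlassModel(InitState.BEING_INITIALIZED, self), self));
        // Being initialized by another thread: slow path.
        System.out.println(needsSlowPath(new KlassModel(InitState.BEING_INITIALIZED, other), self));
    }
}
```

Only the last case takes the slow path; both the fully initialized case and the reentrant case proceed directly into the method.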
Performance experiments demonstrated that even though generated code
contributes the most overhead, interpreter overhead is visible as well
(~20%).
   (1) original (always reresolve):        ~12.0s (  1x)
   (2) C1/C2 - barriers; int - reresolve:   ~3.8s (~3x)
   (3) int/C1/C2 - barriers:                ~3.2s (-20%)
Based on that, I decided to implement barriers both in JIT-compilers
(C1/C2) & interpreter.
For C1/C2 I decided to put the barrier on the callee side (in the
nmethod prologue). Though it looks attractive to put it on the caller
side (before the call), that poses major implementation challenges for
C1, where unresolved calls are eagerly compiled.
For the interpreter, on the other hand, it's much simpler to implement
the barrier on the caller side: throwing an exception on method entry is
much more complicated than doing that as part of method resolution
during the call.
So, here's the correspondence between barriers and transitions they cover:
   (1) from interpreter (barrier on caller side)
         * all transitions: interpreter, compiled (i2c), native, aot, ...
   (2) from compiled (barrier on callee side)
         * to compiled, to native (barrier in native wrapper on entry)
   (3) c2i bypasses both barriers (interpreter and compiled) and
       requires a dedicated barrier in c2i
   (4) to Graal/AOT:
         * from interpreter: covered by interpreter barrier
         * from compiled: current patch doesn't cover Graal and AOT, so
           call site patching is disabled for them, leading to repeated
           call site resolution until the method holder is fully
           initialized.
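The cost of disabled call site patching can be illustrated with a toy model. It is a sketch only: the class below is hypothetical and does not reflect how HotSpot represents call sites. The site re-resolves on every call until the holder is fully initialized, and only then caches its target.

```java
import java.util.function.BooleanSupplier;

// Toy model: a call site that may only cache ("patch in") its resolved
// target once the method holder is fully initialized.
final class CallSiteModel {
    private final BooleanSupplier holderFullyInitialized;
    private Runnable patchedTarget; // null until the call site is patched
    private int resolutions;

    CallSiteModel(BooleanSupplier holderFullyInitialized) {
        this.holderFullyInitialized = holderFullyInitialized;
    }

    void invoke(Runnable target) {
        if (patchedTarget == null) {
            resolutions++; // slow path: resolve the call again
            if (holderFullyInitialized.getAsBoolean()) {
                patchedTarget = target; // patching allowed: later calls skip resolution
            }
            target.run();
        } else {
            patchedTarget.run(); // fast path through the patched call site
        }
    }

    int resolutionCount() {
        return resolutions;
    }

    public static void main(String[] args) {
        boolean[] initialized = { false };
        CallSiteModel site = new CallSiteModel(() -> initialized[0]);
        Runnable callee = () -> { };
        for (int i = 0; i < 1000; i++) site.invoke(callee); // during initialization
        initialized[0] = true;
        for (int i = 0; i < 1000; i++) site.invoke(callee); // after initialization
        System.out.println("resolutions: " + site.resolutionCount());
    }
}
```

In this model the 1000 calls made during initialization each pay for a resolution, while the 1000 calls made afterwards pay for only one in total.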
I'd like to hear opinions about the patch and the decisions I made
before publishing it for review. For example, is it worth changing the
template interpreter? The change itself is small and localized, and the
performance improvement is noticeable, but it still resides in
platform-specific code.
Regarding the implementation of barriers in generated code, nmethod
entry barriers (introduced by 8210498 [3]) look like a perfect fit (and
I even experimented with them), but I decided to leave them aside for
now: mainly to ease backports (8210498 was introduced in 12), but also
to ease support on other platforms (as of now, nmethod entry barriers
are supported solely on x86_64). As follow-up work, the implementations
can be unified in 13/12u.
Entry barriers support in Graal/AOT is left for future work as well.
Once the support is there, call site patching restrictions should be
relaxed.
Thanks!
Best regards,
Vladimir Ivanov
[1] https://bugs.openjdk.java.net/browse/JDK-8219233
[2] https://bugs.openjdk.java.net/browse/JDK-8219974
[3] https://bugs.openjdk.java.net/browse/JDK-8210498