[13] RFR (M): 8223213: Implement fast class initialization checks on x86-64
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Wed May 1 23:17:17 UTC 2019
http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8223213
(This is a follow-up RFR to an earlier RFC [1].)
Recent changes severely affected how static initializers are executed,
and for long-running initializers this manifested as a severe slowdown.
As an example, it led to a 3x slowdown in some Clojure applications
(JDK-8219233 [2]). The root cause is that until a class is fully
initialized, every invocation of a static method on it goes through
method resolution.
The proposed fix introduces fast class initialization barriers for C1, C2,
and the template interpreter on x86-64. I experimented with
cross-platform approaches, but didn't get satisfactory results.
On other platforms, behavior stays (mostly) intact. (I had to revert
some changes introduced by JDK-8219492 [3], since the assumptions they
rely on about accesses inside a class don't hold in all cases.)
The barrier is as simple as:
  if (holder->is_not_initialized() &&
      !holder->is_reentrant_initialization(current_thread)) {
    // trigger call site re-resolution and block there
  }
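To make the two conditions concrete, here is a minimal standalone sketch of the same check. It is not HotSpot code: `Holder`, `InitState`, `ThreadId`, `Path`, and `class_init_barrier` are hypothetical names invented for illustration, and `is_not_initialized()` is assumed to mean "not yet fully initialized", matching the wording of the pseudocode above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a class holder's initialization state.
// These names are illustrative only and do not mirror HotSpot's real API.
enum class InitState { Linked, BeingInitialized, FullyInitialized };

using ThreadId = std::uint64_t;
constexpr ThreadId kNoThread = 0;

struct Holder {
    InitState state = InitState::Linked;
    ThreadId init_thread = kNoThread;  // thread running the static initializer, if any

    // Assumption: "not initialized" means "not yet fully initialized".
    bool is_not_initialized() const {
        return state != InitState::FullyInitialized;
    }

    // True when the current thread is the one executing the static initializer.
    bool is_reentrant_initialization(ThreadId current) const {
        return state == InitState::BeingInitialized && init_thread == current;
    }
};

enum class Path { Fast, SlowReResolve };

// Decide whether a static call may proceed directly (Fast) or has to
// re-resolve the call site and block there (SlowReResolve).
Path class_init_barrier(const Holder& holder, ThreadId current) {
    if (holder.is_not_initialized() &&
        !holder.is_reentrant_initialization(current)) {
        return Path::SlowReResolve;
    }
    return Path::Fast;
}
```

In this model, only the initializing thread and callers of a fully initialized class take the fast path; any other thread is sent to the slow path while initialization is in progress.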
There are 3 places where barriers are added:
  * in the template interpreter, for the invokestatic bytecode;
  * at the nmethod verified entry point (for normal compilations);
  * in c2i adapters.
For the template interpreter, there's an additional check added to
TemplateTable::resolve_cache_and_index which calls into
InterpreterRuntime::resolve_from_cache when the fast path checks fail.
In case of nmethods, the barrier is put before frame construction, so
existing compiler runtime routines can be reused
(SharedRuntime::get_handle_wrong_method_stub()).
Also, C2 has a guard on entry (Parse::clinit_deopt()) which triggers
nmethod recompilation once the class is fully initialized.
OSR compilations don't need a barrier.
Correspondence between barriers and transitions they cover:
  (1) from interpreter (barrier on caller side)
        * all transitions: interpreter, compiled (i2c), native, aot, ...
  (2) from compiled (barrier on callee side)
        to compiled, to native (barrier in native wrapper on entry)
  (3) c2i bypasses both barriers (interpreter and compiled) and
      requires a dedicated barrier in c2i
  (4) to Graal/AOT code:
        from interpreter: covered by interpreter barrier
        from compiled: call site patching is disabled, leading to
        repeated call site resolution until method holder is fully
        initialized (original behavior).
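The correspondence above can be read as a small lookup from (caller kind, callee kind) to the mechanism that covers the transition. The sketch below only restates that list as code. All type and function names (`Caller`, `Callee`, `Coverage`, `covering_mechanism`) are hypothetical and exist purely for illustration.

```cpp
#include <cassert>

// Hypothetical labels for the kinds of code involved in a static call.
enum class Caller { Interpreter, Compiled };
enum class Callee { Interpreter, Compiled, Native, GraalAot };

// Which mechanism guards the transition, following items (1)-(4) above.
enum class Coverage {
    InterpreterBarrier,    // (1) caller-side check at invokestatic
    NmethodEntryBarrier,   // (2) callee-side check at the verified entry point
    NativeWrapperBarrier,  // (2) callee-side check on native wrapper entry
    C2IAdapterBarrier,     // (3) dedicated check in the c2i adapter
    RepeatedResolution     // (4) no barrier; call site is never patched
};

Coverage covering_mechanism(Caller from, Callee to) {
    // (1) The interpreter checks on the caller side, so every
    // transition out of it is covered regardless of the callee.
    if (from == Caller::Interpreter) {
        return Coverage::InterpreterBarrier;
    }
    // Calls from compiled code depend on what is being called.
    switch (to) {
        case Callee::Compiled:    return Coverage::NmethodEntryBarrier;
        case Callee::Native:      return Coverage::NativeWrapperBarrier;
        case Callee::Interpreter: return Coverage::C2IAdapterBarrier;
        case Callee::GraalAot:    return Coverage::RepeatedResolution;
    }
    return Coverage::RepeatedResolution;  // not reached; keeps compilers quiet
}
```

For instance, `covering_mechanism(Caller::Compiled, Callee::Interpreter)` yields `Coverage::C2IAdapterBarrier`, which is case (3) in the list.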
Performance experiments with Clojure [2] demonstrated that the fix
almost completely recovers the regression:
  (1) always reresolve (w/o the fix):     ~12.0s ( 1x)
  (2) C1/C2 barriers only:                 ~3.8s (~3x)
  (3) int/C1/C2 barriers:                  ~3.2s (-20%)
  --------
  (4) barriers disabled for invokestatic   ~3.2s
I deliberately tried to keep the patch backport-friendly for 8u/11u/12u
and refrained from using newer features like nmethod barriers introduced
recently. The fix can be refactored later specifically for 13 as a
followup change.
Testing: clojure startup, tier1-5
Thanks!
Best regards,
Vladimir Ivanov
[1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-April/037760.html
[2] https://bugs.openjdk.java.net/browse/JDK-8219233
[3] https://bugs.openjdk.java.net/browse/JDK-8219492