[13] RFR (M): 8223213: Implement fast class initialization checks on x86-64
Vladimir Kozlov
vladimir.kozlov at oracle.com
Thu May 2 00:40:05 UTC 2019
Why do you skip patching code compiled by Graal and AOT?
The flag UseFastClassInitChecks could be diagnostic or even product. The feature is not for debugging.
Thanks,
Vladimir K
On 5/1/19 4:17 PM, Vladimir Ivanov wrote:
> http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8223213
>
> (It's a followup RFR on an earlier RFC [1].)
>
> Recent changes severely affected how static initializers are executed, and for long-running initializers this showed up
> as a severe slowdown. As an example, it led to a 3x slowdown on some Clojure applications (JDK-8219233 [2]). The root
> cause is that until a class is fully initialized, every invocation of a static method on it goes through method
> resolution.
>
> The proposed fix introduces fast class initialization barriers for C1, C2, and the template interpreter on x86-64. I did
> some experiments with cross-platform approaches, but didn't get satisfactory results.
>
> On other platforms, behavior stays (mostly) intact. (I had to revert some changes introduced by JDK-8219492 [3], since
> the assumptions they rely on about accesses inside a class don't hold in all cases.)
>
> The barrier is as simple as:
>   if (holder->is_not_initialized() &&
>       !holder->is_reentrant_initialization(current_thread)) {
>     // trigger call site re-resolution and block there
>   }
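
The check quoted above can be modelled outside the VM as a small predicate. This is only a toy sketch: `Holder`, `InitState`, `ThreadId`, `start_initialization` and `needs_slow_path` are hypothetical stand-ins rather than HotSpot types, and it assumes `is_not_initialized()` means "not yet fully initialized", which is how the pseudocode reads.

```cpp
#include <cassert>

// Toy model only: these are NOT HotSpot's InstanceKlass or Thread types.
using ThreadId = int;

enum class InitState { Loaded, BeingInitialized, FullyInitialized };

struct Holder {
    InitState state = InitState::Loaded;
    ThreadId init_thread = -1;  // thread running the static initializer, if any

    // Assumption: "not initialized" means "not yet fully initialized".
    bool is_not_initialized() const {
        return state != InitState::FullyInitialized;
    }

    // True only for the thread that is currently running the initializer.
    bool is_reentrant_initialization(ThreadId current) const {
        return state == InitState::BeingInitialized && init_thread == current;
    }
};

// Hypothetical helper: marks `holder` as being initialized by `thread`.
inline void start_initialization(Holder& holder, ThreadId thread) {
    assert(holder.state == InitState::Loaded);
    holder.state = InitState::BeingInitialized;
    holder.init_thread = thread;
}

// True when a static call must leave the fast path (re-resolve and block).
inline bool needs_slow_path(const Holder& holder, ThreadId current) {
    return holder.is_not_initialized() &&
           !holder.is_reentrant_initialization(current);
}
```

In this model the initializing thread passes the check while its initializer is still running, every other thread is sent to the slow path, and once the state is `FullyInitialized` no thread takes the slow path.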
>
> There are 3 places where barriers are added:
> * in the template interpreter for the invokestatic bytecode;
> * at the nmethod verified entry point (for normal compilations);
> * in c2i adapters.
>
> For the template interpreter, there's an additional check added to TemplateTable::resolve_cache_and_index which calls into
> InterpreterRuntime::resolve_from_cache when the fast path checks fail.
>
> In the case of nmethods, the barrier is put before frame construction, so existing compiler runtime routines can be reused
> (SharedRuntime::get_handle_wrong_method_stub()).
>
> Also, C2 has a guard on entry (Parse::clinit_deopt()) which triggers nmethod recompilation once the class is fully
> initialized.
>
> OSR compilations don't need a barrier.
>
> Correspondence between barriers and transitions they cover:
>   (1) from interpreter (barrier on caller side)
>       * all transitions: interpreter, compiled (i2c), native, aot, ...
>
>   (2) from compiled (barrier on callee side)
>       to compiled, to native (barrier in native wrapper on entry)
>
>   (3) c2i bypasses both barriers (interpreter and compiled) and requires a dedicated barrier in c2i
>
>   (4) to Graal/AOT code:
>       from interpreter: covered by interpreter barrier
>       from compiled: call site patching is disabled, leading to repeated call site resolution until method holder is
>       fully initialized (original behavior).
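
One way to read the correspondence above is as a lookup from (caller kind, callee kind) to the barrier that covers the call. The sketch below only restates the list; the enum names and `covering_barrier` are hypothetical labels introduced for illustration, not identifiers from the patch.

```cpp
#include <cassert>

// Hypothetical labels for the transitions listed above; not HotSpot types.
enum class Caller { Interpreter, Compiled };
enum class Callee { Interpreter, Compiled, Native, GraalOrAot };

enum class Barrier {
    InterpreterInvokestatic,  // (1) caller-side check in the template interpreter
    NmethodEntry,             // (2) callee-side check at the verified entry point
    NativeWrapperEntry,       // (2) callee-side check in the native wrapper
    C2iAdapter,               // (3) dedicated check in the c2i adapter
    NoneReresolve             // (4) no barrier: the call site stays unpatched
};

// Returns which barrier covers a static call from `caller` into `callee`.
inline Barrier covering_barrier(Caller caller, Callee callee) {
    // (1) Interpreted callers are always covered on the caller side.
    if (caller == Caller::Interpreter) {
        return Barrier::InterpreterInvokestatic;
    }
    // Compiled callers depend on what they call into.
    switch (callee) {
        case Callee::Compiled:    return Barrier::NmethodEntry;
        case Callee::Native:      return Barrier::NativeWrapperEntry;
        case Callee::Interpreter: return Barrier::C2iAdapter;
        case Callee::GraalOrAot:  return Barrier::NoneReresolve;
    }
    assert(false && "unhandled callee kind");
    return Barrier::NoneReresolve;
}
```

The `NoneReresolve` case corresponds to item (4): the call from compiled code keeps going through call site resolution until the holder is fully initialized.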
>
> Performance experiments with Clojure [2] demonstrated that the fix almost completely recovers the regression:
>
>   (1) always reresolve (w/o the fix):      ~12.0s  ( 1x)
>   (2) C1/C2 barriers only:                  ~3.8s  (~3x)
>   (3) int/C1/C2 barriers:                   ~3.2s  (-20%)
>   --------
>   (4) barriers disabled for invokestatic:   ~3.2s
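
For readers who want to recompute the ratios, here is a small sketch taking the rounded timings from the table as 12.0 s, 3.8 s and 3.2 s. The helper names `speedup` and `relative_change` are made up for this illustration, and since the inputs are approximate the results are too.

```cpp
#include <cassert>

// How many times faster `after_s` is than `before_s`.
inline double speedup(double before_s, double after_s) {
    assert(before_s > 0.0 && after_s > 0.0);
    return before_s / after_s;
}

// Relative change in run time going from `before_s` to `after_s`
// (a negative value means the run got shorter).
inline double relative_change(double before_s, double after_s) {
    assert(before_s > 0.0);
    return (after_s - before_s) / before_s;
}
```

With these inputs, `speedup(12.0, 3.8)` comes out a little above 3, in line with the "~3x" entry, and `relative_change(3.8, 3.2)` is roughly -0.16, in the same ballpark as the rounded -20% figure.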
>
> I deliberately tried to keep the patch backport-friendly for 8u/11u/12u and refrained from using newer features like
> nmethod barriers introduced recently. The fix can be refactored later specifically for 13 as a followup change.
>
> Testing: Clojure startup, tier1-5
>
> Thanks!
>
> Best regards,
> Vladimir Ivanov
>
> [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-April/037760.html
> [2] https://bugs.openjdk.java.net/browse/JDK-8219233
> [3] https://bugs.openjdk.java.net/browse/JDK-8219492