[13] RFR (M): 8223213: Implement fast class initialization checks on x86-64

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Wed May 1 23:17:17 UTC 2019


http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8223213

(This is a follow-up RFR to an earlier RFC [1].)

Recent changes severely affected how static initializers are executed,
and for long-running initializers this manifested as a severe slowdown.
As an example, it led to a 3x slowdown on some Clojure applications
(JDK-8219233 [2]). The root cause is that until a class is fully
initialized, every invocation of a static method on it goes through
method resolution.

The proposed fix introduces fast class initialization barriers for C1,
C2, and the template interpreter on x86-64. I did some experiments with
cross-platform approaches, but didn't get satisfactory results.

On other platforms, behavior stays (mostly) intact. (I had to revert 
some changes introduced by JDK-8219492 [3], since the assumptions they 
rely on about accesses inside a class don't hold in all cases.)

The barrier is as simple as:
    if (holder->is_not_initialized() &&
        !holder->is_reentrant_initialization(current_thread)) {
      // trigger call site re-resolution and block there
    }
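
To make the condition above concrete, here is a minimal standalone sketch of the same predicate. It is not HotSpot code: Holder, ThreadId, InitState and needs_slow_path are hypothetical names, and the model assumes that "not initialized" means "not yet fully initialized".

```cpp
#include <cassert>

// Hypothetical, simplified model; not HotSpot's actual data structures.
using ThreadId = int;
constexpr ThreadId kNoThread = -1;

enum class InitState { linked, being_initialized, fully_initialized };

struct Holder {
  InitState state = InitState::linked;
  ThreadId init_thread = kNoThread;  // thread running the static initializer

  // Assumption of this model: anything short of fully initialized counts.
  bool is_not_initialized() const {
    return state != InitState::fully_initialized;
  }

  // True only for the thread that is itself running the initializer.
  bool is_reentrant_initialization(ThreadId current) const {
    return state == InitState::being_initialized && init_thread == current;
  }

  void begin_initialization(ThreadId t) {
    assert(state == InitState::linked);
    state = InitState::being_initialized;
    init_thread = t;
  }

  void finish_initialization() {
    assert(state == InitState::being_initialized);
    state = InitState::fully_initialized;
    init_thread = kNoThread;
  }
};

// The barrier condition: true means the call has to leave the fast path.
bool needs_slow_path(const Holder& holder, ThreadId current) {
  return holder.is_not_initialized() &&
         !holder.is_reentrant_initialization(current);
}
```

In this model the initializing thread passes the check while the initializer is still running, any other thread is diverted until initialization finishes, and once the holder is fully initialized every thread passes.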

There are 3 places where barriers are added:
   * in the template interpreter for the invokestatic bytecode;
   * at the nmethod verified entry point (for normal compilations);
   * in c2i adapters.

For the template interpreter, there's an additional check added to
TemplateTable::resolve_cache_and_index, which calls into
InterpreterRuntime::resolve_from_cache when the fast path checks fail.

In the case of nmethods, the barrier is put before frame construction, so
existing compiler runtime routines can be reused 
(SharedRuntime::get_handle_wrong_method_stub()).

Also, C2 has a guard on entry (Parse::clinit_deopt()) which triggers 
nmethod recompilation once the class is fully initialized.

OSR compilations don't need a barrier.

Correspondence between barriers and transitions they cover:
   (1) from interpreter (barrier on caller side)
        * all transitions: interpreter, compiled (i2c), native, aot, ...

   (2) from compiled (barrier on callee side)
        * to compiled, to native (barrier in native wrapper on entry)

   (3) c2i bypasses both barriers (interpreter and compiled) and
       requires a dedicated barrier in c2i

   (4) to Graal/AOT code:
        * from interpreter: covered by interpreter barrier
        * from compiled: call site patching is disabled, leading to
          repeated call site resolution until method holder is fully
          initialized (original behavior).
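
The correspondence above can also be read as a lookup from (caller, callee) to whatever guards the call. The following is only an illustrative sketch of that reading; Caller, Callee, Guard and guard_for are hypothetical names and do not exist in the patch.

```cpp
#include <cassert>  // used by the checks that exercise guard_for

// Hypothetical names, introduced only to restate the list above.
enum class Caller { interpreter, compiled };
enum class Callee { interpreter, compiled, native, aot };

enum class Guard {
  interpreter_barrier,     // (1) checked on the caller side
  nmethod_entry_barrier,   // (2) checked at the callee's verified entry
  native_wrapper_barrier,  // (2) checked on entry to the native wrapper
  c2i_barrier,             // (3) dedicated check in the c2i adapter
  repeated_resolution      // (4) no barrier; call site is never patched
};

Guard guard_for(Caller from, Callee to) {
  // Calls from the interpreter are covered whatever the callee is.
  if (from == Caller::interpreter) {
    return Guard::interpreter_barrier;
  }
  switch (to) {
    case Callee::compiled:    return Guard::nmethod_entry_barrier;
    case Callee::native:      return Guard::native_wrapper_barrier;
    case Callee::interpreter: return Guard::c2i_barrier;
    case Callee::aot:         break;
  }
  // Compiled to Graal/AOT: resolution repeats until the holder is
  // fully initialized (the original behavior).
  return Guard::repeated_resolution;
}
```

For example, guard_for(Caller::compiled, Callee::interpreter) yields Guard::c2i_barrier, which corresponds to case (3).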

Performance experiments with Clojure [2] demonstrated that the fix
almost completely recovers the regression:

   (1) always reresolve (w/o the fix):    ~12,0s ( 1x)
   (2) C1/C2 barriers only:                ~3,8s (~3x)
   (3) int/C1/C2 barriers:                 ~3,2s (-20%)
--------
   (4) barriers disabled for invokestatic  ~3,2s

I deliberately tried to keep the patch backport-friendly for 8u/11u/12u 
and refrained from using newer features like nmethod barriers introduced 
recently. The fix can be refactored later specifically for 13 as a 
followup change.

Testing: clojure startup, tier1-5

Thanks!

Best regards,
Vladimir Ivanov

[1] 
https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-April/037760.html
[2] https://bugs.openjdk.java.net/browse/JDK-8219233
[3] https://bugs.openjdk.java.net/browse/JDK-8219492

