RFC: Recuperate slowdown of long-running static initialization

Fri Apr 19 23:19:23 UTC 2019

Hi,

Recent changes severely affected how static initializers are executed, 
and for long-running initializers this manifests as a severe slowdown. 
As an example, some Clojure applications see a 3x slowdown 
(JDK-8219233 [1]).

The root cause is that until a class is fully initialized, every 
invocation of a static method on it goes through method resolution.
There were some changes (JDK-8219974 [2]) to partially recover from the 
slowdown, but they had limited effect.
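
To illustrate the affected pattern, here is a minimal sketch (not taken 
from the Clojure report; the class and method names are made up for 
illustration). The static initializer runs for a long time and keeps 
calling a static method of its own class, so every one of those calls 
happens while the class is still being initialized:

```java
public class SlowInit {
    // Assigned at the end of the static initializer below.
    static final long RESULT;

    static {
        long acc = 0;
        // Each call to step() is made before SlowInit is fully
        // initialized, which is the case discussed in this mail.
        for (int i = 0; i < 1_000_000; i++) {
            acc += step(i);
        }
        RESULT = acc;
    }

    // Hypothetical helper: returns 1 for odd i, 0 for even i.
    static long step(int i) {
        return i & 1;
    }

    public static void main(String[] args) {
        System.out.println(RESULT);
    }
}
```

How slow the loop is depends on the JDK build, so no timings are 
claimed for this sketch.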

I have been experimenting with a comprehensive fix and ended up with the 
following:

   http://cr.openjdk.java.net/~vlivanov/8219233/webrev.02/

(Unfortunately, I had to go with platform-specific changes and the patch 
contains only the x86_64 part. On other platforms the original behavior 
is preserved.)

The idea is to put an initialization barrier on entry into static 
methods. If a thread other than the initializing one enters, it is 
blocked until class initialization is finished (and an exception is 
thrown if initialization finishes with an error).

The barrier is as simple as:

    if (holder->is_not_initialized() &&
        !holder->is_reentrant_initialization(current_thread)) {
      // trigger call site re-resolution and block there
    }
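
The behavior the barrier has to preserve is the one the Java language 
already requires for class initialization (JLS 12.4.2): the thread 
running the initializer may call static methods of the class 
reentrantly, while any other thread waits until initialization 
completes. A minimal Java sketch of that rule follows; the class names 
and the 200 ms delay are assumptions made for illustration, not part of 
the patch:

```java
import java.util.concurrent.CountDownLatch;

public class InitBarrierDemo {
    // Signals that Holder's static initializer has started running.
    static final CountDownLatch initStarted = new CountDownLatch(1);

    static class Holder {
        static int state;

        static {
            InitBarrierDemo.initStarted.countDown();
            try {
                Thread.sleep(200); // simulate a long-running initializer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Reentrant call from the initializing thread: it proceeds
            // without blocking and sees the default value 0.
            state = get() + 42;
        }

        static int get() {
            return state;
        }
    }

    // Calls Holder.get() from a second thread while Holder is being
    // initialized and returns the value that call observed.
    static int observe() {
        Thread init = new Thread(() -> Holder.get());
        init.start();
        try {
            initStarted.await();
            // Holder is being initialized by another thread here, so
            // this call waits until the initializer has finished.
            int seen = Holder.get();
            init.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(observe());
    }
}
```

The second thread never observes the intermediate state: by the time 
its call returns, the initializer has already run to completion.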

Performance experiments demonstrated that even though generated code 
contributes the most overhead, interpreter overhead is visible as well 
(~20%):

   (1) original (always reresolve):       ~12.0s ( 1x)
   (2) C1/C2 - barriers; int - reresolve:  ~3.8s (~3x)
   (3) int/C1/C2 - barriers:               ~3.2s (-20%)

Based on that, I decided to implement barriers both in JIT-compilers 
(C1/C2) & interpreter.

For C1/C2 I made a decision to put the barrier on the callee side (in 
the nmethod prologue). Though it looks attractive to put it on the 
caller side (before the call), it poses major implementation challenges 
for C1, where unresolved calls are eagerly compiled.

For the interpreter, on the other hand, it's much simpler to implement 
the barrier on the caller side: throwing an exception on method entry is 
much more complicated than doing that as part of method resolution 
during the call.

So, here's the correspondence between barriers and transitions they cover:

   (1) from interpreter (barrier on caller side)
        * all transitions: interpreter, compiled (i2c), native, aot, ...

   (2) from compiled (barrier on callee side)
        * to compiled, to native (barrier in native wrapper on entry)

   (3) c2i bypasses both barriers (interpreter and compiled) and
       requires a dedicated barrier in c2i

   (4) to Graal/AOT:
        * from interpreter: covered by interpreter barrier
        * from compiled: current patch doesn't cover Graal and AOT, so
          call site patching is disabled for them, leading to repeated
          call site resolution until method holder is fully initialized.

I'd like to hear opinions about the patch and decisions I made before 
publishing it for review. For example, is it worth changing the template 
interpreter? The change itself is small and localized, and the 
performance improvement is noticeable, but it still resides in 
platform-specific code.

Regarding the implementation of barriers in generated code, nmethod 
entry barriers (introduced by 8210498 [3]) look like a perfect fit (and 
I even experimented with them), but I decided to leave it aside for now: 
mainly to ease backports (8210498 was introduced in 12), but also to 
ease support on other platforms (as of now, nmethod entry barriers are 
supported solely on x86_64). As follow-up work, the implementations can 
be unified in 13/12u.

Entry barrier support in Graal/AOT is left for future work as well. 
Once the support is there, call site patching restrictions should be 
relaxed.

Thanks!

Best regards,
Vladimir Ivanov

[1] https://bugs.openjdk.java.net/browse/JDK-8219233

[2] https://bugs.openjdk.java.net/browse/JDK-8219974

[3] https://bugs.openjdk.java.net/browse/JDK-8210498