[13] RFR (M): 8223213: Implement fast class initialization checks on x86-64

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Thu May 2 02:13:37 UTC 2019


Thanks for the feedback, Vladimir!

> Why you skip patching code compiled by Graal and AOT?

It happens only for classes that are still being initialized, and it 
effectively preserves current behavior (re-resolution until the class is 
fully initialized).

The motivation is the following:

   * Graal needs to put class init barriers in nmethods at the verified entry
point, in the same way C1/C2 do with this patch;

   * regarding AOTed code (I haven't done extensive exploration, but 
based on private discussions), I believe it needs additional barriers at 
method entry as well.

Once proper support lands in Graal or AOT, the patching can be re-enabled.
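
To make the rule concrete, here is a minimal standalone sketch of the 
patching decision described above. It is a simplified model, not the code 
in the webrev: the enum CalleeCompiler and the function 
may_patch_call_site are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Hypothetical tag for the compiler that produced the callee's code.
enum class CalleeCompiler { C1, C2, Graal, AOT };

// Returns true when a compiled call site may be patched to jump straight
// to the callee. Assumption: only C1/C2 nmethods carry a class init
// barrier at their verified entry point, so they are the only targets
// that are safe to bind while the holder is still being initialized.
bool may_patch_call_site(CalleeCompiler callee, bool holder_fully_initialized) {
    if (holder_fully_initialized) {
        return true;  // no barrier needed any more, patch as usual
    }
    // Holder not fully initialized: skipping the patch means the call
    // site keeps going through re-resolution (the original behavior).
    return callee == CalleeCompiler::C1 || callee == CalleeCompiler::C2;
}
```

In this model a Graal or AOT callee is re-resolved on every call only 
while its holder is being initialized; once initialization completes, 
the call site is patched like any other.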

> The flag UseFastClassInitChecks could be diagnostic or even product. The 
> feature is not for debugging.
The flag is used to signal that platform-specific support is available. 
Unless there's a use case that benefits from the ability to turn it off 
(disable the new barriers and fall back to re-resolution) from the command 
line, I don't see much value in turning the flag into a diagnostic/product 
one.

Best regards,
Vladimir Ivanov

> On 5/1/19 4:17 PM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8223213
>>
>> (It's a followup RFR on an earlier RFC [1].)
>>
>> Recent changes severely affected how static initializers are executed 
>> and for long-running initializers it manifested as a severe slowdown.
>> As an example, it led to a 3x slowdown on some Clojure applications
>> (JDK-8219233 [2]). The root cause is that until a class is fully 
>> initialized, every invocation of a static method on it goes through 
>> method resolution.
>>
>> The proposed fix introduces fast class initialization barriers for C1, 
>> C2, and the template interpreter on x86-64. I did some experiments with 
>> cross-platform approaches, but didn't get satisfactory results.
>>
>> On other platforms, behavior stays (mostly) intact. (I had to revert 
>> some changes introduced by JDK-8219492 [3], since the assumptions they 
>> rely on about accesses inside a class don't hold in all cases.)
>>
>> The barrier is as simple as:
>>     if (holder->is_not_initialized() &&
>>         !holder->is_reentrant_initialization(current_thread)) {
>>       // trigger call site re-resolution and block there
>>     }
>>
>> There are 3 places where barriers are added:
>>    * in the template interpreter, for the invokestatic bytecode;
>>    * at the nmethod verified entry point (for normal compilations);
>>    * in c2i adapters.
>>
>> For the template interpreter, there's an additional check added to 
>> TemplateTable::resolve_cache_and_index which calls into 
>> InterpreterRuntime::resolve_from_cache when the fast path checks fail.
>>
>> In case of nmethods, the barrier is put before frame construction, so 
>> existing compiler runtime routines can be reused 
>> (SharedRuntime::get_handle_wrong_method_stub()).
>>
>> Also, C2 has a guard on entry (Parse::clinit_deopt()) which triggers 
>> nmethod recompilation once the class is fully initialized.
>>
>> OSR compilations don't need a barrier.
>>
>> Correspondence between barriers and transitions they cover:
>>    (1) from interpreter (barrier on caller side)
>>         * all transitions: interpreter, compiled (i2c), native, aot, ...
>>
>>    (2) from compiled (barrier on callee side)
>>         to compiled, to native (barrier in native wrapper on entry)
>>
>>    (3) c2i bypasses both barriers (interpreter and compiled) and 
>> requires a dedicated barrier in c2i
>>
>>    (4) to Graal/AOT code:
>>          from interpreter: covered by interpreter barrier
>>          from compiled: call site patching is disabled, leading to 
>> repeated call site resolution until method holder is fully initialized 
>> (original behavior).
>>
>> Performance experiments with Clojure [2] demonstrated that the fix 
>> almost completely eliminates the regression:
>>
>>    (1) always re-resolve (w/o the fix):   ~12.0s ( 1x)
>>    (2) C1/C2 barriers only:                ~3.8s (~3x)
>>    (3) int/C1/C2 barriers:                 ~3.2s (-20%)
>> --------
>>    (4) barriers disabled for invokestatic  ~3.2s
>> I deliberately tried to keep the patch backport-friendly for 
>> 8u/11u/12u and refrained from using newer features like nmethod 
>> barriers introduced recently. The fix can be refactored later 
>> specifically for 13 as a followup change.
>>
>> Testing: clojure startup, tier1-5
>>
>> Thanks!
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] 
>> https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-April/037760.html 
>>
>> [2] https://bugs.openjdk.java.net/browse/JDK-8219233
>> [3] https://bugs.openjdk.java.net/browse/JDK-8219492
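
The barrier condition quoted above (is_not_initialized combined with 
is_reentrant_initialization) can be modeled in isolation as follows. This 
is only a sketch under stated assumptions: Holder, InitState, ThreadId and 
needs_slow_path are hypothetical stand-ins, not HotSpot types, and the 
real barrier is emitted as machine code rather than written in C++.

```cpp
#include <cassert>

// Hypothetical thread identifier; 0 means "no thread".
using ThreadId = int;

// Hypothetical, simplified initialization states of a class.
enum class InitState { Linked, BeingInitialized, FullyInitialized };

// Hypothetical stand-in for the method holder.
struct Holder {
    InitState state = InitState::Linked;
    ThreadId init_thread = 0;  // thread running the static initializer

    bool is_not_initialized() const {
        return state != InitState::FullyInitialized;
    }
    // True only for the thread that is currently running the initializer.
    bool is_reentrant_initialization(ThreadId current) const {
        return state == InitState::BeingInitialized && init_thread == current;
    }
};

// True when the call must leave the fast path, i.e. trigger call site
// re-resolution and block there until initialization completes.
bool needs_slow_path(const Holder& holder, ThreadId current) {
    return holder.is_not_initialized() &&
           !holder.is_reentrant_initialization(current);
}
```

The initializing thread passes the barrier (so a static initializer can 
call static methods of its own class), other threads take the slow path, 
and once the holder is fully initialized every thread stays on the fast 
path.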