[13] RFR (M): 8223213: Implement fast class initialization checks on x86-64

Vladimir Kozlov vladimir.kozlov at oracle.com
Thu May 2 02:34:39 UTC 2019


On 5/1/19 7:13 PM, Vladimir Ivanov wrote:
> Thanks for the feedback, Vladimir!
> 
>> Why do you skip patching code compiled by Graal and AOT?
> 
> It happens only for classes being initialized and effectively preserves the current behavior (re-resolution until the class is
> fully initialized).
> 
> The motivation is the following:
> 
>    * Graal needs to put class init barriers in nmethods at the verified entry
> point, in the same way C1/C2 do with this patch;
> 
>    * regarding AOTed code (I haven't done extensive exploration, but based on private discussions), I believe it needs 
> additional barriers at method entry as well.

When Graal adds barriers, AOT code will get them automatically.

> 
> Once proper support lands in Graal or AOT, the patching can be re-enabled.

Got it.

> 
>> The flag UseFastClassInitChecks could be diagnostic or even product. The feature is not for debugging.
> The flag is used to signal that platform-specific support is available. Unless there's a use case which benefits from
> the ability to turn it off (disable the new barriers and fall back to re-resolution) from the command line, I don't see much value
> in turning the flag into a diagnostic/product one.

Okay.

Thanks,
Vladimir

> 
> Best regards,
> Vladimir Ivanov
> 
>> On 5/1/19 4:17 PM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8223213
>>>
>>> (It's a followup RFR on an earlier RFC [1].)
>>>
>>> Recent changes severely affected how static initializers are executed; for long-running initializers, this manifested
>>> as a severe slowdown.
>>> As an example, it led to a 3x slowdown on some Clojure applications
>>> (JDK-8219233 [2]). The root cause is that until a class is fully initialized, every invocation of a static method on it
>>> goes through method resolution.
>>>
>>> The proposed fix introduces fast class initialization barriers for C1, C2, and the template interpreter on x86-64. I did some
>>> experiments with cross-platform approaches, but didn't get satisfactory results.
>>>
>>> On other platforms, behavior stays (mostly) intact. (I had to revert some changes introduced by JDK-8219492 [3], 
>>> since the assumptions they rely on about accesses inside a class don't hold in all cases.)
>>>
>>> The barrier is as simple as:
>>>     if (holder->is_not_initialized() &&
>>>         !holder->is_reentrant_initialization(current_thread)) {
>>>       // trigger call site re-resolution and block there
>>>     }
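
As a rough illustration of the condition quoted above, here is a minimal, self-contained C++ model. It is only a sketch: `Holder`, `InitState`, and `needs_slow_path` are hypothetical stand-ins rather than the HotSpot `InstanceKlass` API, and thread identity is modeled with `std::thread::id`.

```cpp
#include <cassert>
#include <thread>

// Hypothetical model of a class holder; not the HotSpot data structure.
enum class InitState { Linked, BeingInitialized, FullyInitialized };

struct Holder {
    InitState state = InitState::Linked;
    std::thread::id init_thread{};  // thread running the static initializer, if any

    bool is_not_initialized() const {
        return state != InitState::FullyInitialized;
    }
    // True only for the thread that is currently running the initializer.
    bool is_reentrant_initialization(std::thread::id current) const {
        return state == InitState::BeingInitialized && init_thread == current;
    }
};

// Returns true when the caller has to leave the fast path
// (in the patch: re-resolve the call site and block there).
inline bool needs_slow_path(const Holder& holder, std::thread::id current) {
    return holder.is_not_initialized() &&
           !holder.is_reentrant_initialization(current);
}

// Small self-check that walks one holder through its states.
inline void barrier_self_check() {
    Holder h;
    const std::thread::id me = std::this_thread::get_id();
    assert(needs_slow_path(h, me));    // not initialized yet: slow path
    h.state = InitState::BeingInitialized;
    h.init_thread = me;
    assert(!needs_slow_path(h, me));   // initializing thread passes the barrier
    h.state = InitState::FullyInitialized;
    assert(!needs_slow_path(h, me));   // fully initialized: everyone passes
}
```

In this model, only the thread running the initializer gets past the check while initialization is in progress. Every other thread takes the slow path until the state becomes `FullyInitialized`.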
>>>
>>> There are 3 places where barriers are added:
>>>    * in the template interpreter for the invokestatic bytecode;
>>>    * at the nmethod verified entry point (for normal compilations);
>>>    * in c2i adapters.
>>>
>>> For the template interpreter, there's an additional check added to TemplateTable::resolve_cache_and_index which calls into
>>> InterpreterRuntime::resolve_from_cache when fast path checks fail.
>>>
>>> In the case of nmethods, the barrier is put before frame construction, so existing compiler runtime routines can be
>>> reused (SharedRuntime::get_handle_wrong_method_stub()).
>>>
>>> Also, C2 has a guard on entry (Parse::clinit_deopt()) which triggers nmethod recompilation once the class is fully 
>>> initialized.
>>>
>>> OSR compilations don't need a barrier.
>>>
>>> Correspondence between barriers and transitions they cover:
>>>    (1) from interpreter (barrier on caller side)
>>>         * all transitions: interpreter, compiled (i2c), native, aot, ...
>>>
>>>    (2) from compiled (barrier on callee side)
>>>         * to compiled, to native (barrier in native wrapper on entry)
>>>
>>>    (3) c2i bypasses both barriers (interpreter and compiled) and requires a dedicated barrier in c2i
>>>
>>>    (4) to Graal/AOT code:
>>>          from interpreter: covered by interpreter barrier
>>>          from compiled: call site patching is disabled, leading to repeated call site resolution until the method holder
>>> is fully initialized (original behavior).
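
The correspondence listed above can be restated as a small lookup. This is a sketch only: the enum and function names are hypothetical, "Compiled" stands for C1/C2-compiled code, and the mapping encodes nothing beyond items (1) through (4) of the list. It assumes C++14 or later for the `constexpr` function.

```cpp
// Hypothetical names throughout; this only restates the list as a lookup.
enum class Caller { Interpreter, Compiled };
enum class Callee { Interpreter, Compiled, Native, GraalOrAot };

enum class Coverage {
    InterpreterBarrier,    // caller-side check in the interpreter (invokestatic)
    NmethodEntryBarrier,   // callee-side check at the nmethod verified entry point
    NativeWrapperBarrier,  // callee-side check on entry to the native wrapper
    C2iAdapterBarrier,     // dedicated check in the c2i adapter
    RepeatedReResolution   // no barrier; the call site is left unpatched
};

constexpr Coverage coverage_for(Caller from, Callee to) {
    if (from == Caller::Interpreter) {
        return Coverage::InterpreterBarrier;  // (1): covers every target
    }
    switch (to) {
        case Callee::Compiled:    return Coverage::NmethodEntryBarrier;   // (2)
        case Callee::Native:      return Coverage::NativeWrapperBarrier;  // (2)
        case Callee::Interpreter: return Coverage::C2iAdapterBarrier;     // (3)
        case Callee::GraalOrAot:  return Coverage::RepeatedReResolution;  // (4)
    }
    return Coverage::RepeatedReResolution;  // not reached for valid enum values
}

// Compile-time checks against items (3) and (4) of the list.
static_assert(coverage_for(Caller::Compiled, Callee::Interpreter) ==
                  Coverage::C2iAdapterBarrier,
              "c2i transitions need their own barrier");
static_assert(coverage_for(Caller::Interpreter, Callee::GraalOrAot) ==
                  Coverage::InterpreterBarrier,
              "interpreter callers are covered on the caller side");
```

The one case without a barrier is compiled code calling into Graal/AOT code. There the table records the fallback to repeated re-resolution.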
>>>
>>> Performance experiments with Clojure [2] demonstrated that the fix almost completely recovers the regression:
>>>
>>>    (1) always reresolve (w/o the fix):    ~12,0s ( 1x)
>>>    (2) C1/C2 barriers only:                ~3,8s (~3x)
>>>    (3) int/C1/C2 barriers:                 ~3,2s (-20%)
>>> --------
>>>    (4) barriers disabled for invokestatic  ~3,2s
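
For readers who want to check the ratios in the table, the arithmetic can be sketched as follows. The times are the approximate figures from the table, written with decimal points. The constant and helper names are hypothetical, and the code does not claim to reproduce how the rounded figures in the table were derived.

```cpp
// Approximate wall-clock times in seconds, copied from the table.
constexpr double kAlwaysReresolve = 12.0;  // row (1): without the fix
constexpr double kC1C2Barriers    = 3.8;   // row (2): C1/C2 barriers only
constexpr double kAllBarriers     = 3.2;   // row (3): interpreter + C1/C2 barriers

// How many times faster the "after" run is than the "before" run.
constexpr double speedup(double before_s, double after_s) {
    return before_s / after_s;
}

// Fraction of the "before" time that the "after" run saves.
constexpr double time_saved_fraction(double before_s, double after_s) {
    return (before_s - after_s) / before_s;
}

// Sanity checks on the orders of magnitude reported in the table.
static_assert(speedup(kAlwaysReresolve, kC1C2Barriers) > 3.0,
              "C1/C2 barriers alone give roughly a 3x speedup");
static_assert(time_saved_fraction(kC1C2Barriers, kAllBarriers) > 0.15,
              "adding the interpreter barrier saves a further ~16% of run time");
```

With these inputs, `speedup(12.0, 3.8)` is about 3.16 and `speedup(12.0, 3.2)` is 3.75. Adding the interpreter barrier saves about 16% of the 3.8 s run, or equivalently the 3.8 s run takes about 19% longer than the 3.2 s one.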
>>>
>>> I deliberately tried to keep the patch backport-friendly for 8u/11u/12u and refrained from using newer features like 
>>> nmethod barriers introduced recently. The fix can be refactored later specifically for 13 as a followup change.
>>>
>>> Testing: clojure startup, tier1-5
>>>
>>> Thanks!
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-April/037760.html
>>> [2] https://bugs.openjdk.java.net/browse/JDK-8219233
>>> [3] https://bugs.openjdk.java.net/browse/JDK-8219492


More information about the hotspot-runtime-dev mailing list