[13] RFR (M): 8223213: Implement fast class initialization checks on x86-64

Wed May 22 16:58:21 UTC 2019

>>> Forgot to mention that your new test doesn't look like it will play 
>>> nicely when run with Graal enabled, so you may need to split up into 
>>> different @test sections and add "@requires !vm.graal.enabled" to 
>>> exclude graal.
>>
>> What kind of problem when running with Graal do you have in mind?
>>
>> I double-checked that the test passes with Graal enabled.
> 
> Your various @run lines are trying to execute in different compilation 
> modes: -Xint, C1 only, C2 only, plus various permutations. If you take 
> those command-lines and then add all the Graal flags to it then you are 
> no longer testing any of the things you wanted to test. At best you test 
> Graal 7 times - which is pointless. At worst you may find some 
> variations will timeout when Graal is applied.

The test explicitly checks class initialization invariants and specifies 
  execution modes to stress different scenarios.

It doesn't explicitly specify C2: it just runs with 
-XX:-TieredCompilation and Graal is used if it is set as top-tier 
compiler. For other modes (interpreter & C1), whether C2 or Graal are 
used is irrelevant.

But I take your point: -XX:-TieredCompilation is much slower with Graal 
and doesn't test the logic as intended (no relevant compiled code 
generated). So I'll add "@requires !vm.graal.enabled" as you suggest. 
(Will update the webrev in-place.)

> For that matter an Xcomp run will also negate your intentions.

With -Xcomp it works as intended except pre-warmup phase, but there's a 
dedicated -Xint run mode which covers that. So I don't think it's 
necessary to exclude the test when -Xcomp is specified.

Best regards,
Vladimir Ivanov

>>>> I'll be very happy to see this go in - though I do wish we had more 
>>>> platform coverage than just x86_64. Hopefully the other archs will 
>>>> jump on-board with this as well.
>>
>> Yes, fully agree with you. It should be pretty straightforward for 
>> maintainers to mirror x86-specific changes for their architectures.
>>
>>>> I was initially confused by the UseFastClassInitChecks flag as I 
>>>> couldn't really see why you would want to turn it off (other than 
>>>> perhaps during testing) but I see that it is really used (as you 
>>>> explained to Vladimir K.) to exclude the new code for platforms 
>>>> which have not implemented it. Though I'm still not sure that we 
>>>> shouldn't have something to detect it being turned on at runtime on 
>>>> platforms that don't support it (it will likely crash quickly but 
>>>> still ...). Keep wondering if there is a better way to handle this 
>>>> aspect of the change ...
>>
>> I deliberately made the flag develop, so it's possible to change it 
>> from command-line only in debug builds. I could introduce additional 
>> platform-specific validation, but it doesn't look worth the effort for 
>> such narrow case (and there are other develop flags which guard broken 
>> functionality).
>>
>>>> I can't comment on the actual interpreter and compiler changes - sorry.
>>
>> No problem, I'll wait for more reviews from Runtime team.
>>
>>>> This will need re-basing now that JDK-8219974 has been backed out.
>>
>> Done.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>>>> On 2/05/2019 9:17 am, Vladimir Ivanov wrote:
>>>>> http://cr.openjdk.java.net/~vlivanov/8223213/webrev.00/
>>>>> https://bugs.openjdk.java.net/browse/JDK-8223213
>>>>>
>>>>> (It's a followup RFR on a earlier RFC [1].)
>>>>>
>>>>> Recent changes severely affected how static initializers are 
>>>>> executed and for long-running initializers it manifested as a 
>>>>> severe slowdown.
>>>>> As an example, it led to a 3x slowdown on some Clojure applications
>>>>> (JDK-8219233 [2]). The root cause is that until a class is fully 
>>>>> initialized, every invocation of static method on it goes through 
>>>>> method resolution.
>>>>>
>>>>> Proposed fix introduces fast class initialization barriers for C1, 
>>>>> C2, and template interpreter on x86-64. I did some experiments with 
>>>>> cross-platform approaches, but haven't got satisfactory results.
>>>>>
>>>>> On other platforms, behavior stays (mostly) intact. (I had to 
>>>>> revert some changes introduced by JDK-8219492 [3], since the 
>>>>> assumptions they rely on about accesses inside a class don't hold 
>>>>> in all cases.)
>>>>>
>>>>> The barrier is as simple as:
>>>>>     if (holder->is_not_initialized() &&
>>>>>         !holder->is_reentrant_initialization(current_thread)) {
>>>>>       // trigger call site re-resolution and block there
>>>>>     }
>>>>>
>>>>> There are 3 places where barriers are added:
>>>>>    * in template interpreter for invokestatic bytecode;
>>>>>    * at nmethod verified entry point (for normal compilations);
>>>>>    * c2i adapters;
>>>>>
>>>>> For template interperter, there's additional check added into 
>>>>> TemplateTable::resolve_cache_and_index which calls into 
>>>>> InterpreterRuntime::resolve_from_cache when fast path checks fail.
>>>>>
>>>>> In case of nmethods, the barrier is put before frame construction, 
>>>>> so existing compiler runtime routines can be reused 
>>>>> (SharedRuntime::get_handle_wrong_method_stub()).
>>>>>
>>>>> Also, C2 has a guard on entry (Parse::clinit_deopt()) which 
>>>>> triggers nmethod recompilation once the class is fully initialized.
>>>>>
>>>>> OSR compilations don't need a barrier.
>>>>>
>>>>> Correspondence between barriers and transitions they cover:
>>>>>    (1) from interpreter (barrier on caller side)
>>>>>         * all transitions: interpreter, compiled (i2c), native, 
>>>>> aot, ...
>>>>>
>>>>>    (2) from compiled (barrier on callee side)
>>>>>         to compiled, to native (barrier in native wrapper on entry)
>>>>>
>>>>>    (3) c2i bypasses both barriers (interpreter and compiled) and 
>>>>> requires a dedicated barrier in c2i
>>>>>
>>>>>    (4) to Graal/AOT code:
>>>>>          from interpreter: covered by interpreter barrier
>>>>>          from compiled: call site patching is disabled, leading to 
>>>>> repeated call site resolution until method holder is fully 
>>>>> initialized (original behavior).
>>>>>
>>>>> Performance experiments with clojure [2] demonstrated that the fix 
>>>>> almost completely recuperates the regression:
>>>>>
>>>>>    (1) always reresolve (w/o the fix):    ~12,0s ( 1x)
>>>>>    (2) C1/C2 barriers only:                ~3,8s (~3x)
>>>>>    (3) int/C1/C2 barriers:                 ~3,2s (-20%)
>>>>> --------
>>>>>    (4) barriers disabled for invokestatic  ~3,2s
>>>>>
>>>>> I deliberately tried to keep the patch backport-friendly for 
>>>>> 8u/11u/12u and refrained from using newer features like nmethod 
>>>>> barriers introduced recently. The fix can be refactored later 
>>>>> specifically for 13 as a followup change.
>>>>>
>>>>> Testing: clojure startup, tier1-5
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Best regards,
>>>>> Vladimir Ivanov
>>>>>
>>>>> [1] 
>>>>> https://mail.openjdk.java.net/pipermail/hotspot-dev/2019-April/037760.html 
>>>>>
>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8219233
>>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8219492