[13] RFR(L) 8220623: [JVMCI] Update JVMCI to support JVMCI based Compiler compiled into shared library

Vladimir Kozlov vladimir.kozlov at oracle.com
Fri Apr 5 15:22:26 UTC 2019


Thank you, Stefan

Vladimir

On 4/4/19 1:10 AM, Stefan Karlsson wrote:
> GC delta looks good.
> 
> Thanks,
> StefanK
> 
> On 2019-04-04 09:22, Vladimir Kozlov wrote:
>> New delta:
>> http://cr.openjdk.java.net/~kvn/8220623/webrev_delta.06/
>>
>> Full:
>> http://cr.openjdk.java.net/~kvn/8220623/webrev.06/
>>
>> New changes are based on Kim and Stefan suggestions:
>>
>> - Moved JVMCI::oops_do() from JNIHandles to places where it should be called.
>> - Moved JVMCI cleanup task to the beginning of ParallelCleaningTask::work().
>> - Used JVMCI_ONLY macro with COMMA.
>> - Disable JVMCI build on SPARC. We don't use it - neither Graal or AOT are built on SPARC. Disabling also helps to 
>> find missing JVMCI guards.
>>
>> I ran hs-tier1-3 testing - it passed (hs-tier3 includes graal testing).
>> I started hs-tier4..8-graal testing.
>> I will do performance testing next.
>>
>> Thanks,
>> Vladimir
>>
>> On 4/3/19 9:54 AM, Vladimir Kozlov wrote:
>>> On 4/2/19 11:35 PM, Stefan Karlsson wrote:
>>>> On 2019-04-02 22:41, Vladimir Kozlov wrote:
>>>>> I ran Kitchensink with G1 and -Xmx8g. I observed that Remark pause times are not consistent even without Graal.
>>>>> To see effect I added time spent in JVMCI::do_unloading() to GC log (see below [3]). The result is < 1ms - it is 
>>>>> less than 1% of a pause time.
>>>>
>>>> Kitchensink isn't really a benchmark, but a stress test. I sent you a private mail how to run these changes through 
>>>> our internal performance test setup.
>>>
>>> Okay, I will run performance tests there too.
>>>
>>>>
>>>>>
>>>>> It will have even less effect since I moved JVMCI::do_unloading() from serial path to parallel worker thread as 
>>>>> Stefan suggested.
>>>>>
>>>>> Stefan, are you satisfied with these changes now?
>>>>
>>>> Yes, the clean-ups look good. Thanks for cleaning this up.
>>>>
>>>> Kim had some extra comments about a few more places where JVMCI_ONLY could be used.
>>>>
>>>> I also agree with him that JVMCI::oops_do should not be placed in JNIHandles::oops_do. I think you should put it 
>>>> where you put the AOTLoader::oops_do calls.
>>>
>>> Okay.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Thanks,
>>>> StefanK
>>>>
>>>>
>>>>>
>>>>> Here is latest delta update which includes previous [1] delta and
>>>>> - use CompilerThreadStackSize * 2 for libgraal instead of exact value,
>>>>> - removed HandleMark added for debugging (reverted changes in jvmtiImpl.cpp),
>>>>> - added recent jvmci-8 changes to fix registration of native methods in libgraal (jvmciCompilerToVM.cpp)
>>>>>
>>>>> http://cr.openjdk.java.net/~kvn/8220623/webrev_delta.05/
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> [1] http://cr.openjdk.java.net/~kvn/8220623/webrev_delta.04/
>>>>> [2] Original webrev http://cr.openjdk.java.net/~kvn/8220623/webrev.03/
>>>>> [3] Pauses times from Kitchensink (0.0ms means there were no unloaded classes, 'NNN alive' shows how many metadata 
>>>>> references were processed):
>>>>>
>>>>> [1.083s][1554229160638ms][info ][gc,start     ] GC(2) Pause Remark
>>>>> [1.085s][1554229160639ms][info ][gc           ] GC(2) JVMCI::do_unloading(): 0 alive 0.000ms
>>>>> [1.099s][1554229160654ms][info ][gc           ] GC(2) Pause Remark 28M->28M(108M) 16.123ms
>>>>>
>>>>> [3.097s][1554229162651ms][info ][gc,start     ] GC(12) Pause Remark
>>>>> [3.114s][1554229162668ms][info ][gc           ] GC(12) JVMCI::do_unloading(): 3471 alive 0.164ms
>>>>> [3.148s][1554229162702ms][info ][gc           ] GC(12) Pause Remark 215M->213M(720M) 51.103ms
>>>>>
>>>>> [455.111s][1554229614666ms][info ][gc,phases,start] GC(1095) Phase 1: Mark live objects
>>>>> [455.455s][1554229615010ms][info ][gc             ] GC(1095) JVMCI::do_unloading(): 4048 alive 0.821ms
>>>>> [455.456s][1554229615010ms][info ][gc,phases      ] GC(1095) Phase 1: Mark live objects 344.107ms
>>>>>
>>>>> [848.932s][1554230008486ms][info ][gc,phases,start] GC(1860) Phase 1: Mark live objects
>>>>> [849.248s][1554230008803ms][info ][gc             ] GC(1860) JVMCI::do_unloading(): 3266 alive 0.470ms
>>>>> [849.249s][1554230008803ms][info ][gc,phases      ] GC(1860) Phase 1: Mark live objects 316.527ms
>>>>>
>>>>> [1163.778s][1554230323332ms][info ][gc,start       ] GC(2627) Pause Remark
>>>>> [1163.932s][1554230323486ms][info ][gc             ] GC(2627) JVMCI::do_unloading(): 3474 alive 0.642ms
>>>>> [1163.941s][1554230323496ms][info ][gc             ] GC(2627) Pause Remark 2502M->2486M(4248M) 163.296ms
>>>>>
>>>>> [1242.587s][1554230402141ms][info ][gc,phases,start] GC(2734) Phase 1: Mark live objects
>>>>> [1242.899s][1554230402453ms][info ][gc             ] GC(2734) JVMCI::do_unloading(): 3449 alive 0.570ms
>>>>> [1242.899s][1554230402453ms][info ][gc,phases      ] GC(2734) Phase 1: Mark live objects 311.719ms
>>>>>
>>>>> [1364.164s][1554230523718ms][info ][gc,phases,start] GC(3023) Phase 1: Mark live objects
>>>>> [1364.613s][1554230524167ms][info ][gc             ] GC(3023) JVMCI::do_unloading(): 3449 alive 0.000ms
>>>>> [1364.613s][1554230524167ms][info ][gc,phases      ] GC(3023) Phase 1: Mark live objects 448.495ms
>>>>>
>>>>> [1425.222s][1554230584776ms][info ][gc,phases,start] GC(3151) Phase 1: Mark live objects
>>>>> [1425.587s][1554230585142ms][info ][gc             ] GC(3151) JVMCI::do_unloading(): 3491 alive 0.882ms
>>>>> [1425.587s][1554230585142ms][info ][gc,phases      ] GC(3151) Phase 1: Mark live objects 365.403ms
>>>>>
>>>>> [1456.401s][1554230615955ms][info ][gc,phases,start] GC(3223) Phase 1: Mark live objects
>>>>> [1456.769s][1554230616324ms][info ][gc             ] GC(3223) JVMCI::do_unloading(): 3478 alive 0.616ms
>>>>> [1456.769s][1554230616324ms][info ][gc,phases      ] GC(3223) Phase 1: Mark live objects 368.643ms
>>>>>
>>>>> [1806.139s][1554230965694ms][info   ][gc,start       ] GC(4014) Pause Remark
>>>>> [1806.161s][1554230965716ms][info   ][gc             ] GC(4014) JVMCI::do_unloading(): 3478 alive 0.000ms
>>>>> [1806.163s][1554230965717ms][info   ][gc             ] GC(4014) Pause Remark 1305M->1177M(2772M) 23.190ms
>>>>>
>>>>>
>>>>>
>>>>> On 4/1/19 12:34 AM, Stefan Karlsson wrote:
>>>>>> On 2019-03-29 17:55, Vladimir Kozlov wrote:
>>>>>>> Stefan,
>>>>>>>
>>>>>>> Do you have a test (and flags) which can allow me to measure effect of this code on G1 remark pause?
>>>>>>
>>>>>>
>>>>>> -Xlog:gc prints the remark times:
>>>>>> [4,296s][info][gc       ] GC(89) Pause Remark 4M->4M(28M) 36,412ms
>>>>>>
>>>>>> StefanK
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 3/29/19 12:36 AM, Stefan Karlsson wrote:
>>>>>>>> On 2019-03-29 03:07, Vladimir Kozlov wrote:
>>>>>>>>> Hi Stefan,
>>>>>>>>>
>>>>>>>>> I collected some data on MetadataHandleBlock.
>>>>>>>>>
>>>>>>>>> First, do_unloading() code is executed only when class_unloading_occurred is 'true' - it is rare case. It 
>>>>>>>>> should not affect normal G1 remark pause.
>>>>>>>>
>>>>>>>> It's only rare for applications that don't do dynamic class loading and unloading. The applications that do, 
>>>>>>>> will be affected.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Second, I run a test with -Xcomp. I got about 10,000 compilations by Graal and next data at the end of execution:
>>>>>>>>>
>>>>>>>>> max_blocks = 232
>>>>>>>>> max_handles_per_block = 32 (since handles array has 32 elements)
>>>>>>>>> max_total_alive_values = 4631
>>>>>>>>
>>>>>>>> OK. Thanks for the info.
>>>>>>>>
>>>>>>>> StefanK
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vladimir
>>>>>>>>>
>>>>>>>>> On 3/28/19 2:44 PM, Vladimir Kozlov wrote:
>>>>>>>>>> Thank you, Stefan
>>>>>>>>>>
>>>>>>>>>> On 3/28/19 12:54 PM, Stefan Karlsson wrote:
>>>>>>>>>>> Hi Vladimir,
>>>>>>>>>>>
>>>>>>>>>>> I started to check the GC code.
>>>>>>>>>>>
>>>>>>>>>>> ========================================================================
>>>>>>>>>>> I see that you've added guarded includes in the middle of the include list:
>>>>>>>>>>>    #include "gc/shared/strongRootsScope.hpp"
>>>>>>>>>>>    #include "gc/shared/weakProcessor.hpp"
>>>>>>>>>>> + #if INCLUDE_JVMCI
>>>>>>>>>>> + #include "jvmci/jvmci.hpp"
>>>>>>>>>>> + #endif
>>>>>>>>>>>    #include "oops/instanceRefKlass.hpp"
>>>>>>>>>>>    #include "oops/oop.inline.hpp"
>>>>>>>>>>>
>>>>>>>>>>> The style we use is to put these conditional includes at the end of the include lists.
>>>>>>>>>>
>>>>>>>>>> okay
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ========================================================================
>>>>>>>>>>> Could you also change the following:
>>>>>>>>>>>
>>>>>>>>>>> + #if INCLUDE_JVMCI
>>>>>>>>>>> +     // Clean JVMCI metadata handles.
>>>>>>>>>>> +     JVMCI::do_unloading(is_alive_closure(), purged_class);
>>>>>>>>>>> + #endif
>>>>>>>>>>>
>>>>>>>>>>> to:
>>>>>>>>>>> +     // Clean JVMCI metadata handles.
>>>>>>>>>>> +     JVMCI_ONLY(JVMCI::do_unloading(is_alive_closure(), purged_class);)
>>>>>>>>>>>
>>>>>>>>>>> to get rid of some of the line noise in the GC files.
>>>>>>>>>>
>>>>>>>>>> okay
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ========================================================================
>>>>>>>>>>> In the future we will need version of JVMCI::do_unloading that supports concurrent cleaning for ZGC.
>>>>>>>>>>
>>>>>>>>>> Yes, we need to support concurrent cleaning in a future.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ========================================================================
>>>>>>>>>>> What's the performance impact for G1 remark pause with this serial walk over the MetadataHandleBlock?
>>>>>>>>>>>
>>>>>>>>>>> 3275 void G1CollectedHeap::complete_cleaning(BoolObjectClosure* is_alive,
>>>>>>>>>>> 3276                                         bool class_unloading_occurred) {
>>>>>>>>>>> 3277   uint num_workers = workers()->active_workers();
>>>>>>>>>>> 3278   ParallelCleaningTask unlink_task(is_alive, num_workers, class_unloading_occurred, false);
>>>>>>>>>>> 3279   workers()->run_task(&unlink_task);
>>>>>>>>>>> 3280 #if INCLUDE_JVMCI
>>>>>>>>>>> 3281   // No parallel processing of JVMCI metadata handles for now.
>>>>>>>>>>> 3282   JVMCI::do_unloading(is_alive, class_unloading_occurred);
>>>>>>>>>>> 3283 #endif
>>>>>>>>>>> 3284 }
>>>>>>>>>>
>>>>>>>>>> There should not be impact if Graal is not used. Only cost of call (which most likely is inlined in product 
>>>>>>>>>> VM) and check:
>>>>>>>>>>
>>>>>>>>>> http://hg.openjdk.java.net/metropolis/dev/file/530fc1427d02/src/hotspot/share/jvmci/jvmciRuntime.cpp#l1237
>>>>>>>>>>
>>>>>>>>>> If Graal is used it should not have big impact since these metadata has regular pattern (32 handles per array 
>>>>>>>>>> and array per MetadataHandleBlock block which are linked in list) and not large.
>>>>>>>>>> If there will be noticeable impact - we will work on it as you suggested by using ParallelCleaningTask.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ========================================================================
>>>>>>>>>>> Did you consider adding it as a task for one of the worker threads to execute in ParallelCleaningTask?
>>>>>>>>>>>
>>>>>>>>>>> See how other tasks are claimed by one worker:
>>>>>>>>>>> void KlassCleaningTask::work() {
>>>>>>>>>>>    ResourceMark rm;
>>>>>>>>>>>
>>>>>>>>>>>    // One worker will clean the subklass/sibling klass tree.
>>>>>>>>>>>    if (claim_clean_klass_tree_task()) {
>>>>>>>>>>>      Klass::clean_subklass_tree();
>>>>>>>>>>>    }
>>>>>>>>>>
>>>>>>>>>> These changes were ported from JDK8u based changes in graal-jvmci-8 and there are no ParallelCleaningTask in 
>>>>>>>>>> JDK8.
>>>>>>>>>>
>>>>>>>>>> Your suggestion is interesting and I agree that we should investigate it.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ========================================================================
>>>>>>>>>>> In MetadataHandleBlock::do_unloading:
>>>>>>>>>>>
>>>>>>>>>>> +        if (klass->class_loader_data()->is_unloading()) {
>>>>>>>>>>> +          // This needs to be marked so that it's no longer scanned
>>>>>>>>>>> +          // but can't be put on the free list yet. The
>>>>>>>>>>> +          // ReferenceCleaner will set this to NULL and
>>>>>>>>>>> +          // put it on the free list.
>>>>>>>>>>>
>>>>>>>>>>> I couldn't find the ReferenceCleaner in the patch or in the source. Where can I find this code?
>>>>>>>>>>
>>>>>>>>>> I think it is typo (I will fix it) - it references new HandleCleaner class:
>>>>>>>>>>
>>>>>>>>>> http://cr.openjdk.java.net/~kvn/8220623/webrev.03/src/jdk.internal.vm.ci/share/classes/jdk.vm.ci.hotspot/src/jdk/vm/ci/hotspot/HandleCleaner.java.html 
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Vladimir
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> StefanK
>>>>>>>>>>>
>>>>>>>>>>> On 2019-03-28 20:15, Vladimir Kozlov wrote:
>>>>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8220623
>>>>>>>>>>>> http://cr.openjdk.java.net/~kvn/8220623/webrev.03/
>>>>>>>>>>>>
>>>>>>>>>>>> Update JVMCI to support pre-compiled as shared library Graal.
>>>>>>>>>>>> Using aoted Graal can offers benefits including:
>>>>>>>>>>>>  - fast startup
>>>>>>>>>>>>  - compile time similar to native JIt compilers (C2)
>>>>>>>>>>>>  - memory usage disjoint from the application Java heap
>>>>>>>>>>>>  - no profile pollution of JDK code used by the application
>>>>>>>>>>>>
>>>>>>>>>>>> This is JDK13 port of JVMCI changes done in graal-jvmci-8 [1] up to date.
>>>>>>>>>>>> Changes were collected in Metropolis repo [2] and tested there.
>>>>>>>>>>>>
>>>>>>>>>>>> Changes we reviewed by Oracle Labs (authors of JVMCI and Graal) and our compiler group.
>>>>>>>>>>>> Changes in shared code are guarded by #if INCLUDE_JVMCI and JVMCI flags.
>>>>>>>>>>>>
>>>>>>>>>>>> I ran tier1-tier8 (which includes HotSpot and JDK tests) and it was clean. In this set Graal was tested only 
>>>>>>>>>>>> in tier3.
>>>>>>>>>>>>
>>>>>>>>>>>> And I ran all hs-tier3-graal .. hs-tier8-graal Graal tests available in our system. Several issue were found 
>>>>>>>>>>>> which were present before these changes.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Vladimir
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://github.com/graalvm/graal-jvmci-8/commit/49ff2045fb603e35516a3a427d8023c00e1607af
>>>>>>>>>>>> [2] http://hg.openjdk.java.net/metropolis/dev/
>>>>>>>>>>>


More information about the hotspot-dev mailing list