RFR: 8256641: CDS VM operations do not lock the heap
    Thomas Schatzl 
    tschatzl at openjdk.java.net
       
    Wed Dec  9 12:31:34 UTC 2020
    
    
  
On Wed, 9 Dec 2020 11:58:46 GMT, Kim Barrett <kbarrett at openjdk.org> wrote:
>> Hi all,
>> 
>>   can I get reviews for this change that adds missing synchronization of CDS related VM operations with other heap operations?
>> 
>> `VM_PopulateDumpSharedSpace`, `VM_PopulateDynamicDumpSharedSpace` and `VM_Verify` are used during CDS operation, one for creating the CDS archive (eventually doing a GC), one for mapping in the CDS archive into the heap, and the last one for verification.
>> 
>> (Fwiw, imho the first two are awfully close and should be renamed to be better distinguishable, but that's another matter)
>> 
>> They all in one way or the other need to synchronize with garbage collection as they may either do a GC or just do verification, as actual (STW-)gc returns an uninitialized block of memory that is not parseable; and before that block of memory can be initialized, another VM operation like one of the mentioned could be started otherwise seeing that uninitialized memory and crashing.
>> 
>> The existing mechanism to prevent this kind of interference is taking the `Heap_lock`, so the suggested solution is based on having all these VM operations descend from a new `VM_GC_Sync_Operation` `VM_Operation` which does that (and only that), split out from `VM_GC_Operation`.
>> 
>> There some points I would like to bring up in advance in this change that may be contentious:
>> - each VM Operation could handle `Heap_lock` by itself, which I considered to be too error-prone.
>> - the need for `VM_Verify` to coordinate with garbage collections is new and has been introduced with [JDK-8253081](https://bugs.openjdk.java.net/browse/JDK-8253081) as since then a Java thread might execute it - that's why this hasn't been a problem before. That could be undone (removed), but I kind of believe that with more expected changes to the CDS mechanism in the future the additional full-heap verification after loading the archive is worth the additional effort.
>> One (implementation) drawback is that since ZGC also uses `VM_Verify`, that operation now gets the `Heap_lock` too, and is kind of also using some part of the "set of operations related to GC" in general but did not so before, keeping almost completely separate. Testing did not show an issue, and I tried to look at the code carefully to see whether there could be issues with no result. (I.e. I couldn't find an issue). Obviously I'd like to ask you to look over this again.
>> - so this change adds a new VM Operation class called `VM_GC_Sync_Operation` that splits off the handling of `Heap_lock` (i.e. the actual synchronization` from `VM_GC_Operation`. The reason is that I do not think the logic for the gc VM operation that prevents multiple back-to-back GC operations is a good fit for any of the `VM_Populate*` or even `VM_Verify` operations.
>> 
>> Testing: tier1-5; test case attached to the CR; other known reproducers (runtime/valhalla/inlinetypes/InlineOops.java in the Valhalla repo)
>
> src/hotspot/share/gc/shared/gcVMOperations.cpp line 64:
> 
>> 62: 
>> 63: void VM_GC_Sync_Operation::doit_epilogue() {
>> 64:   if (Universe::has_reference_pending_list()) {
> 
> Why is the pending list handling moved here, rather than remaining in VM_GC_Operation::doit_epilogue?  This doesn't have anything to do with syncing between operations, and seems odd for VM_Verify (for example) to do.
Also answering the next question: these two items (i.e. including the `prologue_succeeded` stuff) have mostly been kept there to allow simple reuse in `VM_GC_Operation`. I'll remove those and (maybe) just break the inheritance chain.
-------------
PR: https://git.openjdk.java.net/jdk/pull/1661
    
    
More information about the hotspot-gc-dev
mailing list