RFR: 8240556: Abort concurrent mark after effective eager reclamation of humongous objects [v2]

Tue Sep 22 08:48:41 UTC 2020

On Mon, 21 Sep 2020 18:15:08 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Hi all,
>> 
>>   please review this change that implements concurrent mark abort as proposed by Liang Mao from Alibaba.
>> 
>> Basically, if at the end of the concurrent start pause we notice that we went below the IHOP threshold (and it has been
>> a concurrent start pause caused by humongous allocation), instead of scheduling a mark operation, we schedule a
>> "concurrent undo" operation that undoes the changes.  Regarding the removal of the
>> test/hotspot/jtreg/gc/stress/jfr/TestStressBigAllocationGCEventsWithG1.java test: it only ever accidentally worked in
>> G1. G1 never sent the AllocationRequiringGC events for GCs caused by going over the IHOP threshold for humongous
>> allocations that the test actually expects.  That test previously only worked because G1 could not reclaim the
>> humongous objects fast enough so crossing the IHOP threshold causes a full concurrent mark. Allocations during that
>> concurrent mark do not cause a GC that can reclaim these objects, so ultimately some young GC that sends the
>> AllocationRequiringGC event will be sent.  With concurrent undo this is not guaranteed any more, i.e. only in
>> environments where concurrent undo is slow (and we'll improve it soon) this test works.  The test is too timing
>> dependent, and checking for the wrong thing in this case, so removing it.  Testing: tbd.
>
> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev
> excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since
> the last revision:
>  - reviews sjohanns and ayang; merged Command and Status plus cleanup
>  - Remove test that only accidentally worked: G1 never sent the AllocationRequiringGC
>    event for GCs caused by going over the IHOP threshold for humongous allocations.
>    
>    That test previously only worked because G1 could not reclaim the humongous objects
>    fast enough so crossing the IHOP threshold causes a full concurrent mark. Allocations
>    during that concurrent mark do not cause a GC that can reclaim these objects, so
>    ultimately some young GC that sends the AllocationRequiringGC event will be sent.
>    
>    With concurrent undo this is not guaranteed any more, i.e. only in environments where
>    concurrent undo is slow (and we'll improve it soon) this test works.
>    
>    The test is too timing dependent, so removing it.
>  - Fixed test
>  - Initial import

>   I dislike putting this pause-related state into global state. Unless you or somebody else insists on this I would like
>   to keep the code.
> 
> Filed JDK-8253343 which among other things should clean this up and make it possible to easily manage pause-only state.

I get your argument about not putting this into the collector-state and I'm fine with leaving it like this for now. I
think this update really helped and I have two more comments on how to reduce the "state passing" even more. Let me
know what you think.

src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 707:

> 705:     root_regions()->cancel_scan();
> 706:     return;
> 707:   }

I still think this should be broken out from this function. Either doing it in
`G1ConcurrentMarkThread::concurrent_undo_cycle_do()` or having a separate function that is called right before
`_cm_thread->start_undo_mark()` in `CollectedHeap::start_concurrent_cycle()`.

src/hotspot/share/gc/g1/g1Policy.cpp line 790:

> 788:     if (!concurrent_operation_is_full_mark) {
> 789:       abort_time_to_mixed_tracking();
> 790:     }

More or less same idea here. What do you think about moving this to a separate function, that is either called from
`start_concurrent_cycle()` or `concurrent_undo_cycle_do()`?

-------------

PR: https://git.openjdk.java.net/jdk/pull/177