RFR: 8240556: Abort concurrent mark after effective eager reclamation of humongous objects [v3]

Thu Sep 24 07:41:10 UTC 2020

On Wed, 23 Sep 2020 14:50:04 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Hi all,
>> 
>>   please review this change that implements concurrent mark abort as proposed by Liang Mao from Alibaba.
>> 
>> Basically, if at the end of the concurrent start pause we notice that we went below the IHOP threshold (and it has been
>> a concurrent start pause caused by humongous allocation), instead of scheduling a mark operation, we schedule a
>> "concurrent undo" operation that undoes the changes.  Regarding the removal of the
>> test/hotspot/jtreg/gc/stress/jfr/TestStressBigAllocationGCEventsWithG1.java test: it only ever accidentally worked in
>> G1. G1 never sent the AllocationRequiringGC events for GCs caused by going over the IHOP threshold for humongous
>> allocations that the test actually expects.  That test previously only worked because G1 could not reclaim the
>> humongous objects fast enough so crossing the IHOP threshold causes a full concurrent mark. Allocations during that
>> concurrent mark do not cause a GC that can reclaim these objects, so ultimately some young GC that sends the
>> AllocationRequiringGC event will be sent.  With concurrent undo this is not guaranteed any more, i.e. only in
>> environments where concurrent undo is slow (and we'll improve it soon) this test works.  The test is too timing
>> dependent, and checking for the wrong thing in this case, so removing it.  Testing: tbd.
>
> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now
> contains five commits:
>  - sjohanss comments #2
>  - reviews sjohanns and ayang; merged Command and Status plus cleanup
>  - Remove test that only accidentally worked: G1 never sent the AllocationRequiringGC
>    event for GCs caused by going over the IHOP threshold for humongous allocations.
>    
>    That test previously only worked because G1 could not reclaim the humongous objects
>    fast enough so crossing the IHOP threshold causes a full concurrent mark. Allocations
>    during that concurrent mark do not cause a GC that can reclaim these objects, so
>    ultimately some young GC that sends the AllocationRequiringGC event will be sent.
>    
>    With concurrent undo this is not guaranteed any more, i.e. only in environments where
>    concurrent undo is slow (and we'll improve it soon) this test works.
>    
>    The test is too timing dependent, so removing it.
>  - Fixed test
>  - Initial import

Thanks for doing this additional refactoring Thomas.

I think this is good now, just one more small suggestion.

src/hotspot/share/gc/g1/g1Policy.cpp line 1262:

> 1260:         _concurrent_start_to_mixed.record_concurrent_start_end(end);
> 1261:       }
> 1262:       break;

I would add the undo-case after to avoid having to check for it in the if-statement. Like this:
Suggestion:

    case ConcurrentStartMarkGC:
      // Do not track time-to-mixed time for periodic collections as they are likely
      // to be not representative to regular operation as the mutators are idle at
      // that time.
      if ((_g1h->gc_cause() != GCCause::_g1_periodic_collection) {
        _concurrent_start_to_mixed.record_concurrent_start_end(end);
      }
    case ConcurrentStartUndoGC:
      // Do not track undone concurrent cycles.
      break;

If you feel like it you could add an explicit break for the start-case and assert that the start-to-mixed-tracking is
not active in the undo-case. But this would require a `is_active()` getter for the tracker.

-------------

PR: https://git.openjdk.java.net/jdk/pull/177