RFR: 8240556: Abort concurrent mark after effective eager reclamation of humongous objects

Mon Sep 21 18:21:02 UTC 2020

Hi,

On 18.09.20 10:43, Stefan Johansson wrote:
> On Tue, 15 Sep 2020 12:40:46 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:
> 
>> Hi all,
>>
>>    please review this change that implements concurrent mark abort as proposed by Liang Mao from Alibaba.
>>
>> Basically, if at the end of the concurrent start pause we notice that we went below the IHOP threshold (and it has been
>> a concurrent start pause caused by humongous allocation), instead of scheduling a mark operation, we schedule a
>> "concurrent undo" operation that undoes the changes.  Regarding the removal of the
>> test/hotspot/jtreg/gc/stress/jfr/TestStressBigAllocationGCEventsWithG1.java test: it only ever accidentally worked in
>> G1. G1 never sent the AllocationRequiringGC events for GCs caused by going over the IHOP threshold for humongous
>> allocations that the test actually expects.  That test previously only worked because G1 could not reclaim the
>> humongous objects fast enough so crossing the IHOP threshold causes a full concurrent mark. Allocations during that
>> concurrent mark do not cause a GC that can reclaim these objects, so ultimately some young GC that sends the
>> AllocationRequiringGC event will be sent.  With concurrent undo this is not guaranteed any more, i.e. only in
>> environments where concurrent undo is slow (and we'll improve it soon) this test works.  The test is too timing
>> dependent, and checking for the wrong thing in this case, so removing it.  Testing: tbd.
> 
> Thanks for taking this on Thomas. The change looks good but I have a few comments on the structure.
> 
> src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3064:
> 
>> 3062:         concurrent_operation_is_mark_cycle =
>> 3063:           should_start_concurrent_mark_operation &&
>> 3064:           ((gc_cause() != GCCause::_g1_humongous_allocation) || policy()->need_to_start_conc_mark("Revise"));
> 
> What do you think about setting a state in `G1CollectorState` instead of passing this around? Something like
> `_cancel_concurrent_mark`.
> 
> src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3074:
> 
>> 3072:         double sample_end_time_sec = os::elapsedTime();
>> 3073:         double pause_time_ms = (sample_end_time_sec - sample_start_time_sec) * MILLIUNITS;
>> 3074:         policy()->record_collection_pause_end(pause_time_ms, concurrent_operation_is_mark_cycle);
> 
> Then we can just check the collector state in `record_collection_pause_end` instead of passing it in here.
> 

   I dislike putting this pause-related state into global state. Unless 
you or somebody else insists on this I would like to keep the code.

Filed JDK-8253343 which among other things should clean this up and make 
it possible to easily manage pause-only state.

> src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3121: >
>> 3119:     // Note: of course, the actual marking work will not start until the safepoint
>> 3120:     // itself is released in SuspendibleThreadSet::desynchronize().
>> 3121:     start_concurrent_cycle(concurrent_operation_is_mark_cycle /* full_mark */);
> 
> Instead of passing the state down here what do you think of something like:
> Suggestion:
> 
>      if (_collector_state->cancel_conc_start()) {
>        cancel_concurrent_cycle();
>      } else {
>        start_concurrent_cycle();
>      }
> 
> Where `cancel_concurrent_cycle()` would do:
>    root_regions()->cancel_scan();
>    MutexLocker x(CGC_lock, Mutex::_no_safepoint_check_flag);
>    if (!_cm_thread->in_progress()) {
>      _cm_thread->set_canceled();
>      CGC_lock->notify();
>    }
> 

I hope I improved that method a bit, but other than that kept it.

> src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 706:
> 
>> 704:     root_regions()->cancel_scan();
>> 705:     return;
>> 706:   }
> 
> No need for this if you agree with my above comments.
> 
> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp line 67:
> 
>> 65:   assert(_state == Idle, "cycle in progress");
>> 66:   _state = mark_cycle ? StartMark : StartUndo;
>> 67: }
> 
> I would prefer two different function, one for start and one for undo/cancel.
> 

Done.

> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp line 208:
> 
>> 206:     result = UndoCycle;
>> 207:   }
>> 208:
> 
> I would prefer if `wait_for_next_cycle()` continued returning bool and not see Terminate as a command. I'm also
> thinking that the command and _state is more or less the same thing, or am I missing something? If we want it could we
> instead have the command as a member that we set when we either start or undo/cancel the cycle?
> 

Done. Also merged Command and ServiceState, and removed a few things.

> src/hotspot/share/gc/g1/g1Policy.cpp line 907:
> 
>> 905:       abort_time_to_mixed_tracking();
>> 906:     }
>> 907:     collector_state()->set_mark_or_rebuild_in_progress(start_concurrent_mark_cycle);
> 
> As mentioned, I would prefer to check the `collector_state()` here rather than a passed in value.
> 
> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp line 179:
> 
>> 177:     }
>> 178:
>> 179:     concurrent_cycle_end(cmd == MarkCycle && !_cm->has_aborted());
> 
> I think it would be more clear if this was called with `false` for the undo-case and with `!_cm->has_aborted()` for the
> normal case. But if you want to avoid two calls, I'm fine with this.
> 
> -------------
> 
> Changes requested by sjohanss (Reviewer).
> 
> PR: https://git.openjdk.java.net/jdk/pull/177
> 

Also moved the setters into .inline.hpp files as Albert suggested.

Updated PR. Note that the bots have gone missing in this thread. :(

Webrevs are broken as well - already filed an issue. :(

Testing: tier1-5 passed with new changes.

Thanks,
   Thomas