RFR: 8240556: Abort concurrent mark after effective eager reclamation of humongous objects
Thomas Schatzl
thomas.schatzl at oracle.com
Mon Sep 21 18:21:02 UTC 2020
Hi,
On 18.09.20 10:43, Stefan Johansson wrote:
> On Tue, 15 Sep 2020 12:40:46 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:
>
>> Hi all,
>>
>> please review this change that implements concurrent mark abort as proposed by Liang Mao from Alibaba.
>>
>> Basically, if at the end of the concurrent start pause we notice that we went below the IHOP threshold (and it has been
>> a concurrent start pause caused by humongous allocation), instead of scheduling a mark operation, we schedule a
>> "concurrent undo" operation that undoes the changes. Regarding the removal of the
>> test/hotspot/jtreg/gc/stress/jfr/TestStressBigAllocationGCEventsWithG1.java test: it only ever accidentally worked in
>> G1. G1 never sent the AllocationRequiringGC events for GCs caused by going over the IHOP threshold for humongous
>> allocations that the test actually expects. That test previously only worked because G1 could not reclaim the
>> humongous objects fast enough so crossing the IHOP threshold causes a full concurrent mark. Allocations during that
>> concurrent mark do not cause a GC that can reclaim these objects, so ultimately some young GC that sends the
>> AllocationRequiringGC event will be sent. With concurrent undo this is not guaranteed any more, i.e. only in
>> environments where concurrent undo is slow (and we'll improve it soon) this test works. The test is too timing
>> dependent, and checking for the wrong thing in this case, so removing it. Testing: tbd.
>
> Thanks for taking this on Thomas. The change looks good but I have a few comments on the structure.
>
> src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3064:
>
>> 3062: concurrent_operation_is_mark_cycle =
>> 3063: should_start_concurrent_mark_operation &&
>> 3064: ((gc_cause() != GCCause::_g1_humongous_allocation) || policy()->need_to_start_conc_mark("Revise"));
>
> What do you think about setting a state in `G1CollectorState` instead of passing this around? Something like
> `_cancel_concurrent_mark`.
>
> src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3074:
>
>> 3072: double sample_end_time_sec = os::elapsedTime();
>> 3073: double pause_time_ms = (sample_end_time_sec - sample_start_time_sec) * MILLIUNITS;
>> 3074: policy()->record_collection_pause_end(pause_time_ms, concurrent_operation_is_mark_cycle);
>
> Then we can just check the collector state in `record_collection_pause_end` instead of passing it in here.
>
I dislike putting this pause-related state into global state. Unless
you or somebody else insists on this I would like to keep the code.
Filed JDK-8253343 which among other things should clean this up and make
it possible to easily manage pause-only state.
> src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 3121: >
>> 3119: // Note: of course, the actual marking work will not start until the safepoint
>> 3120: // itself is released in SuspendibleThreadSet::desynchronize().
>> 3121: start_concurrent_cycle(concurrent_operation_is_mark_cycle /* full_mark */);
>
> Instead of passing the state down here what do you think of something like:
> Suggestion:
>
> if (_collector_state->cancel_conc_start()) {
> cancel_concurrent_cycle();
> } else {
> start_concurrent_cycle();
> }
>
> Where `cancel_concurrent_cycle()` would do:
> root_regions()->cancel_scan();
> MutexLocker x(CGC_lock, Mutex::_no_safepoint_check_flag);
> if (!_cm_thread->in_progress()) {
> _cm_thread->set_canceled();
> CGC_lock->notify();
> }
>
I hope I improved that method a bit, but other than that kept it.
> src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 706:
>
>> 704: root_regions()->cancel_scan();
>> 705: return;
>> 706: }
>
> No need for this if you agree with my above comments.
>
> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp line 67:
>
>> 65: assert(_state == Idle, "cycle in progress");
>> 66: _state = mark_cycle ? StartMark : StartUndo;
>> 67: }
>
> I would prefer two different function, one for start and one for undo/cancel.
>
Done.
> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp line 208:
>
>> 206: result = UndoCycle;
>> 207: }
>> 208:
>
> I would prefer if `wait_for_next_cycle()` continued returning bool and not see Terminate as a command. I'm also
> thinking that the command and _state is more or less the same thing, or am I missing something? If we want it could we
> instead have the command as a member that we set when we either start or undo/cancel the cycle?
>
Done. Also merged Command and ServiceState, and removed a few things.
> src/hotspot/share/gc/g1/g1Policy.cpp line 907:
>
>> 905: abort_time_to_mixed_tracking();
>> 906: }
>> 907: collector_state()->set_mark_or_rebuild_in_progress(start_concurrent_mark_cycle);
>
> As mentioned, I would prefer to check the `collector_state()` here rather than a passed in value.
>
> src/hotspot/share/gc/g1/g1ConcurrentMarkThread.cpp line 179:
>
>> 177: }
>> 178:
>> 179: concurrent_cycle_end(cmd == MarkCycle && !_cm->has_aborted());
>
> I think it would be more clear if this was called with `false` for the undo-case and with `!_cm->has_aborted()` for the
> normal case. But if you want to avoid two calls, I'm fine with this.
>
> -------------
>
> Changes requested by sjohanss (Reviewer).
>
> PR: https://git.openjdk.java.net/jdk/pull/177
>
Also moved the setters into .inline.hpp files as Albert suggested.
Updated PR. Note that the bots have gone missing in this thread. :(
Webrevs are broken as well - already filed an issue. :(
Testing: tier1-5 passed with new changes.
Thanks,
Thomas
More information about the hotspot-gc-dev
mailing list