RFR: 8349094: GenShen: Race between control and regulator threads may violate assertions [v12]
Y. Srinivas Ramakrishna
ysr at openjdk.org
Sat Feb 15 01:55:20 UTC 2025
On Fri, 14 Feb 2025 17:43:48 GMT, William Kemper <wkemper at openjdk.org> wrote:
>> There are several changes to the operation of Shenandoah's control threads here.
>> * The reason for cancellation is now recorded in `ShenandoahHeap::_cancelled_gc` as a `GCCause`, instead of various member variables in the control thread.
>> * The cancellation handling is driven entirely by the cancellation cause
>> * The graceful shutdown, alloc failure, humongous alloc failure and preemption requested flags are all removed
>> * The shutdown sequence is simpler
>> * The generational control thread uses a lock to coordinate updates to the requested cause and generation
>> * APIs have been simplified to avoid converting between the generation `type` and the actual generation instance
>> * The old heuristic, rather than the control thread itself, is now responsible for resuming old generation cycles
>> * The control thread doesn't loop on its own (unless the pacer is enabled).
>>
>> ## Testing
>> * jtreg hotspot_gc_shenandoah
>> * dacapo, extremem, diluvian, specjbb2015, specjvm2018, heapothesys
>
> William Kemper has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits:
>
> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads
> - Old gen bootstrap cycle must make it to init mark
> - Merge remote-tracking branch 'jdk/master' into fix-control-regulator-threads
> - Improve message for assertion
> - Make shutdown safer for threads requesting (or expecting) gc
> - Do not accept requests if control thread is terminating
> - Notify waiters when control thread terminates
> - Add event for control thread state changes
> - Fix shutdown livelock error
> - Fix includes
> - ... and 20 more: https://git.openjdk.org/jdk/compare/ba6c9659...915ffbda
Flushing the comments at EOD; will complete review later.
src/hotspot/share/gc/shenandoah/heuristics/shenandoahOldHeuristics.hpp line 188:
> 186:
> 187: bool should_start_gc() override;
> 188: bool resume_old_cycle();
Documentation comment please, especially explaining the return value.
For things that may return `false` and not do anything, it's better to use `try_` prefix. In fact, the method doesn't actually resume the cycle, but checks if we are in a state such that we should resume it. So, I'd name it `should_resume_old_cycle()`, consistent with the name `should_start_gc()` for the previous method.
src/hotspot/share/gc/shenandoah/shenandoahCollectorPolicy.hpp line 101:
> 99: || cause == GCCause::_shenandoah_allocation_failure_evac
> 100: || cause == GCCause::_shenandoah_humongous_allocation_failure;
> 101: }
Would it make sense to move this implementation also to the .cpp file like the other static `is_...` methods below?
src/hotspot/share/gc/shenandoah/shenandoahController.hpp line 42:
> 40: volatile size_t _allocs_seen;
> 41: shenandoah_padding(1);
> 42: volatile size_t _gc_id;
// A monotonically increasing GC count.
src/hotspot/share/gc/shenandoah/shenandoahController.hpp line 66:
> 64:
> 65: // This cancels the collection cycle and has an option to block
> 66: // until another cycle runs and clears the alloc failure gc flag.
But "the alloc failure gc flag" is gone above. The comment should be updated as well. A public API's description should avoid talking about its internal implementation details here. It's OK to talk about implementation details in the implementation of the method, not in the header spec here.
src/hotspot/share/gc/shenandoah/shenandoahController.hpp line 87:
> 85: // Returns the internal gc count used by the control thread. Probably
> 86: // doesn't need to be exposed.
> 87: size_t get_gc_id();
As far as I can tell, there's a single non-internal (public) use of this, and it's from `ShenandoahOldGeneration::handle_failed_promotion()` where it's being used for reducing logging data. If we do need to expose this through a public API, I'd elide the "Probably doesn't need to be exposed", and update the comment to:
// Return the value of a monotonic increasing GC count, maintained by the control thread.
src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.cpp line 64:
> 62: void ShenandoahGenerationalControlThread::run_service() {
> 63:
> 64: const int64_t wait_ms = ShenandoahPacing ? ShenandoahControlIntervalMin : 0;
So we are supporting ShenandoahPacing with GenShenm(at least till we pull it in the future), but don't want to enable it by default, correct?
src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 64:
> 62: private:
> 63: // This lock is used to coordinate setting the _requested_gc_cause, _requested generation
> 64: // and _gc_mode. It is important that these be changed together and have a consistent view.
In that case, for ease of maintenance, I'd move the declaration of all of the 3 data members that this lock protects next to this lock, either immediately preceding or immediately succeeding its declaration in the body of this class.
Are these data members always both read and written under this lock? If so, then `_gc_mode` below doesn't need to be defined `volatile`.
src/hotspot/share/gc/shenandoah/shenandoahGenerationalControlThread.hpp line 88:
> 86: uint _age_period;
> 87:
> 88: // The mode is read frequently by requesting threads and only ever written by the control thread.
Do requesting threads lock the mutex when reading? I am trying to square yr comment that it's protected by the mutex with the field being declared `volatile`.
src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 450:
> 448:
> 449: void cancel_concurrent_mark();
> 450: bool cancel_gc(GCCause::Cause cause);
// Returns true if and only if cancellation request was successfully communicated.
-------------
PR Review: https://git.openjdk.org/jdk/pull/23475#pullrequestreview-2618968208
PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956962731
PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956965579
PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956944585
PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956918529
PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956929734
PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956981955
PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956816268
PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956824150
PR Review Comment: https://git.openjdk.org/jdk/pull/23475#discussion_r1956952381
More information about the shenandoah-dev
mailing list