RFR: 8316632: Shenandoah: Raise OOME when gc threshold is exceeded

Y. Srinivas Ramakrishna ysr at openjdk.org
Thu Sep 21 21:51:12 UTC 2023


On Wed, 20 Sep 2023 22:41:52 GMT, William Kemper <wkemper at openjdk.org> wrote:

> Shenandoah will run back-to-back full GCs and _almost_ grind mutator progress to a halt before eventually exhausting memory. This change will have Shenandoah raise a gc threshold exceeded exception if the collector fails to make progress after `ShenandoahNoProgressThreshold` full GC cycles (default is 3).

I have a few questions; thanks for clearing up my confusion! :-)

src/hotspot/share/gc/shenandoah/shenandoahControlThread.hpp line 124:

> 122:   ~ShenandoahControlThread();
> 123: 
> 124:   // Handle allocation failure from normal allocation.

Re "normal allocation" : what is not a normal (i.e. abnormal?) allocation?

src/hotspot/share/gc/shenandoah/shenandoahControlThread.hpp line 125:

> 123: 
> 124:   // Handle allocation failure from normal allocation.
> 125:   // Optionally blocks while collector is handling the failure.

Who is allowed to call the non-blocking version? What is the semantics of block or don't block? Can the documentation be extended to elaborate on its use?

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 947:

> 945:     control_thread()->handle_alloc_failure(req, false);
> 946:     notify_gc_progress();
> 947:     *gc_overhead_limit_was_exceeded = true;

I am confused. We have noted that we did more than `ShenandoahNoProgressThreshold` collections at this point. So we will fail the allocation request, set `gc_overhead_limit_was_exceeded`, and return `null` to the requester.

What happens at the caller, in response to this set of conditions?

Why then does it make sense to `notify_gc_progress()`, which clears the counter of `_gc_no_progress_count`? Where was the ostensible progress in this case?

src/hotspot/share/gc/shenandoah/shenandoahHeap.hpp line 292:

> 290:   ShenandoahSharedFlag   _concurrent_strong_root_in_progress;
> 291: 
> 292:   size_t _no_gc_progress_count;

Can we call this `_gc_no_progress_count` instead, so it's consistent with the corresponding `get_` method? The difference in naming is unnecessary at best and potentially confusing for maintenance in the long run, at worst.

-------------

PR Review: https://git.openjdk.org/jdk/pull/15852#pullrequestreview-1638713120
PR Review Comment: https://git.openjdk.org/jdk/pull/15852#discussion_r1333628772
PR Review Comment: https://git.openjdk.org/jdk/pull/15852#discussion_r1333629455
PR Review Comment: https://git.openjdk.org/jdk/pull/15852#discussion_r1333633859
PR Review Comment: https://git.openjdk.org/jdk/pull/15852#discussion_r1333608047


More information about the hotspot-gc-dev mailing list