RFR: 8350621: Code cache stops scheduling GC [v3]

Thu Jul 17 14:52:51 UTC 2025

On Wed, 16 Jul 2025 08:28:24 GMT, Thomas Schatzl <tschatzl at openjdk.org> wrote:

>> Hi all,
>> 
>>   please review this change to avoid CodeCache triggered GCs temporarily being ignored.
>> 
>> In particular, G1 does not make sure when its `collect()` method is called during a concurrent cycle, that a `Remark` pause that does code unloading etc. actually occurs after that request. This makes it so that some internal flag is not reset appropriately (via some callback to the caller). After this event, there will never be another attempt by the compiler threads to issue a garbage collection. The compiler threads are stuck until the next code unloading (caused by e.g. a regular concurrent cycle being triggered).
>> 
>> The [original PR](https://github.com/openjdk/jdk/pull/23656) also stated a similar with Parallel GC - however due to recent changes (https://bugs.openjdk.org/browse/JDK-8192647 and others) I do not think it is possible any more that the `collect()`call can silently be ignored. (I.e. gclocker could abort a full gc that is the only reason why code cache unloading and that callback will not occur as expected).
>> 
>> So for G1, the change makes the caller of `collect()` (i.e. the compiler thread) wait until that situation passes and retry. With that the compiler thread is sure that the callback the compiler threads are waiting for occurs. This does have the disadvantage that that compiler thread may not be available for compilation any more for a bit.
>> 
>> Testing: tier1-5, test case passing, failing before
>> 
>> Thanks,
>>   Thomas
>
> Thomas Schatzl has updated the pull request incrementally with four additional commits since the last revision:
> 
>  - * more refactoring, comments
>  - * kbarrett review part 2 (slightly incomplete)
>  - * kbarrett review - 1
>  - * initial kbarrett refactoring

A few minor nits, but generally looks good.

I don't have any (possibly) better name suggestions for the new helper functions
used by `try_collect_concurrently`.

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1733:

> 1731:     // If VMOp failed because a cycle was already in progress, it
> 1732:     // is now complete.  But it didn't finish this user-requested
> 1733:     // GC, so try again.

This assumes this is called after calling wait_full_mark_finished for the same op.  Might be
saying that.  Perhaps that should be a comment about the function as a whole.

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1846:

> 1844:         continue;
> 1845:       }
> 1846:      } else if (!GCCause::is_user_requested_gc(cause)) {

I think this line is indented one space too many.

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1848:

> 1846:      } else if (!GCCause::is_user_requested_gc(cause)) {
> 1847:       // For an "automatic" (not user-requested) collection, we just need to
> 1848:       // ensure that progress is made.

Maybe assert that the cause is one of the ones we expect here, rather than just defaulting to the
non-user-requested residue? That way, if a new cause gets added one is forced to look here with
an eye toward what should be done for the new cause.

src/hotspot/share/gc/g1/g1CollectedHeap.hpp line 284:

> 282:                                uint old_marking_started_after,
> 283:                                uint old_marking_completed_after);
> 284:   // Attempt to start a concurrent cycle with the indicated cause.

Add a blank line separating the new decl from the follower.

-------------

PR Review: https://git.openjdk.org/jdk/pull/26189#pullrequestreview-3029849241
PR Review Comment: https://git.openjdk.org/jdk/pull/26189#discussion_r2213549513
PR Review Comment: https://git.openjdk.org/jdk/pull/26189#discussion_r2213553325
PR Review Comment: https://git.openjdk.org/jdk/pull/26189#discussion_r2213559175
PR Review Comment: https://git.openjdk.org/jdk/pull/26189#discussion_r2213567475